CIDv1.eth : onchain raw/cidv1 library

Hello again :folded_hands: from namesys.eth

These are our standalone libraries for onchain raw/CIDv1 encoders.

Multicodecs:
dag-cbor, dag-json, dag-pb, json-utf8, raw-bin, unix-fs

Multihashes:
identity-raw, keccak-256, sha-256

Test data & CAR files were generated using the Helia JS client.
Here’s a working implementation of json-utf8 using ENS wildcards to return onchain data as JSON/CIDv1.
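
For reference, here’s a rough JS-side sketch of the same encoding, assuming the multiformats library, the json multicodec (0x0200), and the identity multihash (not the Solidity code itself, just the idea):

import { CID } from 'multiformats/cid'
import { identity } from 'multiformats/hashes/identity'
import * as json from 'multiformats/codecs/json'

// Encode a JSON value, "hash" it with identity (i.e. inline it), and wrap it as a CIDv1.
const bytes = json.encode({ Hello: 'World' })
const cid = CID.createV1(json.code, identity.digest(bytes)) // json multicodec = 0x0200

console.log(cid.toString()) // base32 CIDv1 whose length grows with the JSON payload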

We’ve hit limits on URL gateways for raw CIDv1:
a) The ipfs.io path gateway doesn’t resolve if the length is > around 165. dweb.link and local subdomain gateways try to force a redirect even if the CIDv1 length exceeds the subdomain length limit. :person_shrugging:
b) eth.limo/DoH added concat support, but Kubo can’t handle these long raw CIDs.
c) ENS isn’t ready for IPLD contenthash support to bypass those limits.

Hi, there’s a bunch of things to unpack here, but I need to learn more about your use case.

Do you mind sharing an example of the “long CIDs” that do not work for you?

Digests for the *-256 hash functions are 32 bytes in length, so how are you producing CIDs that are too long for a single DNS name?

Inlining a lot of data (>32 bytes) using the identity one? That sounds like an antipattern, but lmk if I misread the situation (CID examples would help a lot).
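
To illustrate what I mean, a quick sketch with the multiformats JS library:

import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'
import { identity } from 'multiformats/hashes/identity'
import * as raw from 'multiformats/codecs/raw'

const payload = new TextEncoder().encode('x'.repeat(300)) // ~300 bytes of content

// sha2-256: the digest is a fixed 32 bytes, so the CID length stays constant no matter the content size
const hashed = CID.createV1(raw.code, await sha256.digest(payload))

// identity: the "digest" is the payload itself, so the CID grows with the data
const inlined = CID.createV1(raw.code, identity.digest(payload))

console.log(hashed.toString().length, inlined.toString().length)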

A few things here:

  • ipfs.io: mind providing an example link? It indeed sounds like a bug, but it could also be expected behavior due to the explicit hash function safelist / CID data inlining length limits we have hardcoded as a security precaution.
  • dweb.link: Forced redirect to a subdomain is a security feature. The Subdomain Gateway Specification was created to ensure separate content roots do not share the same Origin. If it allowed “long CIDs” to be loaded from paths, it would defeat its entire reason to exist.
  • localhost in Kubo (and IPFS Desktop) is a Subdomain Gateway and behaves the same way as dweb.link. If you are just fetching JSON and don’t need origin isolation, use the IP (127.0.0.1) instead – it will act as a Path Gateway similar to ipfs.io (see the sketch below).
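
For example, the two URL forms look like this (illustrative only, assuming Kubo’s default port 8080 and a small inlined CID):

// A small raw/identity CID (inlines "Hello World")
const cid = 'bafkqac2imvwgy3zak5xxe3de'

// Path gateway form (ipfs.io, or Kubo reached via 127.0.0.1): the CID stays in the URL path
const pathUrl = `http://127.0.0.1:8080/ipfs/${cid}`

// Subdomain gateway form (dweb.link, localhost in Kubo): the CID becomes a DNS label,
// which is where the 63-character label limit bites for long/inlined CIDs
const subdomainUrl = `http://${cid}.ipfs.localhost:8080/`

console.log(pathUrl, subdomainUrl)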

Is there a specification for the “concat” approach eth.limo did?

Is the DNS label length limit of 63 characters the only issue?


There is a gap in the Subdomain Gateway Specification - we have an open problem with the DNS label limit for hash functions longer than 256 bits.

There are some notes and Pros / Cons in:

So far we have had no users asking for that to be addressed, but if you share example CIDs/links, we could see if there is a path towards closing the gap.


RFC 4408 is used to format longer SPF and TXT records.

Yes, we’re using both the inlining and the multihashed approach to support dynamic/onchain CIDv1. The actual use case would provide both versions, as ENS contenthash and text records, so a standalone pinning service can read and pin that long raw data into multihashed ones, but that’s not suitable for more dynamic onchain data updated every block/use case.. Also, we use inline CIDv1 to mix offchain pinned IPFS CIDs with onchain data using the dag-cbor link tag.
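
Roughly what that mixing looks like (a JS sketch of the idea using multiformats and @ipld/dag-cbor, not our actual onchain encoder):

import * as dagCbor from '@ipld/dag-cbor'
import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'
import { identity } from 'multiformats/hashes/identity'
import * as raw from 'multiformats/codecs/raw'

// An offchain block pinned somewhere, addressed by a normal sha2-256 CID...
const pinned = CID.createV1(raw.code, await sha256.digest(new TextEncoder().encode('pinned offchain')))

// ...and onchain data inlined via the identity multihash
const onchain = CID.createV1(raw.code, identity.digest(new TextEncoder().encode('Hello World')))

// dag-cbor encodes CID values as CBOR tag 42 links, so both can sit in one directory-like node
const node = dagCbor.encode({ 'pinned.txt': pinned, 'onchain.txt': onchain })
console.log(node.byteLength)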

Looks like that’s the case for ipfs.io: it stops resolving once the CIDv1 (base32) length is > 463, with 285 UTF-8 chars in the raw data. But the same longer CIDs resolve with a local 127.0.0.1:8080 path gateway running Kubo v0.35.0.


We have base16 CIDs and CAR files in the test dir.

RFC 4408 deals with concatenating multiple TXT records for longer SPF entries, but DNS label limits (63 characters per label, 253 total for FQDN) seem to be a separate constraint of the DNS system.

Could you clarify how ENS is using RFC 4408 techniques to address the DNS label length issue? Or did you mean TXT values and not DNS labels?

I was able to reproduce the problem you describe.

Took the longest sample CID from dag-cbor-test-data.sol and prepended f to it to make it base16 CIDv1. Opening it locally works fine, but indeed, the URL https://ipfs.io/ipfs/f0171008206a663726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec9666e6573746564d82a59014800017100c202a263726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec96a696e6465782e68746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e6a726561646d652e747874d82a50000155000b48656c6c6f20576f726c646b636f6e6669672e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d produces HTTP 502.

I’ve confirmed the Error 502 is produced by NGINX and not the IPFS backend (Rainbow), but I’m hesitant to investigate further, as the use case looks like a misuse of identity multihashes, which should only be used for small data leaves (see questions below).

What is a.. “dynamic” CID? I’m honestly puzzled by the mention of “dynamic cidv1” - CIDs by definition represent immutable content.

Are you perhaps using CIDs as containers for the data itself (via the identity hash) rather than as real, cryptographically verifiable content identifiers? This feels like an inefficient way of abusing the system.

The example CID above is a CBOR document inlined inside of a CID with identity multihash (cid inspector):

$ ipfs dag get f0171008206a663726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec9666e6573746564d82a59014800017100c202a263726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec96a696e6465782e68746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e6a726561646d652e747874d82a50000155000b48656c6c6f20576f726c646b636f6e6669672e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d | jq
{
  "config.json": {
    "/": "bagaaiaarpmreqzlmnrxseorck5xxe3deej6q"
  },
  "index.html": {
    "/": "bafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcpq"
  },
  "json": {
    "/": "bafyqbgabunvxg2dbgi2tmltkonxw5wbklataaamaaqjcazneytffvt6ex45tjyjyzaehgd3mttqudjggxce2snc6m3exhhltnvuwizlooruxi6jonjzw63wyfjlqaamaaqabc6zcjbswy3dpei5cev3pojwgiit5nzvwky3dmfvtenjwfzvhg33o3avfqjqaagaaigzaa2gnavkx5mfcgwgd7kztxtwmskpqwtybzlwykyxxok56xcdj53eq"
  },
  "nested": {
    "/": "bafyqbqqcujrxeylx3avfreqaafyqbdabunvhg2dbgi2tmltupb2nqksyeuaacvisecszdjwubp2caqckaeltht5xwginmldfx4f43izlk6zhpwnnt4kg43djmrsw45djor4s45dyotmcuuaaafkqac2imvwgy3zak5xxe3denvvwky3dmfvtenjwfz2hq5gyfjmckaabkunsawjpu5byrh6h7evmfi33wh23uhnpfjoii5a4udqamhjehixgob52mrvhg33o3avfrhqaafyqbgabunvxg2dbgi2tmltkonxw5wbklataaamaaqjcazneytffvt6ex45tjyjyzaehgd3mttqudjggxce2snc6m3exhhltnvuwizlooruxi6jonjzw63wyfjlqaamaaqabc6zcjbswy3dpei5cev3pojwgiit5nzvwky3dmfvtenjwfzvhg33o3avfqjqaagaaigzaa2gnavkx5mfcgwgd7kztxtwmskpqwtybzlwykyxxok56xcdj53eq"
  },
  "raw": {
    "/": "bafyqbdabunvhg2dbgi2tmltupb2nqksyeuaacvisecszdjwubp2caqckaeltht5xwginmldfx4f43izlk6zhpwnnt4kg43djmrsw45djor4s45dyotmcuuaaafkqac2imvwgy3zak5xxe3denvvwky3dmfvtenjwfz2hq5gyfjmckaabkunsawjpu5byrh6h7evmfi33wh23uhnpfjoii5a4udqamhjehixgob52"
  },
  "readme.txt": {
    "/": "bafkqac2imvwgy3zak5xxe3de"
  }
}

I feel I am missing something.

Mind elaborating why you do this inlining for high-level directory listings, instead of using real IPFS blocks and short CIDs that work everywhere and do not waste bandwidth sending the entire data back and forth (once to the server, and once back)?

:folded_hands: err, sorry for the slightly late reply, I thought I had already posted this, but it was still in draft.. so adding a bit more context/origin story.

Sorry for the mixup, in the ENS context that’s a dynamic contenthash.
e.g., vitalik.jsonapi.eth
resolves to > https://ipfs.io/ipfs/bagaaiaf2af5se33lei5hi4tvmuwce5djnvsseorcge3tkmzxg4ztsnbxeiwceytmn5rwwir2eizdgmbsgmzdsmjcfqrgk4tdei5dalbcovzwk4rchj5seylemrzgk43tei5cemdymq4giqjwijddenrzgy2gcrrziq3wkrlehfstam2fguztimjviqztoykbhe3danbveiwce3tbnvsseorcozuxiylmnfvs4zlunarcyitcmfwgc3tdmurduirufy3tqmbwgu4celbcobzgsy3fei5cemzygm3c4mjxgm2dgobcpv6q

This ENS contenthash CIDv1 is “auto”-updated after each block/timestamp and reflects vitalik.eth’s ETH balance / price feed.

{
  "ok": true,
  "time": "1753773947",
  "block": "23023291",
  "erc": 0,
  "user": {
    "address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    "name": "vitalik.eth",
    "balance": "4.780658",
    "price": "3836.173438"
  }
}

As I’ve previously mentioned, our tools support both multihash (sha256/keccak256) and raw/identity. The 0x00 identity multihash is used for onchain CID generation, but the underlying onchain data/payload isn’t static, so that CID isn’t pinned anywhere offchain.
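
For illustration, the bytes our encoders have to produce boil down to this (a simplified TypeScript sketch, not the actual Solidity):

// CIDv1 = varint(version) || varint(codec) || varint(multihash code) || varint(digest length) || digest
function varint(n: number): number[] {
  const out: number[] = []
  while (n >= 0x80) { out.push((n & 0x7f) | 0x80); n >>>= 7 }
  out.push(n)
  return out
}

function rawIdentityCid(data: Uint8Array): Uint8Array {
  return Uint8Array.from([
    ...varint(0x01),        // CIDv1
    ...varint(0x55),        // raw multicodec
    ...varint(0x00),        // identity multihash
    ...varint(data.length), // digest length
    ...data,                // with identity, the "digest" is the payload itself
  ])
}

const cidBytes = rawIdentityCid(new TextEncoder().encode('Hello World'))
// 0155000b48656c6c6f20576f726c64 – matches the inlined "Hello World" leaves in the dag dump above (after their 0x00 multibase prefix)
console.log([...cidBytes].map(b => b.toString(16).padStart(2, '0')).join(''))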

I’ve thought about the extra bandwidth too; as we don’t have that data pinned offchain, it’s not going to work without inlined data.. As a backup plan we could use a local Helia client or a minimalist inline CID reader for this case. Another plan we’re looking into is to use TEXT records in ENS to return raw/inlined data for special pinning services, and then return the multihashed version for the contenthash.

A bit of a recap: all this started with our need for onchain contenthash generators, so we went back to adding plain data URI stringToHex("data:") = 0x646174613a prefix support in ENS contenthash, but ENS devs want a strict multicodec/multihash format.. Adding that simple non-colliding prefix support on top of the ENSIP-07 contenthash spec would solve this issue (but limited to a single file per contenthash).. Bit crazy, ENS is already using plaintext data URIs in avatar text records.. :person_shrugging:
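
(That prefix is just the UTF-8 bytes of "data:", e.g.:)

const prefix = '0x' + [...new TextEncoder().encode('data:')]
  .map(b => b.toString(16).padStart(2, '0')).join('')
console.log(prefix) // 0x646174613a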

Thanks for the context, I understand why you chose this approach. However, relying on gateways echoing unlimited inlined data back and forth is risky from a long-term perspective.

Current situation: Your code works today because there’s no size limit on inlined data yet and gateways echo it back. But this is just luck – implementations will likely add limits soon, which would break this approach.

This is not a solid foundation to build things on:

  • Bitswap already penalizes this behavior: if you request a CID with identity hash from another node, it assumes you’re broken/misconfigured and closes the connection.
  • This protects the p2p network from bandwidth waste (you already have the data in the CID, so why send a request for data you already have?)
  • Other parts of the stack (gateways, new implementations) will likely add similar checks
  • Other people writing IPFS implementations from scratch will likely pick some “sensible” limit anyway.

Proposed limit:

Recommendation: do some future-proofing:

  • Don’t send large identity-hash CIDs to IPFS nodes or gateways
  • Instead, detect identity multicodec CIDs and deserialize the data locally (see the sketch below)
  • Consider using helia or helia-verified-fetch to handle identity multihashes properly, or just turn identity CIDs into data: URIs

This way you avoid the round-trip overhead and potential future breakage.
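
A minimal sketch of that local-decode option, assuming the multiformats library:

import { CID } from 'multiformats/cid'

const IDENTITY = 0x00

// If a CID uses the identity multihash, the data is already inside the CID –
// decode it locally instead of asking a gateway to echo it back.
function inlineData(cidStr: string): Uint8Array | null {
  const cid = CID.parse(cidStr)
  return cid.multihash.code === IDENTITY ? cid.multihash.digest : null
}

// e.g. the inlined config.json CID from the dag dump above
const bytes = inlineData('bagaaiaarpmreqzlmnrxseorck5xxe3deej6q')
if (bytes) {
  // turn it into a data: URI rather than round-tripping through a gateway
  const dataUri = 'data:application/json,' + encodeURIComponent(new TextDecoder().decode(bytes))
  console.log(dataUri)
}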