CIDv1.eth : onchain raw/cidv1 library

Hello again :folded_hands: from namesys.eth

These are our standalone libraries for onchain raw/CIDv1 encoders.

Multicodecs:
dag-cbor, dag-json, dag-pb, json-utf8, raw-bin, unix-fs

Multihashes:
identity-raw, keccak-256, sha-256

Test data & CAR files were generated using the Helia JS client.
Here’s a working implementation of json-utf8 using ENS wildcards to return onchain data as json/CIDv1.
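For reference, a minimal offchain sketch with the multiformats JS package (the stack Helia builds on), assuming json-utf8 here maps to the 0x0200 json multicodec – useful for generating fixtures to compare against the onchain encoders:

import { CID } from 'multiformats/cid'
import * as json from 'multiformats/codecs/json'          // 0x0200 json (UTF-8)
import { identity } from 'multiformats/hashes/identity'   // 0x00 identity
import { sha256 } from 'multiformats/hashes/sha2'         // 0x12 sha2-256

const bytes = json.encode({ ok: true, hello: 'world' })

// Inline CID: the "digest" is the payload itself, so the CID grows with the data.
const inlineCid = CID.createV1(json.code, identity.digest(bytes))

// Regular CID: fixed-size 32-byte sha2-256 digest, pinnable on any IPFS node.
const hashedCid = CID.createV1(json.code, await sha256.digest(bytes))

console.log(inlineCid.toString(), hashedCid.toString()) // base32 CIDv1 strings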

We’ve hit limits on URL gateways for raw CIDv1:
a) The ipfs.io path gateway doesn’t resolve if the CID length is greater than around 165. dweb.link and local subdomain gateways try to force a redirect even when the CIDv1 length exceeds the subdomain length limit. :person_shrugging:
b) eth.limo/DoH added concat support, but Kubo can’t handle these long raw CIDs.
c) ENS isn’t ready for IPLD contenthash support to bypass those limits.

Hi, there’s a bunch of things to unpack here, but I need to learn more about your use case.

Do you mind sharing an example of “long CIDs” that do not work for you?

The digests for the *-256 hash functions are 32 bytes, so how are you producing CIDs that are too long for a single DNS name?

Inlining a lot of data (>32 bytes) using the identity one? That sounds like an antipattern, but let me know if I misread the situation (CID examples would help a lot).

A few things here:

  • ipfs.io: mind providing an example link? It indeed sounds like a bug, but it could also be expected behavior due to the explicit hash function safelist / CID data inlining length limits we have hardcoded as a security precaution.
  • dweb.link: the forced redirect to a subdomain is a security feature. The Subdomain Gateway Specification was created to ensure separate content roots do not share the same Origin. If it allowed “long CIDs” to be loaded from paths, it would defeat its entire reason to exist.
  • localhost in Kubo (and IPFS Desktop) is a subdomain gateway and behaves the same way as dweb.link. If you are just fetching JSON and don’t need origin isolation, then use the IP (127.0.0.1) – it will act as a path gateway similar to ipfs.io (the URL forms are sketched below).
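To make the distinction concrete, a small sketch of the three URL forms in question (the CID is a sample that appears later in this thread):

const cid = 'bafkqac2imvwgy3zak5xxe3de' // sample CID from this thread

const pathGateway = `https://ipfs.io/ipfs/${cid}`             // shared origin, no redirect
const subdomainGateway = `https://${cid}.ipfs.dweb.link/`      // forced per-CID origin isolation
const localPathGateway = `http://127.0.0.1:8080/ipfs/${cid}`   // Kubo via IP acts as a path gateway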

Is there a specification for the “concat” approach eth.limo did?

Is the DNS label length limit of 63 characters the only issue?


There is a gap in the Subdomain Gateway Specification – we have an open problem with the DNS label limit for hash functions longer than 256 bits.

There are some notes and Pros / Cons in:

So far we’ve had no users ask for addressing that, but if you share example CIDs/links, we could see if there is a path towards closing the gap.


RFC 4408 is used to format longer SPF and TXT records.

Yes, we’re using both the inlining and the multihashed approach to support dynamic/onchain CIDv1. The actual use case would provide both versions as an ENS contenthash and text records, so a standalone pinning service can read that long raw data and pin it under the multihashed CIDs, but that isn’t suitable for more dynamic onchain data updated every block. Also, we use inline CIDv1 to mix offchain-pinned IPFS CIDs with onchain data using the dag-cbor link tag.
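For illustration (a sketch with the multiformats / @ipld/dag-cbor JS packages, not our onchain library’s API), mixing a pinned sha2-256 link with onchain-style data and inlining the result looks roughly like this:

import { CID } from 'multiformats/cid'
import * as dagCbor from '@ipld/dag-cbor'
import * as raw from 'multiformats/codecs/raw'
import { identity } from 'multiformats/hashes/identity'
import { sha256 } from 'multiformats/hashes/sha2'

// Stand-in for content that is already pinned offchain.
const pinnedCid = CID.createV1(raw.code, await sha256.digest(new TextEncoder().encode('pinned offchain content')))

const doc = {
  'readme.txt': pinnedCid,                      // encoded as a CBOR tag-42 link
  'state.json': { block: 23023291, ok: true }   // dynamic onchain-derived data
}

const inlineCid = CID.createV1(dagCbor.code, identity.digest(dagCbor.encode(doc)))
console.log(inlineCid.toString()) // long base32 CIDv1 carrying the whole document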

Looks like that’s the case for ipfs.io: it stops resolving once the CIDv1 (base32) length exceeds 463, with 285 UTF-8 chars in the raw data. But the same longer CIDs resolve with a local 127.0.0.1:8080 path gateway running Kubo v0.35.0.


We have base16 CIDs and CAR files in the test dir.

RFC 4408 deals with concatenating multiple TXT records for longer SPF entries, but DNS label limits (63 characters per label, 253 total for FQDN) seem to be a separate constraint of the DNS system.

Could you clarify how ENS is using RFC 4408 techniques to address the DNS label length issue? Or did you mean TXT values and not DNS labels?

I was able to reproduce the problem you describe.

Took the longest sample CID from dag-cbor-test-data.sol and prepended f to it to make it base16 CIDv1. Opening it locally works fine, but indeed, the URL https://ipfs.io/ipfs/f0171008206a663726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec9666e6573746564d82a59014800017100c202a263726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec96a696e6465782e68746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e6a726561646d652e747874d82a50000155000b48656c6c6f20576f726c646b636f6e6669672e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d produces HTTP 502.

I’ve confirmed the Error 502 is produced by NGINX and not the IPFS backend (Rainbow), but I’m hesitant to investigate further, as the use case looks like a misuse of identity multihashes, which should only be used for small data leaves (see questions below).

What is a.. “dynamic” CID? I’m honestly puzzled by the mention of “dynamic cidv1” - CIDs by definition represent immutable content.

Are you perhaps using CIDs as containers for the data itself (via identity hash) rather than as true cryptographic hash functions and real verifiable content identifiers? This feels like an inefficient way of abusing the system.

The example CID above is a CBOR document inlined inside of a CID with identity multihash (cid inspector):

$ ipfs dag get f0171008206a663726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec9666e6573746564d82a59014800017100c202a263726177d82a5892000171008c01a36a7368613235362e747874d82a58250001551220a591a6d40bf420404a011733cfb7b190d62c65bf0bcda32b57b277d9ad9f146e6c6964656e746974792e747874d82a50000155000b48656c6c6f20576f726c646d6b656363616b3235362e747874d82a58250001551b20592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba646a736f6ed82a589e000171009801a36b7368613235362e6a736f6ed82a582600018004122065a4c4ca5acfc4bf3b34e138c808730f6c9ce141a4c6b889a9345e66c9739d736d6964656e746974792e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d6e6b656363616b3235362e6a736f6ed82a5826000180041b20068cd05557eb0a2358c3fab33bcecc929f0b4f01caed8562f772bbeb8869eec96a696e6465782e68746d6cd82a581900015500143c68313e48656c6c6f20576f726c643c2f68313e6a726561646d652e747874d82a50000155000b48656c6c6f20576f726c646b636f6e6669672e6a736f6ed82a570001800400117b2248656c6c6f223a22576f726c64227d | jq
{
  "config.json": {
    "/": "bagaaiaarpmreqzlmnrxseorck5xxe3deej6q"
  },
  "index.html": {
    "/": "bafkqafb4nayt4sdfnrwg6icxn5zgyzb4f5udcpq"
  },
  "json": {
    "/": "bafyqbgabunvxg2dbgi2tmltkonxw5wbklataaamaaqjcazneytffvt6ex45tjyjyzaehgd3mttqudjggxce2snc6m3exhhltnvuwizlooruxi6jonjzw63wyfjlqaamaaqabc6zcjbswy3dpei5cev3pojwgiit5nzvwky3dmfvtenjwfzvhg33o3avfqjqaagaaigzaa2gnavkx5mfcgwgd7kztxtwmskpqwtybzlwykyxxok56xcdj53eq"
  },
  "nested": {
    "/": "bafyqbqqcujrxeylx3avfreqaafyqbdabunvhg2dbgi2tmltupb2nqksyeuaacvisecszdjwubp2caqckaeltht5xwginmldfx4f43izlk6zhpwnnt4kg43djmrsw45djor4s45dyotmcuuaaafkqac2imvwgy3zak5xxe3denvvwky3dmfvtenjwfz2hq5gyfjmckaabkunsawjpu5byrh6h7evmfi33wh23uhnpfjoii5a4udqamhjehixgob52mrvhg33o3avfrhqaafyqbgabunvxg2dbgi2tmltkonxw5wbklataaamaaqjcazneytffvt6ex45tjyjyzaehgd3mttqudjggxce2snc6m3exhhltnvuwizlooruxi6jonjzw63wyfjlqaamaaqabc6zcjbswy3dpei5cev3pojwgiit5nzvwky3dmfvtenjwfzvhg33o3avfqjqaagaaigzaa2gnavkx5mfcgwgd7kztxtwmskpqwtybzlwykyxxok56xcdj53eq"
  },
  "raw": {
    "/": "bafyqbdabunvhg2dbgi2tmltupb2nqksyeuaacvisecszdjwubp2caqckaeltht5xwginmldfx4f43izlk6zhpwnnt4kg43djmrsw45djor4s45dyotmcuuaaafkqac2imvwgy3zak5xxe3denvvwky3dmfvtenjwfz2hq5gyfjmckaabkunsawjpu5byrh6h7evmfi33wh23uhnpfjoii5a4udqamhjehixgob52"
  },
  "readme.txt": {
    "/": "bafkqac2imvwgy3zak5xxe3de"
  }
}

I feel I am missing something.

Mind elaborating why you do this inlining for high-level directory listings, instead of using real IPFS blocks and short CIDs that work everywhere and do not require wasting bandwidth by sending the entire data back and forth (once to the server, and once back)?

:folded_hands: err, sorry for the slightly late reply, I thought I already posted this, but it was still in draft.. so adding a bit more context/origin story.

Sorry for the mixup – in the ENS context that’s a dynamic contenthash.
E.g., vitalik.jsonapi.eth
resolves to > https://ipfs.io/ipfs/bagaaiaf2af5se33lei5hi4tvmuwce5djnvsseorcge3tkmzxg4ztsnbxeiwceytmn5rwwir2eizdgmbsgmzdsmjcfqrgk4tdei5dalbcovzwk4rchj5seylemrzgk43tei5cemdymq4giqjwijddenrzgy2gcrrziq3wkrlehfstam2fguztimjviqztoykbhe3danbveiwce3tbnvsseorcozuxiylmnfvs4zlunarcyitcmfwgc3tdmurduirufy3tqmbwgu4celbcobzgsy3fei5cemzygm3c4mjxgm2dgobcpv6q

This ENS contenthash CIDv1 is auto-updated after each block/timestamp, along with vitalik.eth’s ETH balance / price feed.

{
  "ok": true,
  "time": "1753773947",
  "block": "23023291",
  "erc": 0,
  "user": {
    "address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045",
    "name": "vitalik.eth",
    "balance": "4.780658",
    "price": "3836.173438"
  }
}

As I’ve previously mentioned,
our tools support both the multihashed (sha256/keccak256) and the raw/identity approach. The 0x00 identity multihash is used for onchain CID generation, but the underlying onchain data/payload isn’t static, so that CID isn’t pinned anywhere offchain.

I’ve thought about the extra bandwidth too; since we don’t have that data pinned offchain, it’s not going to work without inlined data. As a backup plan we could use a local Helia client or a minimalist inline CID reader for this case. Another plan we’re looking into is to use TEXT records in ENS to return the raw/inlined data for special pinning services, and then return the multihashed version as the contenthash.

A bit of a recap:
all this started with our need for onchain contenthash generators, so we went back to adding plain data URI prefix support – stringToHex(“data:”) = 0x646174613a – to the ENS contenthash, but ENS devs want a strict multicodec/multihash format. Adding that simple non-colliding prefix support on top of the ENSIP-07 contenthash spec would solve this issue (though limited to a single file per contenthash). A bit crazy, since ENS is already using plaintext data URIs in avatar text records.. :person_shrugging:
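For what it’s worth, the client-side check for that prefix would be trivial; a sketch (my own illustration, not part of any ENSIP):

// Hex of "data:" – the proposed non-colliding prefix for ENS contenthash bytes.
const DATA_URI_PREFIX = '0x646174613a'

function classifyContenthash (hex: string): 'data-uri' | 'multicodec' {
  return hex.toLowerCase().startsWith(DATA_URI_PREFIX) ? 'data-uri' : 'multicodec'
}

// classifyContenthash('0x646174613a746578742f68746d6c2c...') === 'data-uri'  ("data:text/html,…")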

Thanks for the context. I understand why you chose this approach. However, relying on gateways echoing unlimited inlined data back and forth is risky from a long-term perspective.

Current situation: Your code works today because there’s no size limit on inlined data yet and gateways echo it back. But this is just luck – implementations will likely add limits soon, which would break this approach.

This is not a solid foundation to build things on:

  • Bitswap already penalizes this behavior: if you request a CID with an identity hash from another node, it assumes you’re broken/misconfigured and closes the connection.
  • This protects the p2p network from bandwidth waste (you already have the data in the CID, so why send me a request for data you already have?)
  • Other parts of the stack (gateways, new implementations) will likely add similar checks.
  • People writing an IPFS implementation from scratch will likely pick some “sensible” limit anyway.

Proposed limit:

Recommendation: do some future-proofing:

  • Don’t send large identity-hash CIDs to IPFS nodes or gateways
  • Instead, detect identity multicodec CIDs and deserialize the data locally
  • Consider using Helia or @helia/verified-fetch to handle identity multihashes properly, or just turn identity CIDs into data: URIs (see the sketch below)

This way you avoid the round-trip overhead and potential future breakage.
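For example, a rough sketch of that local handling with the multiformats and @ipld/dag-cbor JS packages (codec constants and error handling simplified):

import { CID } from 'multiformats/cid'
import * as dagCbor from '@ipld/dag-cbor'

const IDENTITY_MH = 0x00   // identity multihash function code
const DAG_CBOR = 0x71      // dag-cbor multicodec
const RAW = 0x55           // raw multicodec

function decodeInlineCid (cidStr: string): unknown {
  const cid = CID.parse(cidStr)
  if (cid.multihash.code !== IDENTITY_MH) {
    throw new Error('not an identity CID – retrieve it over IPFS instead')
  }
  const payload = cid.multihash.digest // the inlined bytes
  switch (cid.code) {
    case DAG_CBOR:
      return dagCbor.decode(payload)
    case RAW:
      return new TextDecoder().decode(payload) // raw bytes, assumed UTF-8 here
    default:
      throw new Error(`unhandled codec 0x${cid.code.toString(16)}`)
  }
}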


:folded_hands: hello again..
it’d be really^3 nice to have 256 bytes, or even 512 bytes at max, of inline support in the future..

Hi @0xc0de4c0ffee, FYSA the currently adopted (Boxo Go / Helia JS) limit is 128 bytes:

If you believe there is a sensible use case for increasing it for the entire IPFS ecosystem, feel free to open an IPIP (spec change proposal) similar to IPIP-512: Limit Identity CID Size to 128 Bytes in UnixFS Contexts by lidel · Pull Request #512 · ipfs/specs · GitHub, with a real-world “User benefit” section, and it can be discussed there.

Note that it typically takes 6-12 months for a significant portion of the IPFS Swarm (nodes using the Amino DHT protocol) to adopt updates. If this deployment timeframe doesn’t align with your needs, you may want to consider alternative approaches (e.g. running your own service worker gateway – see below).


P.S. I noticed this statement in the linked newsletter:

A community experiment showed how an ENS name can serve a fully inlined IPFS page using CIDv1.Raw—demonstrating lightweight, “no-gateway” content delivery.

This characterization seems misleading. The example actually relies entirely on a gateway to deserialize data and return the content to the client—it’s not truly “no-gateway” delivery.

For a more technically accurate approach to gateway-free content delivery, consider exploring solutions like:

These tools enable content retrieval and deserialization directly in the user’s browser, eliminating dependence on third-party HTTP servers.

If you run your own service worker gateway, you could theoretically increase the maximum inline limit, provided you understand and accept the associated risks.


While current ENS services like eth.limo use IPFS nodes/gateways to retrieve that inlined data, it’s possible to read the raw data over ENS on the client side and still be backwards compatible with IPFS gateways. Still, the main point is that we don’t have to pin these tiny “dynamic” raw/onchain data blobs in our IPFS nodes/pinning services to make it work.

Basic use cases:
a) <user>.domain.eth ~> raw.HTML(<meta http-equiv="refresh" content="0;url=https://domain.com/<user>"/>)
b) <ens/0xaddr>.nftclub.eth ~> raw.JSON( {"balance":5,"points":123456})

I’ll look around and send a PR when I’m free ~next week.. thanks :folded_hands::vulcan_salute:

Interesting, I think it’s worth distinguishing between use cases here.

Inlining small JSON data (like the <ens/0xaddr>.nftclub.eth example returning account balances) seems reasonable – the data itself is onchain and verifiable, just encoded efficiently in a CID when small. Fair enough, as long as your software knows how to switch to regular IPFS retrieval beyond the inline limit (sketched below).
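A rough sketch of that switch (assuming @helia/verified-fetch; the 128-byte threshold follows IPIP-512):

import { CID } from 'multiformats/cid'
import { createVerifiedFetch } from '@helia/verified-fetch'

const INLINE_LIMIT = 128 // bytes, per IPIP-512

async function resolveContenthash (cidStr: string): Promise<Uint8Array> {
  const cid = CID.parse(cidStr)
  if (cid.multihash.code === 0x00 && cid.multihash.digest.length <= INLINE_LIMIT) {
    return cid.multihash.digest // inlined payload, no network round-trip needed
  }
  const verifiedFetch = await createVerifiedFetch()
  const res = await verifiedFetch(`ipfs://${cid.toString()}`)
  return new Uint8Array(await res.arrayBuffer()) // retrieved and verified over IPFS
}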

On mixing ENS and unverifiable DNS

However, the technique of inlining HTML redirects for <user>.domain.eth to bounce users to traditional HTTPS URLs concerns me:

<meta http-equiv="refresh" content="0;url=https://domain.com/user"/>

What’s the use case for this? Vanity URLs? Bridging to regular websites?

In my mind this pattern undermines the core value proposition of ENS: cryptographic verifiability. You pay gas for blockchain resolution only to immediately redirect to an unverifiable ICANN domain served over DNS.

The trust chain breaks at that redirect.

FYSA we ran into this exact issue back in 2021 and explicitly stopped supporting DNSLink-to-DNS patterns in ENS resolution (support DNSLink for encoding/decoding IPNS contenthash values · Issue #849 · ensdomains/ens-app · GitHub): there is no point in resolving ENS using the blockchain if the resolved value then points at /ipns/example.com (a DNS name), because that then needs to be resolved over regular DNS.

The same logic applies here - wrapping the DNS redirect in inline HTML doesn’t change the fundamental problem.

Feels like a missing feature of ENS?

If there’s genuine demand for pointing ENS names at traditional websites, I’d suggest ENS consider making this explicit rather than having it hidden inside inline HTML. They already have ipfs-ns for immutable content and ipns-ns for mutable IPNS names.

Perhaps an explicit dns-ns namespace could clearly signal “this is an insecure pointer to an ICANN name”? Food for thought.

Explicit feels safer than implicit in this case: gateways and browsers could warn users appropriately, rather than silently executing redirects that look like verified content but aren’t (it smells like security theater without real protection, and could also train people into a false sense of security, like a TLS padlock does on a phishing site).

I added that as an example of simple raw HTML/text.. I’d use the _redirects spec if a real redirect were required.. The simple use case is tiny onchain-generated HTMX/HTML snippets and JSON data. The real use case for longer inlined raw/identity CIDs is linking multiple CIDs (pinned offchain) into a single directory using raw dag-pb/dag-cbor.

eg,

Cidv1.raw.dagCborLinked({"index.html": "/CID<index.v2.html>", "s.css": "/CID<style.v2.css>", "a.js": "/CID<app.v3.js>", "img": "/CID<…>"})

4 linked CIDs ≈ 36 bytes * 4 + label lengths + a bit of extra space for the prefix and CBOR or PB tags.
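A quick sanity check of that estimate with the JS libraries (a sketch; the linked CIDs below are synthetic stand-ins for the real pinned files):

import { CID } from 'multiformats/cid'
import * as dagCbor from '@ipld/dag-cbor'
import * as raw from 'multiformats/codecs/raw'
import { identity } from 'multiformats/hashes/identity'
import { sha256 } from 'multiformats/hashes/sha2'

const link = async (name: string) =>
  CID.createV1(raw.code, await sha256.digest(new TextEncoder().encode(name)))

const dir = {
  'index.html': await link('index.v2.html'),
  's.css': await link('style.v2.css'),
  'a.js': await link('app.v3.js'),
  'img': await link('img')
}

const bytes = dagCbor.encode(dir)
console.log(bytes.length) // ≈ 4 * 41 bytes per tag-42 link + labels + CBOR map overhead
const inlined = CID.createV1(dagCbor.code, identity.digest(bytes))
console.log(inlined.toString().length) // base32 length – already past the 128-byte inline limit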
__
I’ve gone through the IPIP-512 docs.

It’d be really nice to have at least 256 bytes; ~512 bytes max is a bit too much legroom to ask for.

Thank you for the explanation.

Feel free to open a new IPIP arguing why the limit should be increased, but the use cases don’t look compelling beyond 128 bytes: when I wrote IPIP-512, nobody in the IPFS ecosystem thought inlining bigger amounts of data was a good idea – everyone seemed happy to lock this down as low as possible.

If you open a new IPIP in ipfs/specs, you may want to spend extra time explaining how it is not wasting someone else’s resources [the gateway operator’s] and/or abusing free public-good gateways. (Not saying an IPIP is not worth a try, just flagging that it is a potentially controversial request, because there are more efficient ways of doing this without outsourcing costs to public goods.)
