Should we profile IPLD?

IMO this is not a thing, because it’s not the right framing of the problems in IPLD-land at the moment. I’m going to take a step back and get more meta, since that helps get to what I see as the actual problem (and a possible solution) for what could be next for IPLD.

TLDR: If this is as far as you get and you want something actionable, I think we need a way of representing IPLD data beyond UnixFS in URIs.

What is IPLD anyway? My understanding is that the current setup has it in a similar space to libp2p, multiformats, and IPFS, which is basically a core concept + a pile of specs you may choose to adopt from (nothing that fancy, just the kind of thing you’d expect multiple implementations to share rather than details of a specific implementation):

  • multiformats:
    • core concept: your format today will not work for everything, so for the low cost of a few bytes let’s describe what we’re talking about and allow for evolution / integration of new formats (see the CID sketch right after this list)
    • specs: multihash, multibase, multiaddr
  • libp2p:
    • core concept: p2p networks come in many shapes and sizes but frequently solve similar problems, separating out the pieces allows for evolution of networks as well as interoperability between them
    • specs: identify, peerIDs, multistream, libp2p’s specified usage of TLS, Noise, Yamux, Mplex, QUIC, WebTransport, …, gossipsub, kad-dht, and many more
  • IPFS
    • core concept: It should be possible to address data by what it is rather than where it is, and since the data is verifiable it can come over a variety of transports from a variety of sources.
    • specs: Amino DHT, UnixFS, IPFS HTTP Gateway, IPNI, …
      • Note: what falls into the “specs” bucket of an open project like this isn’t always a clear line. The IPNI specs aren’t listed in https://specs.ipfs.tech, but they seem relevant to the IPFS ecosystem. Similarly, it could go either way if UnixFS is considered an IPFS or IPLD spec.
  • IPLD
    • core concept: content addressable data comes in many shapes and sizes but frequently works in similar ways to solve similar problems; if we can have shared tooling across different content addressable data structures, we’ll make it easier to build and evolve systems as well as have interoperability between them
    • specs: IPLD selectors, IPLD schemas, and some codecs (dag-pb, dag-json, dag-cbor, dag-jose, git, the eth codecs, …), maybe CAR format
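
As a tiny illustration of the multiformats idea (and the “address data by what it is” idea it enables), here’s a minimal Go sketch using the go-cid and go-multihash libraries: a few self-describing prefix bytes say which hash function and which data format a CID refers to. The payload here is just an arbitrary placeholder.

```go
package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	payload := []byte("some content-addressed data") // arbitrary placeholder

	// Multihash: a couple of prefix bytes saying "sha2-256, 32-byte digest"
	// followed by the digest itself.
	digest, err := mh.Sum(payload, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	// CID: version + codec (dag-cbor here) + the multihash above.
	c := cid.NewCidV1(cid.DagCBOR, digest)

	fmt.Println(c)          // multibase-encoded string, e.g. something like bafyrei…
	fmt.Println(c.Prefix()) // the self-description: version 1, codec 113 (dag-cbor), hash 18 (sha2-256), length 32
}
```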

And so IMO the main failing of IPLD today is that we just don’t have enough compelling specs and tools to make the core concept of “reusable content-addressable tooling” really all that impactful.

  • Selectors :x:: IMO not friendly enough to work with and/or not powerful enough to justify their use, but opinions aside I have not heard of much traction here
  • Schemas :heavy_check_mark:/:x:: These seem to have proven useful to folks in a number of scenarios for describing content addressable data, even when in practice only a single “codec” is used for the data, since people are building unique data structures out of the basic building blocks. However, working with multiblock data structures (e.g. large maps, files, etc.) is still painful here.
  • Codecs :heavy_check_mark:/:x:: There are a number of these and they do perform translations into the IPLD Data Model, but IMO the utility of moving through the data model isn’t currently enough. The main things I see people do with codecs are:
    1. Encode their data → Not really more effort than just reusing code for a single data encoding (e.g. dag-cbor)
    2. Ask for an entire DAG (i.e. follow all the links) → Good, but people frequently need smaller sub-DAGs
    3. Convert the individual blocks or components from a less readable binary format into a more readable format → Nice, although probably not valuable enough on its own. It also has issues:
      • For some data, like the Filecoin HAMT or the Solana Yellowstone data, the output is not readable even in this form. You need at least a schema transformation, and even that might not really be enough. For example, a HAMT is a multiblock data structure that you probably want to see as a single map rather than as its internal nodes. Another is that multiaddrs have both a binary and a text format; converting from dag-cbor to dag-json will not handle that conversion, so you’ll either have unreadable base64 in the dag-json or verbose text in the dag-cbor (see the sketch right after this list).
  • CAR format :heavy_check_mark:: Despite its issues, CAR(v1) has been adopted by a number of projects and seems to do the basic job of moving around content addressable data independent of the formatting. It’s a bit of an outlier as an IPLD spec in that its job is really at the block / multihash layer (similar to protocols like Bitswap), where the most compatibility lives, rather than anything fancier (e.g. the IPLD Data Model), and perhaps because of that it’s gained the most adoption of any of these specs.
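
To make point 3 (and its limits) concrete, here’s a rough Go sketch using go-ipld-prime that round-trips a block from dag-cbor into the IPLD Data Model and back out as dag-json. The sample map is invented for illustration; the point is that the structure becomes readable, but a bytes field holding a binary multiaddr is still an opaque base64 blob on the dag-json side.

```go
package main

import (
	"bytes"
	"os"

	"github.com/ipld/go-ipld-prime/codec/dagcbor"
	"github.com/ipld/go-ipld-prime/codec/dagjson"
	"github.com/ipld/go-ipld-prime/fluent"
	"github.com/ipld/go-ipld-prime/node/basicnode"
)

func main() {
	// Stand-in for a block you'd normally fetch by CID: a small map with a
	// string field and a bytes field (the raw bytes of /ip4/127.0.0.1).
	n := fluent.MustBuildMap(basicnode.Prototype.Map, 2, func(ma fluent.MapAssembler) {
		ma.AssembleEntry("name").AssignString("example")
		ma.AssembleEntry("addr").AssignBytes([]byte{0x04, 0x7f, 0x00, 0x00, 0x01})
	})
	var block bytes.Buffer
	if err := dagcbor.Encode(n, &block); err != nil {
		panic(err)
	}

	// Decode the dag-cbor block into the generic IPLD Data Model...
	nb := basicnode.Prototype.Any.NewBuilder()
	if err := dagcbor.Decode(nb, bytes.NewReader(block.Bytes())); err != nil {
		panic(err)
	}

	// ...and re-encode it as dag-json. The map is now human-readable, but the
	// multiaddr shows up as {"/": {"bytes": "..."}} base64 rather than /ip4/127.0.0.1.
	if err := dagjson.Encode(nb.Build(), os.Stdout); err != nil {
		panic(err)
	}
}
```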

Where do we go from here?

IMO IPLD needs at least one of two things to build momentum:

  1. Great tooling for building new content addressable formats (or reusing existing ones like dag-cbor and IPLD HAMTs / AMTs)
  2. A compelling reason for someone with an existing data format to write some glue code that allows them to leverage other existing tooling from the IPLD ecosystem (roughly sketched below)
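
To give a feel for the glue code in option 2, here’s a rough Go sketch of what plugging an existing binary format into go-ipld-prime’s multicodec registry looks like. The `myformat` package, its multicodec code, and the stubbed Encode/Decode bodies are all invented for illustration; the real work is implementing those two functions for the format in question.

```go
// Package myformat is a hypothetical adapter for an existing binary format.
package myformat

import (
	"io"

	"github.com/ipld/go-ipld-prime/datamodel"
	"github.com/ipld/go-ipld-prime/multicodec"
)

// Code is a placeholder multicodec code; a real integration would reserve one
// in the multicodec table or use the private-use range (0x300000+).
const Code = 0x300000

// Decode parses a block in the existing format and feeds it into the IPLD
// Data Model by driving the supplied NodeAssembler.
func Decode(na datamodel.NodeAssembler, r io.Reader) error {
	// ... parse the existing format from r and drive na
	// (BeginMap, AssembleKey/AssembleValue, AssignString, AssignLink, ...)
	return nil
}

// Encode walks an IPLD node and writes it back out in the existing format.
func Encode(n datamodel.Node, w io.Writer) error {
	// ... iterate over n and serialize it in the existing format to w
	return nil
}

func init() {
	// Once registered, generic IPLD tooling (traversal, selectors, CAR tools,
	// dag-json debugging output, ...) can handle blocks in this format too.
	multicodec.RegisterDecoder(Code, Decode)
	multicodec.RegisterEncoder(Code, Encode)
}
```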

People seem to have spent a good deal of time thinking about #1 in a way I personally don’t feel has been that successful. It could be valuable, but my opinion is that the second is more valuable and underexplored. It allows people to show up to the ecosystem “later” and still get lots of benefits without a huge rewrite, and it’s easier to build a good abstraction when you already have multiple implementations you’d want to fit the abstraction, rather than building the abstraction first and hoping the implementations will fit nicely when they’re built in the future.

In many ways this is similar to my post from 2022. If there’s limited value in being able to work with UnixFS, BitTorrent, Git, Ethereum, Filecoin, etc. data via the same tooling, are we really equipped to build tooling for the next 10 formats to share?

A concrete proposal could involve creating an IPLD URI scheme that allows interacting sanely not just with UnixFS data, but also with others like BitTorrent, Filecoin, Solana Yellowstone, etc., with the next stop being IPFS and IPLD implementations that can safely handle large blocks coming from p2p sources (see the linked post for more info).
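
To sketch the shape of that (the scheme name, path semantics, and placeholders below are purely illustrative, not a syntax proposal):

```
# UnixFS, which you can already do today via ipfs:// or gateway paths
ipld://<unixfs-root-cid>/photos/2023/cat.jpg

# The same idea applied to other formats, where format-aware logic decides
# what the path segments mean:
ipld://<torrent-root>/<path/inside/the/torrent>
ipld://<filecoin-state-root>/<actor-address>/<key-in-hamt>
```

The interesting (and hard) part is the per-format logic that makes those path segments meaningful, which is exactly the glue code from #2 above.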

On a less technical note, IMO the IPLD docs website is really in the weeds and seems to say “welcome to IPLD, here’s a brain dump of everything we’ve discovered in trying to create content addressable data structures, for when you decide what you’d like to do for yours”. That’s certainly interesting, and it’s content I’ve pointed people at when they were in fact making their own new formats or trying to understand tradeoffs between approaches, but for many people I don’t think that’s what they’d want to see as the front door. In that way, having profiles or personas to help guide people to what they actually want to see might be useful.


I recognize this post is crazy long already, but for those interested, Juan’s talk at FIL Dev Summit 2023 on the Filecoin Data Layer has a number of interesting / controversial discussions. Here are a couple where I’m also chiming in, around people trying to make existing formats work with IPLD/IPFS and what the spectrum of compatibility with IPLD/IPFS means for these existing formats.