Supporting Large IPLD Blocks
IPLD Blocks
An IPLD block is a sequence of bytes that can be referenced by a CID.
Block Limits
Current IPFS implementations recommend that users create blocks of at most 1 MiB, and they are generally able to accept and transfer blocks of up to 2 MiB.
There is no hard block limit in IPLD, nor one specified across all IPFS implementations, although the figures above are generally good guidelines for the ecosystem.
Why it’s sad that we have block limits
Block limits cost us backwards compatibility with existing hash-linked data structures where the block limit was either chosen to be larger than ours (2MiB) or was never chosen at all.
People have been using hashes as checksums for Git commits, ISO downloads, torrents, antivirus checks, package managers, etc. for a while now, and many of those hashes are of blocks of data larger than a couple of MiB. This means you can’t reasonably do ipfs://SomeSHA256OfDockerContainerManifest and have it just work. Similarly, you cannot do ipfs://SomeSHA256OfAnUbuntuISOFromTheWebsite and have that work.
Ultimately, the block limit introduces a limitation on the set of hash-linked data structures representable by IPLD, such that many existing structures cannot be represented. On the IPLD website we have the line:
> IPLD is the data model of the content-addressable web. It allows us to treat all hash-linked data structures as subsets of a unified information space, unifying all data models that link data with hashes as instances of IPLD.
Which today looks more like:
> IPLD is the data model of the content-addressable web. It allows us to treat all hash-linked data structures, with blocks at most 2MiB, as subsets of a unified information space, unifying ~~all~~ some data models that link data with hashes as instances of IPLD.
Why block limits?
A major reason why block limits exist is to enable incremental verifiability of data. For example, if I were given the SHA2-256 of a 100GB file, it would be a shame to download the whole 100GB just to find out it was the wrong file; a malicious or broken peer could feed me garbage that I could not reject until the very end. This kind of attack effectively makes incremental verifiability a requirement for peer-to-peer (as opposed to more trusted friend-to-friend) transfer of content-addressed data.
From what I can tell this has historically been the reason people have argued that we should have both implementation-level and ecosystem-wide block limits. However, more recently there have been pushes and proposals in the IPFS ecosystem that enable incrementally verified transfer of large blocks in many scenarios.
This has resulted in a newer argument in favor of block limits: having some block limit is important when building IPLD tooling, since a fixed maximum size allows for various assumptions and technical simplifications that otherwise could not be made.
Underlying both the security and tooling arguments is the implicit argument that it is helpful for users and our ecosystem that data be as transportable as possible, and that it would be a shame if one set of IPFS applications chose a 2MiB limit while another chose 10MiB, since the 10MiB data wouldn’t be usable by the 2MiB-limited applications.
Solving Block Limits - Data Transfer Security
As described in more depth in the presentation on this (slides), as well as in an early proposal, for a number of common use cases we can deal with the security issues at the data transfer layer by leveraging some of the properties of common hash constructions.
Merkle-Damgård Constructions
These include common hash functions like SHA1/2/3. In short, if you are able to assume freestart-collision resistance rather than only collision resistance (which appears safe for SHA2/3, and SHA1’s collision resistance is already problematic in any event), then you can download even a 100GB block in an incrementally verified way if you download it backwards.
The start of this downloading-backwards process looks like: the peer sends the final chunk of the block along with the claimed hash state from just before it; you check that running the compression function over that state and chunk reproduces the published digest, then treat the claimed state as the new target and repeat with the previous chunk, all the way back to the fixed initialization vector.
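A minimal sketch of that loop, assuming hypothetical `sha256Compress` and `fetchChunkWithState` helpers (neither is a real library call) and eliding SHA-256’s final-block padding and length handling:

```go
// Backwards, incrementally verified download of one large SHA-256-addressed block.
// sha256Compress and fetchChunkWithState are assumed helpers (not real library
// calls): the former advances a SHA-256 chaining state over a chunk that is a
// whole number of 64-byte message blocks, the latter asks a peer for chunk i plus
// the chaining state the peer claims held just before that chunk.
package mdverify

import (
	"bytes"
	"errors"
)

type state [32]byte // a SHA-256 chaining value (the digest is simply the final one)

func sha256Compress(s state, chunk []byte) state       { /* assumed */ return s }
func fetchChunkWithState(i int) (state, []byte, error) { /* assumed */ return state{}, nil, nil }

// sha256IV would hold SHA-256's fixed initial chaining value (left zero in this sketch).
var sha256IV state

// verifyBackwards fetches numChunks chunks newest-first, accepting each one only
// if it is consistent with a state we already trust (initially the published digest).
func verifyBackwards(digest state, numChunks int) ([][]byte, error) {
	target := digest
	chunks := make([][]byte, numChunks)
	for i := numChunks - 1; i >= 0; i-- {
		claimed, chunk, err := fetchChunkWithState(i)
		if err != nil {
			return nil, err
		}
		if got := sha256Compress(claimed, chunk); !bytes.Equal(got[:], target[:]) {
			return nil, errors.New("chunk failed incremental verification")
		}
		chunks[i] = chunk
		target = claimed // the prior state is now trusted; verify the previous chunk against it
	}
	// The chain must bottom out at SHA-256's fixed IV, otherwise the data received
	// is not actually the preimage of the published digest.
	if !bytes.Equal(target[:], sha256IV[:]) {
		return nil, errors.New("chain does not start at the SHA-256 IV")
	}
	return chunks, nil
}
```

The security argument is that, assuming freestart-collision resistance, a peer cannot produce a different (state, chunk) pair that compresses to a value you already trust.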
Merkle Tree Constructions
These include some newer hash functions like Blake3. In short, just as a UnixFS file is incrementally verifiable because it is a merkle tree, a hash function can itself be built as a merkle tree and thereby be incrementally verifiable.
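As a generic illustration (this is the general shape, not Blake3’s exact chunk-and-parent construction), any binary merkle tree over fixed-size chunks gives you this property: a chunk can be verified against the root using only its branch of sibling hashes, so data can be fetched and checked from any offset and in any order.

```go
// Generic merkle-tree verification sketch, using SHA-256 over fixed-size chunks
// purely for illustration.
package merkleverify

import (
	"bytes"
	"crypto/sha256"
)

const ChunkSize = 1 << 10 // 1 KiB chunks, arbitrary for the sketch

func leafHash(chunk []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, chunk...)) // domain-separate leaves from parents
	return h[:]
}

func parentHash(left, right []byte) []byte {
	h := sha256.Sum256(append(append([]byte{0x01}, left...), right...))
	return h[:]
}

// VerifyChunk checks one chunk against the tree root using its merkle branch:
// the sibling hashes from the leaf up to the root, plus whether each sibling
// sits on the left or the right.
func VerifyChunk(root, chunk []byte, siblings [][]byte, siblingIsLeft []bool) bool {
	cur := leafHash(chunk)
	for i, sib := range siblings {
		if siblingIsLeft[i] {
			cur = parentHash(sib, cur)
		} else {
			cur = parentHash(cur, sib)
		}
	}
	return bytes.Equal(cur, root)
}
```

Blake3 bakes a tree like this into the hash function itself (and the Bao format exposes it for verified streaming), so a single huge block addressed by its Blake3 hash can be verified incrementally much like a UnixFS DAG.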
Solving Block Limits - IPLD Tooling
The argument that IPLD is even a reason for us to have block limits is new to me; the first time I heard it was at IPFS Thing 2022. Overall, the idea is that removing the block limit makes creating IPLD tooling more complicated and increases the probability that some data will work nicely in some IPFS implementations but not in others.
Historically, having block limits be a part of IPLD has been rejected (e.g. in Max node size limitations · Issue #48 · ipld/ipld · GitHub and the maximum block size should be specified · Issue #193 · ipld/specs · GitHub), so this argument implies either that there should be changes to the IPLD specs, or that there should be an IPFS-wide block limit motivated by IPLD tooling while IPLD itself still has no block limit, which seems strange.
Below are the highlights of the impacted IPLD components:
Codecs
Supporting IPLD codecs on blocks >2MiB will involve some pain, but it’s nothing new:
- Painful: Instead of being able to work with the serialized data all at once, it has to be processed in pieces, which might not be possible in certain environments for certain formats. This could lead to some data being readable in some contexts but not in others.
- For example, building a streaming JSON decoder that can get useful information out of a single 100GB object might be painful, so my DAG-JSON implementation might decide to only support blocks up to 2MiB so as to be both simpler to implement and to not run out of memory during processing. This means some IPFS implementations would be able to process DAG-JSON data that others could not.
This problem of data being readable in some contexts but not in others is not new:
- Not all codecs (or even hash functions) are implemented in every IPFS implementation
- For any codec where you would be concerned about decoding a 100GB block, there could be an ADL that does the same thing across a graph of a million 1MB blocks and would fit within the current model
It’s also the kind of thing we would have to deal with anyway in a world of remote, dynamically loaded codecs and ADLs, as proposed in some of the WASM + IPFS integration proposals:
- Any sort of dynamic loading involves running code from an untrusted source in a sandbox with some kind of resource limiting. If the resource limiting is not a globally agreed upon number then we end up with the same issues as having per-peer block limits, which is not particularly different from no block limits at all
- While we could agree on global resource limits here, thus far none of the dynamic loading proponents I have spoken with are in support of having global resource limits
This is also an area that we can move slowly on and only expose pieces as we need them. Most large blocks tend to just be piles of bytes rather than more structured data. This means that if there were concerns here we could limit the scope and decide either that:
- The only IPLD codec that can apply to a large block is the raw codec, which refers to plain bytes (a rough sketch of this kind of size gating follows after this list).
- The only IPLD codecs that can apply to a large block are ones whose decoding results in streamable bytes. This is similar to the above, but also allows for codecs that are simple transformations over bytes.
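Here is a rough sketch of what the first option could look like in practice. The threshold and the `decodeStructured` hook are illustrative assumptions, not existing APIs; 0x55 is the multicodec code for raw.

```go
// Hypothetical gating policy: structured codecs (DAG-CBOR, DAG-JSON, ...) are only
// applied to blocks up to a size threshold; above it, only the raw codec (plain
// bytes) is accepted.
package codecgate

import "fmt"

const (
	codecRaw               = 0x55    // multicodec code for "raw"
	maxStructuredBlockSize = 2 << 20 // 2 MiB, illustrative threshold
)

// decodeStructured is an assumed hook into whatever IPLD decoders an implementation already has.
func decodeStructured(codec uint64, data []byte) (interface{}, error) {
	return nil, fmt.Errorf("decoder for codec %#x not wired up in this sketch", codec)
}

// Decode applies the gating policy before handing data to a structured decoder.
func Decode(codec uint64, data []byte) (interface{}, error) {
	if codec == codecRaw {
		return data, nil // raw blocks are just bytes, whatever their size
	}
	if len(data) > maxStructuredBlockSize {
		return nil, fmt.Errorf("refusing to decode %d-byte block with codec %#x: only raw is supported above %d bytes",
			len(data), codec, maxStructuredBlockSize)
	}
	return decodeStructured(codec, data)
}
```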
Data Storage
To support large blocks, it is likely that block storage systems will need to differentiate between large and small data, especially if there are any transport-specific optimizations they’ll want to pre-compute locally.
For example, whether for a Blake3-addressed block or a large SHA256-addressed one, the block storage may want some level of indirection: one mapping of multihash → the collection of chunks to load to reconstruct the data, and another of multihash → the information needed to efficiently make the data available to the transport layer (a rough sketch follows at the end of this subsection).
This is somewhat unfortunate, however:
- Many implementations already have these types of indirection layers in their key-value stores
- Boost and lotus have multihash → set of CAR files + offsets for the block
- Kubo’s filestore/urlstore has multihash → the file or URL, offset, and range where the block might live
- It seems very likely that transport-level optimizations will start to appear in data storage anyhow
- Git has pack-files for their trusted transfer protocol
- Synchronization protocols like dsync and ones based on invertible bloom filters will likely track the collections of blocks they are synchronizing
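For concreteness, the two-level indirection described above might look roughly like the following. The type and field names are illustrative and not taken from any existing implementation’s schema.

```go
// Hypothetical index shape: one table mapping a multihash to the pieces needed to
// reconstruct the block's bytes, and another mapping it to transport-level metadata.
package blockindex

type Multihash string // binary multihash used as a key

// ChunkRef points at one piece of the block's bytes inside a CAR file, plain file, or URL.
type ChunkRef struct {
	Path   string
	Offset uint64
	Length uint64
}

// TransportHints holds whatever a transport wants precomputed for serving the block.
type TransportHints struct {
	OutboardTreePath       string // e.g. a Bao outboard tree for Blake3-addressed data
	IntermediateStatesPath string // e.g. stored SHA-256 chaining states for backwards transfer
}

// Index is the two-level indirection: reconstruct the data, and serve it efficiently.
type Index struct {
	Chunks    map[Multihash][]ChunkRef
	Transport map[Multihash]TransportHints
}

// BlockBytes lists, in order, where to read the block's bytes from.
func (ix *Index) BlockBytes(mh Multihash) ([]ChunkRef, bool) {
	refs, ok := ix.Chunks[mh]
	return refs, ok
}
```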
Data Transfer
More so than with storage, the cost of getting started building a new generic IPFS data transfer protocol increases as the block limit increases.
While not every IPFS implementation even supports the same set of hash functions, the added complexity to support large blocks of data increases what it takes to build a new highly compatible IPFS implementation.
Groups that have so far chosen to rely on only a single data transfer protocol will have to start adding support for new data transfer implementations if they want to support large blocks.
Alternatives - Just leave it alone
The alternative to removing the block limit generally ends up being to just leave it at 2MiB. Sure, we could increase it to 4, 10, 1000, etc., but either way you run into the same sorts of tradeoffs already mentioned around security, ease of implementation, resource consumption, helping users make the "right decisions™", and so on. Since we have to choose an arbitrary number either way, sticking with the one we already have seems reasonable.
By leaving things as they are we continue to be in a world where most tooling in the IPFS ecosystem is unable to deal with a lot of the content addressable data that is already out there in places like:
- Programming language package managers
- Operating system package managers
- Docker container registries
- Blockchain storage systems like Arweave and Sia that chose larger block limits
- Git
- BitTorrent
- …
While IPFS tooling can work with these formats when individual blocks are small, and that’s sufficient for some use cases, there’s a whole lot of large-block data out there, and sometimes even the small percentage of something like Git repos or BitTorrent files that isn’t compatible is enough to hurt the compatibility story more generally.
The major alternatives then for interoperability frequently look something like:
- Have a trusted map of the SHA256 of a 1GB Ubuntu ISO → a UnixFS representation of that data and use that trusted map for lookups (a rough sketch follows after this list)
- This requires a trusted map, which is generally not what we want for our self-certifiable content-addressed data
- Convince people distributing graphs with large blocks that they should also/instead distribute graphs with small blocks
- While 1MB blocks seem generally better than 100MB blocks, convincing people to change their patterns is more difficult. Why not let them see some of the benefits of interoperability with the rest of the IPFS ecosystem first and then change things to get additional benefits, rather than forcing them to do all or nothing, or to use something like a trusted mapping?
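For reference, the first alternative amounts to something like the following lookup table, with placeholder keys and values; the trust problem is that someone has to curate and distribute the map itself.

```go
// Sketch of the "trusted map" alternative: a lookup from a plain SHA-256 of some
// artifact (e.g. an Ubuntu ISO) to a CID string for a small-block UnixFS
// representation of the same bytes. The entries below are placeholders, not real values.
package trustedmap

import "fmt"

// sha256 hex of the original artifact -> CID string of its UnixFS root.
var isoToUnixFS = map[string]string{
	"<sha256-of-ubuntu-iso>": "<cid-of-unixfs-root>",
}

// Resolve turns a bare SHA-256 into a fetchable UnixFS root, if the map knows it.
func Resolve(sha256Hex string) (string, error) {
	c, ok := isoToUnixFS[sha256Hex]
	if !ok {
		return "", fmt.Errorf("no trusted mapping for sha256 %s", sha256Hex)
	}
	return c, nil
}
```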
Removing Block Limits - Let’s do it!
We are now at a point with block limits where the question is no longer whether we can remove them, but whether we should. I think the answer here is yes! Unlocking interoperability with other content-addressed systems, where we safely can, seems like a big step forward that’s worth the additional cognitive overhead and technical complexity of needing to handle it.
While I don’t think building new systems with large blocks as the smallest units of data is a particularly good idea, due to issues like the complexity of building efficient streaming parsers, data transfer protocols, etc., I’d like to see us build out compatibility even with those who disagree, while also providing documentation and examples to help them understand why and how to build better data structures in the future.
For example, while there is now a proposal for a way to safely download a 1GB SHA256 block, it’s poorly suited for things like video because the data must be downloaded backwards and there is no way to seek into the middle and start without reading everything from the end up to that point. Helping people understand the tradeoffs and alternatives so they can make their own choices seems better than dictating ours.
The underlying premise of keeping block limits around in the ecosystem is that we will need to build more sophisticated tooling in order to deal with these large blocks, which will end up increasing the complexity of new IPFS implementations that wish to be compatible with existing ones. While this kind of compatibility and ease of implementation is admirable, I think restricting what users can do in order to make them do the "right" thing isn’t really the "style" of the IPFS ecosystem.
We like to be the open tent of content addressing that says it’s fine to bring whatever data transfer protocol you want, whatever content routing system you want, or whatever content addressable data structures you want and still be part of the IPFS ecosystem. As it is today, not every IPFS implementation has tooling to work with all types of data equally.
If I had to choose a dividing line in the sand here around what types of data should be "in bounds" for IPFS, it would be difficult. However, I think it would be that it should be possible for implementations running on a variety of platforms, with a variety of threat models, to get and process that data. Just because they can get and process the data doesn’t mean they have to.
References: