I Like Big Blocks And I Cannot Slice

Friends!

We had ourselves a good time with Should we profile CIDs?, followed by the hit sequel Should we profile IPLD? Those conversations led to the DASL work, which is generating a healthy amount of community interest, notably from people who hadn’t been in our community (or hadn’t been active) before.

So let’s look at another problem dear to all our hearts: block size!

I don’t think that this is a case for a single profile or spec to solve all the things, but I do think that by hashing (wink wink) things out we can figure out some good work to do.

Let us know:

  • What are your pain points with blocks?
  • What are cool things that you’re working on or using that help with big blobs?
  • What would you like to see happening in the world?
  • What use cases do you have that are currently suffering from size issues?
  • Whatever other thoughts you have!

And we’ll make things happen!

It’s maybe worth specifying here what we mean by “big blobs” and level-setting a bit.

Traditionally, IPFS architectures that were DHT/Bitswap-centric set a maximum block size and chunked bigger files at the mandatory UnixFS abstraction layer. Both the iroh branch and the ATProto/DASL branch of the family tree skip a mandatory UnixFS layer and simply refer to inputs as blobs by the hash of the whole blob, punting chunking one layer up to the retrieval mechanism (BAO file → range-request in the iroh case; excluding large files from the firehose and requiring manual PDS retrieval in the ATProto/Bluesky case). Filecoin, meanwhile, uses a particular CAR file profile/config and PieceCIDs to deal with large inputs, but there too commP makes its own mapping of chunks to PieceCIDs and no UnixFS layer is present, which further complicates the indexing and subsetting assumptions of the other systems. These very different assumptions and norms can lead to divergent mental models about what a “block” is, and about what a reasonable cost or workaround is!
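To make those two families of assumptions concrete, here is a minimal stdlib-only sketch contrasting whole-blob addressing with block-oriented chunking. The 256 KiB chunk size is just a commonly used UnixFS chunker default, the input file name is a placeholder, and a real UnixFS DAG would additionally link the chunk digests under a root node, which this omits:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// chunkSize of 256 KiB is an assumption here: a common UnixFS chunker
// default, not a protocol constant.
const chunkSize = 256 * 1024

func main() {
	data, err := os.ReadFile("video.mp4") // hypothetical large input
	if err != nil {
		panic(err)
	}

	// Whole-blob addressing (iroh / ATProto style): one digest for the
	// entire input; chunking is left to the retrieval layer.
	whole := sha256.Sum256(data)
	fmt.Printf("whole-blob digest: %x (1 block of %d bytes)\n", whole, len(data))

	// Block-oriented addressing (DHT/Bitswap + UnixFS style): the input is
	// split into bounded-size blocks, each addressed by its own digest.
	// A real UnixFS DAG would then link these digests under a dag-pb root.
	var nblocks int
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		_ = sha256.Sum256(data[off:end])
		nblocks++
	}
	fmt.Printf("chunked addressing: %d blocks of <= %d bytes\n", nblocks, chunkSize)
}
```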

So while interop is not self-evidently a goal here, being explicit about which of these families of use-cases and assumptions you’re coming from helps diagnose and understand the pain points.

P.S. One thing I’ve been trying to understand is how Bluesky/ATProto can deal with video uploads, which are almost always way bigger than the UnixFS block size limit, bigger than the firehose block size limit, and (in longer/higher-res videos) butt up against even the effective HTTP/CDN limits, necessitating at least chunking at the HLS level. Can videos uploaded to, e.g., a non-Bluesky ATProto video platform or PeerTube get migrated to other storage providers as a CAR file? What would that CAR file have to look like for this to make sense?

Hm… if we descope solutions that depend on some sort of data envelope that facilitates storing metadata and chunking (e.g. UnixFS), then the lowest common denominator across all the mentioned systems is opaque bytes of user data without any wrapper.

Such raw user data is identified by a CID with the raw (0x55) codec, and for discussions that include interop and backward compatibility, that may be what we mean when we say “big block”.

Probably? Bluesky uses raw blocks for blobs with images, video, and audio (docs).
One should be able to put a big raw block in a CAR just fine, no matter the size of the block.
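For example, here is a minimal sketch of doing exactly that, assuming the go-cid/go-multihash APIs and the go-car v2 ReadWrite blockstore (exact signatures have shifted a bit across go-car versions, e.g. whether Put takes a context; file names are placeholders): hash the blob with SHA-256, wrap the digest in a raw (0x55) CIDv1, and write it into a CAR as a single block.

```go
package main

import (
	"context"
	"os"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	carbs "github.com/ipld/go-car/v2/blockstore"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	data, err := os.ReadFile("big-blob.bin") // hypothetical large input
	if err != nil {
		panic(err)
	}

	// Identify the opaque bytes with a raw (0x55) CIDv1 over a SHA-256 multihash.
	digest, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, digest)

	// Write the blob into a CAR as a single (arbitrarily large) block.
	// The container format itself does not cap the block size.
	bs, err := carbs.OpenReadWrite("big-blob.car", []cid.Cid{c})
	if err != nil {
		panic(err)
	}
	blk, err := blocks.NewBlockWithCid(data, c)
	if err != nil {
		panic(err)
	}
	if err := bs.Put(context.Background(), blk); err != nil {
		panic(err)
	}
	if err := bs.Finalize(); err != nil {
		panic(err)
	}
}
```

Nothing in the container caps the block size, though some CAR readers apply their own maximum section size when scanning, so a blob-sized block can still trip limits one layer up.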

The block size limits are enforced mainly at the data transfer layer (e.g. in Bitswap over libp2p, or in an HTTP retrieval client) for perf/security reasons. If you are a semi-trusted, logged-in user of a platform, some limitations could be lifted.

P.S. A relevant prior-art discussion where Adin elaborated on data transfer security when dealing with low-trust p2p contexts:

In that context, my realistic hopes for future “big block” support in IPFS Mainnet are mainly around trusted setups that raise limits, or around HTTP retrieval (once it is enabled by default in the main IPFS implementations like Kubo/Helia). Security concerns could be solved with range requests within those opaque blobs (e.g. having an HTTP spec for Blake3 and raw CIDs), or some other way (open-ended, thanks to HTTP content-type negotiation).
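To ground the HTTP retrieval path, here is a rough sketch of trustless raw-block retrieval with local validation, assuming the go-cid/go-multihash APIs; the gateway URL is a placeholder and the application/vnd.ipld.raw response type comes from the Trustless Gateway spec. Note that verification here only happens after the whole block is buffered, which is exactly why verifiable ranges (Blake3-style) are attractive for really big blobs.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// fetchAndVerifyRaw fetches a raw block from a trustless HTTP gateway and
// re-hashes it locally, so the gateway itself does not need to be trusted.
func fetchAndVerifyRaw(gateway string, c cid.Cid) ([]byte, error) {
	req, err := http.NewRequest("GET", gateway+"/ipfs/"+c.String(), nil)
	if err != nil {
		return nil, err
	}
	// Ask for the block bytes verbatim (Trustless Gateway raw response type).
	req.Header.Set("Accept", "application/vnd.ipld.raw")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("gateway returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	// Recompute the multihash with the same code/length as the CID and compare.
	prefix := c.Prefix()
	got, err := mh.Sum(body, prefix.MhType, prefix.MhLength)
	if err != nil {
		return nil, err
	}
	if !bytes.Equal(got, c.Hash()) {
		return nil, fmt.Errorf("digest mismatch: gateway returned bad bytes")
	}
	return body, nil
}

func main() {
	c, err := cid.Decode(os.Args[1]) // CID to fetch, passed on the command line
	if err != nil {
		panic(err)
	}
	data, err := fetchAndVerifyRaw("https://ipfs.io", c) // public gateway as a placeholder
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched and verified %d bytes\n", len(data))
}
```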


As @lidel mentioned, there’s a lot of prior art on much of this in Supporting Large IPLD Blocks. Would recommend reading it for more context on both why the limits exist and what can be done about it.

Some highlights:

  • In the IPFS context, blocks are defined as the pile of bytes that you put through a hash function to get out a digest that you can put into a multihash. They are, in general, the minimal addressable chunk of data in content-addressable systems. You could define “block” differently, but then the definition of “block size” would change too :upside_down_face:.
  • If you’re operating in a trusted/semi-trusted environment rather than a p2p one, AFAICT there are no problems here and everything is already solved :tada:. Just hash your bytes with your favorite hash function, then fetch the bytes and validate them to make sure there was no corruption.
  • In p2p environments, clients need to be able to get some proof that the 1EiB of data they’re downloading is actually what they’re looking for before downloading it. You can build trust over time, use merkle proofs, zk proofs, … but if your client doesn’t need these proofs then you’re likely already in a trusted/semi-trusted environment and you’re done.
  • Some hash functions like Blake3, KangarooTwelve, BitTorrent-v2 piece hash, … already have pretty obvious merkle proofs to use.
  • The most commonly requested hash functions in IPFS-land for large blocks have traditionally been SHA-2 and SHA-1. There is a proposal for how to handle proofs for those large blocks. Unlike merklized hash functions like Blake3 you can’t fetch arbitrary slices, but if you’re looking to safely fetch and validate your large block it should be fine (a small sketch of the underlying property follows this list).
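Here is a minimal, stdlib-only sketch of the Merkle-Damgård property that SHA-2 proof schemes lean on: the hash state can be checkpointed mid-stream and resumed, so a party holding a trusted intermediate state only needs the remaining bytes to recompute the final digest. This shows just the property, not a proof format.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"crypto/sha256"
	"encoding"
	"fmt"
)

func main() {
	blob := make([]byte, 1<<20) // stand-in for a large block
	if _, err := rand.Read(blob); err != nil {
		panic(err)
	}
	prefix, suffix := blob[:1<<19], blob[1<<19:]

	// Reference digest over the whole blob.
	want := sha256.Sum256(blob)

	// Hash the prefix and checkpoint the internal Merkle-Damgård state.
	h := sha256.New()
	h.Write(prefix)
	state, err := h.(encoding.BinaryMarshaler).MarshalBinary()
	if err != nil {
		panic(err)
	}

	// Resume from the intermediate state: only the suffix bytes are needed
	// to reproduce the final digest.
	resumed := sha256.New()
	if err := resumed.(encoding.BinaryUnmarshaler).UnmarshalBinary(state); err != nil {
		panic(err)
	}
	resumed.Write(suffix)
	got := resumed.Sum(nil)

	fmt.Println("resumed digest matches full digest:", bytes.Equal(got, want[:]))
}
```

The actual proposal layers a proof format on top of this; the sketch above only shows the state resumability it relies on.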

What are your pain points with blocks?

What use cases do you have that are currently suffering from size issues?

  • The primary reason it’s really no fun to be limited to small blocks is the inability to be compatible with other content-addressable data out there
    • e.g. use IPFS tooling to address, find and fetch the large SHA-2 blocks that are present in basically every package manager
  • Other reasons (e.g. I don’t want to advertise the middle blocks of zip files, I don’t want my database / index of multihash → bytes to include the middles of zip files, …) are largely solvable without touching this problem directly. For example, in BitTorrent, files can have individual hashes, but in practice they are referenced by the hash that identifies the “collection” of objects.

What are cool things that you’re working on or using that help with big blobs?

See the linked discussion post (also GitHub - aschmahmann/mdinc: Tooling for incremental verification of Merkle-Damgård construction hash functions (e.g. SHA2/3)). While these aren’t recent, I’ve been doing some occasional hacking in the space and, if there’s interest, would be happy to revive, update, etc.

What would you like to see happening in the world?

I’d like to be able to have my Docker container layers, package manager dependencies, … that already have SHA-2 hashes in them be able to dynamically discover and do p2p retrieval of that data so things don’t break if a given registry goes down. Similar support for a merklized hash function (e.g. Blake3, BitTorrent-v2 pieces, etc.) would be great too since those support verifiable ranges.
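As a concrete sketch of that mapping (assuming the go-cid/go-multihash APIs; the digest below is just a placeholder), an existing SHA-256 digest from a manifest or lockfile can be wrapped into a raw-codec CIDv1 without re-hashing anything, so the identifier that already ships with the artifact doubles as a content-addressed retrieval key:

```go
package main

import (
	"encoding/hex"
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// A sha256 digest as it might appear in a Docker manifest or lockfile
	// ("sha256:<hex>"); this value is only a placeholder.
	hexDigest := "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

	raw, err := hex.DecodeString(hexDigest)
	if err != nil {
		panic(err)
	}

	// Wrap the existing digest in a multihash (no re-hashing of the data),
	// then in a raw-codec CIDv1 that IPFS tooling can announce and resolve.
	digest, err := mh.Encode(raw, mh.SHA2_256)
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, mh.Multihash(digest))

	fmt.Println(c) // a base32 CIDv1 with raw codec and sha2-256 multihash
}
```

The catch, of course, is that many of these artifacts are far larger than the block size limits enforced by transports like Bitswap, which is what the rest of this thread is about.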

Whatever other thoughts you have!

The CAR format is fairly simple, but it has a number of pretty annoying failings that people have been doing their best to ignore or work around (e.g. the lack of an EOF marker, making it difficult to know whether an HTTP-streamed CAR file has cleanly terminated; the lack of any meaningful description in the header of the content contained; etc.). It’s also really very unprepared to handle sending proofs (e.g. BAO, etc.) unless they look like blocks (e.g. UnixFS merkle proofs).
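To make the EOF complaint concrete: CARv1 is just a stream of varint-length-prefixed sections (a dag-cbor header, then CID+block sections) with no trailer, so a sketch of a streaming reader shows why truncation exactly on a section boundary is undetectable. Section walking is hand-rolled here for illustration; real code would use a CAR library, and the file name is a placeholder.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("stream.car") // e.g. a CAR body saved from an HTTP response
	if err != nil {
		panic(err)
	}
	defer f.Close()
	r := bufio.NewReader(f)

	sections := 0
	for {
		// Each CARv1 section is a uvarint length followed by that many bytes
		// (the first section is the dag-cbor header, the rest are CID+block).
		size, err := binary.ReadUvarint(r)
		if err == io.EOF {
			// No trailer/EOF marker exists in CARv1, so EOF here could mean
			// either "stream complete" or "stream truncated on a boundary".
			fmt.Printf("stream ended after %d sections: complete or truncated?\n", sections)
			return
		}
		if err != nil {
			panic(err)
		}
		if _, err := io.CopyN(io.Discard, r, int64(size)); err != nil {
			// Truncation mid-section *is* detectable.
			fmt.Printf("section %d truncated: %v\n", sections, err)
			return
		}
		sections++
	}
}
```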

Given the existing work around shipping bundles of blocks in CARs, it seems likely that having a container format that supports these other types of proofs would be convenient for those who might want to validate the data being sent to them before downloading all of it. Note: again, if you don’t have this problem, there seems to be very little left to do; just use CARs.

It may also be interesting to consider where this type of work fits alongside support for something like webseeds in IPFS and the recent UnixFS profiles work, since webseeds are effectively about separating the file bytes from the proof bytes and allowing them to be downloaded from different sources. The concepts of outboard and combined BAOs are somewhat similar (e.g. outboard BAO → webseeds-like, combined BAO → CAR-like).



(Note: not a separate post because Discourse tells me it wants single large posts :upside_down_face:)

See the linked discussions: this is not about UnixFS, it’s about working in a peer-to-peer rather than a trusted or friend-to-friend environment. UnixFS enforces no such limit.

It seems to me that interop of some sort is a goal here. The CAR format is a pretty basic primitive that you’re looking to use for interoperability between Bluesky and a set of other IPFS storage providers that could handle that data.

CommP is a similarly interesting case. It operates like Blake3 or BitTorrent-v2 in that it’s a merkle tree. My understanding is that anyone who does full piece retrievals from Filecoin will not currently get an accompanying merkle proof, and that if they wanted one they’d have to come up with a new format similar to the Blake3 outboard and combined formats. If people are interested in making progress in this space, it seems like the Filecoin folks involved in the recent PDP work might have some thoughts (cc @zenground0)

I think this is either missing context or is not correct. The security concerns are tied to proofs. Doing ranged requests into large blobs doesn’t help with anything if we have no associated proofs or additional trust assumptions. However, the security concerns go away if we can get the proofs separately (e.g. webseeds / outboard-like) or in-band with the data, similar to how the Trustless Gateway Specification for CAR requests with entity-bytes allows for fetching a byte range and the corresponding proofs.
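For reference, this is roughly what such an in-band request looks like against a gateway that implements the CAR entity-bytes feature; the gateway URL and byte range below are placeholders, and the response is a CAR carrying just the blocks needed to verify that slice of the entity.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

func main() {
	gateway := "https://ipfs.io" // placeholder; any trustless gateway with CAR support
	cidStr := os.Args[1]         // root CID passed on the command line

	// Ask for a CAR scoped to one entity and a byte range within it
	// (first MiB here); the gateway returns the blocks needed to verify
	// exactly that slice, rather than the whole DAG.
	q := url.Values{}
	q.Set("format", "car")
	q.Set("dag-scope", "entity")
	q.Set("entity-bytes", "0:1048575")

	req, err := http.NewRequest("GET", gateway+"/ipfs/"+cidStr+"?"+q.Encode(), nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/vnd.ipld.car")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// A real client would verify the returned blocks against the root CID
	// as it reads them; here we just count the bytes of the CAR response.
	n, err := io.Copy(io.Discard, resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("received %d bytes of verifiable CAR data\n", n)
}
```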
