We had ourselves a good time with Should we profile CIDs?, followed by the hit sequel Should we profile IPLD? Those conversations led to the DASL work, which is generating a healthy amount of community interest, notably from people who hadn’t been in our community before (or hadn’t been active in it).
So let’s look at another problem dear to all our hearts: block size!
I don’t think that this is a case for a single profile or spec to solve all the things, but I do think that by hashing (wink wink) things out we can figure out some good work to do.
Let us know:
What are your pain points with blocks?
What are cool things that you’re working on or using that help with big blobs?
What would you like to see happening in the world?
What use cases do you have that are currently suffering from size issues?
It’s maybe worth specifying here what we mean by “big blobs” and level-setting a bit.
Traditionally, DHT/Bitswap-centric IPFS architectures set a maximum block size and chunked bigger files at a mandatory UnixFS abstraction layer. Both the iroh branch and the ATProto/DASL branch of the family tree skip a mandatory UnixFS layer and simply refer to inputs as blobs by the hash of the whole blob, punting chunking one layer up to the retrieval mechanism (BAO-file-driven range requests in the iroh case; excluding large files from the firehose and requiring manual PDS retrieval in the ATProto/Bluesky case). Filecoin, meanwhile, deals with large inputs through a particular CAR file profile/config and PieceCIDs, but there too commP makes its own mapping of chunks to PieceCIDs with no UnixFS layer present, which further complicates the indexing and subsetting assumptions of the other systems. These very different assumptions and norms can lead to divergent mental models about what a “block” even is, and about what counts as a reasonable cost or workaround!
So while interop is not self-evidently a goal here, being explicit about which of these families of use-cases and assumptions you’re coming from helps diagnose and understand the pain points.
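To make that contrast concrete, here’s a minimal sketch in Go (standard library only, purely illustrative, not any project’s real API; the 1 MiB cap and 3 MiB input are arbitrary assumptions) of the two addressing models:

```go
// Illustrative contrast between the two mental models above: addressing an
// input by one hash of the whole blob vs. splitting it into size-capped
// blocks and hashing each block.
package main

import (
	"crypto/sha256"
	"fmt"
)

// wholeBlobDigest mirrors the iroh/ATProto-style model: one digest for the
// entire input, with any chunking (BAO ranges, HLS segments, etc.) handled
// by the retrieval layer rather than the addressing layer.
func wholeBlobDigest(blob []byte) [32]byte {
	return sha256.Sum256(blob)
}

// chunkDigests mirrors the traditional DHT/Bitswap-style model: the input is
// split into blocks no larger than maxBlockSize, each block gets its own
// digest, and a separate linking structure (UnixFS/dag-pb in real IPFS,
// omitted here) ties them back together.
func chunkDigests(blob []byte, maxBlockSize int) [][32]byte {
	var digests [][32]byte
	for off := 0; off < len(blob); off += maxBlockSize {
		end := off + maxBlockSize
		if end > len(blob) {
			end = len(blob)
		}
		digests = append(digests, sha256.Sum256(blob[off:end]))
	}
	return digests
}

func main() {
	blob := make([]byte, 3<<20) // pretend this is a 3 MiB upload

	fmt.Printf("whole-blob digest: %x\n", wholeBlobDigest(blob))

	blocks := chunkDigests(blob, 1<<20) // hypothetical 1 MiB block-size cap
	fmt.Printf("chunked into %d blocks under a 1 MiB cap\n", len(blocks))
}
```

In the first model the upload is a single address and chunking is the retriever’s problem; in the second, the same input becomes several blocks plus a linking layer, which is exactly where the divergent assumptions about “reasonable block size” creep in.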
P.S. One thing I’ve been trying to understand is how Bluesky/ATProto can deal with video uploads, which are almost always way bigger than the UnixFS block size limit, bigger than the firehose block size limit, and (for longer or higher-res videos) butt up against even the effective HTTP/CDN limits, necessitating chunking at least at the HLS level. Can videos uploaded to, e.g., a non-Bluesky ATProto video platform or PeerTube get migrated to other storage providers as a CAR file? What would that CAR file have to look like to make this make sense?