Hi, I was thinking about different IPFS implementations with respect to this diagram on the Kubo GitHub.
I understand not every implementation will have the same boxes or components involved. So I wanted clarity on a couple things.
Firstly, UnixFS is very important in Kubo for chunking and creating the DAGs. Do all (or many) implementations also implement UnixFS? What options are there to replace these chunking and linking mechanisms?
Secondly, mechanisms to store blocks are important. Ignoring Badgerds, FlatFS is important for saving those blocks to disk, each in its own file, and it also has a sharding mechanism that creates a logical structure for finding these blocks. Similar question to above: do most implementations use a translated FlatFS as well, or are these datastore mechanisms tailored to the different use cases of the different implementations?
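The FlatFS sharding described above can be sketched in Python roughly as follows. This assumes Kubo's default shard function, `next-to-last/2` (the `/repo/flatfs/shard/v1/next-to-last/2` spec string), and that block keys are the unpadded, uppercase base32 encoding of the block's sha2-256 multihash; the helper names are hypothetical and this is not Kubo's code.

```python
import base64
import hashlib
from pathlib import PurePosixPath

def flatfs_key(block: bytes) -> str:
    """Block key: unpadded, uppercase base32 of the sha2-256 multihash."""
    multihash = b"\x12\x20" + hashlib.sha256(block).digest()  # 0x12 = sha2-256, 0x20 = 32 bytes
    return base64.b32encode(multihash).decode("ascii").rstrip("=")

def next_to_last(key: str, suffix_len: int = 2) -> str:
    """Shard directory: the `suffix_len` characters just before the last one.

    Assumes len(key) > suffix_len; real block keys are far longer.
    """
    offset = len(key) - suffix_len - 1
    return key[offset:offset + suffix_len]

def flatfs_path(root: str, block: bytes) -> PurePosixPath:
    key = flatfs_key(block)
    # One file per block, grouped into shard directories named by two key characters.
    return PurePosixPath(root) / next_to_last(key) / f"{key}.data"

print(flatfs_path("blocks", b"hello"))
```

Skipping the final character presumably matters because a 34-byte multihash encodes to 55 base32 characters, and the last one carries only 2 bits, so sharding on it would spread blocks over very few directories.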
Those are great questions. I’ll answer these to the best of my knowledge.
It’s hard to answer this question without addressing what IPFS actually is.
So here’s my working definition of IPFS:
IPFS is a set of protocols that enable content discovery, routing and verified transfer using content addressing – the process of addressing data based on hash fingerprints.
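To make “addressing data based on hash fingerprints” concrete, here is a minimal Python sketch that computes a CIDv1 for a block treated as opaque bytes (the `raw` codec, a sha2-256 multihash, base32 multibase) and checks received bytes against it. It covers only this one CID variant, and the function names are made up for illustration.

```python
import base64
import hashlib

# Codes for this one CID variant (all < 0x80, so each varint is a single byte).
CID_VERSION_1 = 0x01
RAW_CODEC = 0x55      # "raw" multicodec: the block is opaque bytes
SHA2_256 = 0x12       # multihash function code for sha2-256
SHA2_256_LEN = 0x20   # 32-byte digest

def cid_v1_raw(data: bytes) -> str:
    """Return the base32 CIDv1 string for `data` under the raw codec."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_VERSION_1, RAW_CODEC, SHA2_256, SHA2_256_LEN]) + digest
    # Multibase prefix "b" = lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")

def verify(cid: str, data: bytes) -> bool:
    """Any receiver can re-hash the bytes and compare them to the address."""
    return cid_v1_raw(data) == cid

cid = cid_v1_raw(b"hello IPFS")
print(cid)                             # starts with "bafkrei"
print(verify(cid, b"hello IPFS"))      # True
print(verify(cid, b"tampered bytes"))  # False
```

Because the address is derived from the bytes, it does not matter which peer served them: tampered data simply fails verification.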
In practice, this is achieved by representing data (including files and directories) using Merkle DAGs. UnixFS specifies how to represent (both serialised as binary and as structured data in a programming language) a filesystem of files and directories using these Merkle DAGs, which includes chunking, tree (DAG) layout, and metadata. To do this, it uses Protocol Buffers.
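Chunking and DAG layout can be illustrated with a simplified Python sketch: fixed-size chunks become leaves, and parents are built bottom-up until a single root remains (a balanced layout). The defaults of 262144-byte chunks and at most 174 links per node are what I understand Kubo uses, so treat them as assumptions. Parent nodes here are hashed from a made-up `digest:size;` string, not the real dag-pb/UnixFS Protocol Buffers encoding, so the digests will not match real CIDs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    digest: str   # sha2-256 hex of this node's (simplified) encoding
    size: int     # total file bytes covered by this subtree
    children: List["Node"] = field(default_factory=list)

def chunk(data: bytes, chunk_size: int) -> List[bytes]:
    """Fixed-size chunking; an empty file still yields one empty chunk."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]

def leaf(piece: bytes) -> Node:
    return Node(hashlib.sha256(piece).hexdigest(), len(piece))

def parent(children: List[Node]) -> Node:
    # Simplified stand-in for a dag-pb node: hash of the ordered (digest, size) links.
    encoded = "".join(f"{c.digest}:{c.size};" for c in children).encode()
    return Node(hashlib.sha256(encoded).hexdigest(), sum(c.size for c in children), children)

def build_balanced_dag(data: bytes, chunk_size: int = 262144, max_links: int = 174) -> Node:
    level = [leaf(p) for p in chunk(data, chunk_size)]
    # Group nodes level by level until a single root remains.
    while len(level) > 1:
        level = [parent(level[i:i + max_links]) for i in range(0, len(level), max_links)]
    return level[0]

def depth(node: Node) -> int:
    return 0 if not node.children else 1 + max(depth(c) for c in node.children)

root = build_balanced_dag(b"x" * 1000, chunk_size=100, max_links=4)
print(root.size, depth(root))  # 10 leaves -> 3 parents -> 1 root
```

Because the root hash depends on every leaf, changing a single byte anywhere changes the root, while identical chunks produce identical leaves that only need to be stored once.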
(this paragraph is my reading of the history)
At some point, developers wanted to do more than just files and directories with IPFS, and so IPLD was born. IPLD became a superset of UnixFS insofar as it generalised many of the ideas from UnixFS and introduced more codecs (in addition to Protocol Buffers) like dag-cbor and dag-json.
Why was IPLD arguably useful as an abstraction? Because IPFS was already pretty good at solving the problems of content discovery, routing, and transfer of content-addressed data, so expanding the kinds of data it could move around would unlock new possibilities.
Moreover, it could allow IPFS to interoperate with existing content-addressed (hash-addressed, to be more specific) systems like Git, blockchains, BitTorrent, Docker images, etc. (see this proposal for some of the current constraints that limit this interoperability).
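Git is a good example of such a hash-addressed system. This minimal Python sketch shows how Git, in its default SHA-1 object format, derives a blob's ID purely from the blob's contents, which is why it maps naturally onto content addressing:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to file contents (default SHA-1 object format).

    Git hashes the header "blob <size>" plus a NUL byte, followed by the raw
    bytes, so the ID is a pure function of the content, just like a CID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
```

The result should match what `git hash-object --stdin` prints for the same bytes.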
This is all to emphasise that while UnixFS is the oldest use-case for IPFS (and arguably the most useful since everyone’s familiar with files and directories), IPFS itself is all about content addressing and has many different approaches to:
Content discovery and routing: the DHT with Kademlia and, more recently, indexers and delegated routing
Peer-to-peer transfer of content-addressed data: HTTP, Bitswap, and potentially other new protocols that will be developed.
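For the DHT, Kademlia's core idea is that peers and keys share one ID space and “distance” between two IDs is their XOR. Here is a minimal Python sketch of picking the peers that should hold records for a key, assuming keys and peer IDs are hashed into a 256-bit space with SHA-256 and a replication parameter of k = 20 (values I believe libp2p's Kademlia DHT uses, but check its spec). The peer names are hypothetical.

```python
import hashlib
from typing import List

def key_id(name: bytes) -> int:
    """Map a peer ID or content key into a shared 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(name).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def closest_peers(target: bytes, peers: List[bytes], k: int = 20) -> List[bytes]:
    """Return the k peers whose IDs are XOR-closest to the target key."""
    t = key_id(target)
    return sorted(peers, key=lambda p: xor_distance(key_id(p), t))[:k]

peers = [f"peer-{i}".encode() for i in range(100)]  # hypothetical peer IDs
print(closest_peers(b"some-cid", peers, k=3))
```

Real lookups are iterative rather than a sort over a global list: a node repeatedly asks the closest peers it knows about for peers even closer to the key.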
When it comes to implementations, it’s useful to think about IPFS implementations through 3 different key properties:
For IPFS to run everywhere and be available to every networked device, e.g. automation robots in a factory, mobile phones, browsers, and large data centres, we need different implementations of IPFS.
Another is low-powered and resource-constrained devices for which ipfs-embed was developed.
They don’t have to, but UnixFS is the most common kind of content-addressed data supported by IPFS, so the answer is likely yes. Alternatives to UnixFS include some of what IPLD offers, like dag-json and dag-cbor (the comparison here isn’t direct), or WNFS, which improves on UnixFS by introducing hierarchical encryption, metadata, and other features.
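To show what going beyond UnixFS looks like with IPLD, here is a Python sketch of two small DAG-JSON records, where the second links to the first using DAG-JSON's link form `{"/": "<cid>"}`, plus a CIDv1 with the dag-json codec (0x0129). The encoder only approximates DAG-JSON's canonical form (sorted keys, no whitespace); the exact rules, such as number handling, are in the DAG-JSON spec. The record fields are hypothetical.

```python
import base64
import hashlib
import json

DAG_JSON = 0x0129  # multicodec code for dag-json

def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used in CIDs and multicodecs."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def cid_v1(codec: int, block: bytes) -> str:
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(block).digest()  # sha2-256
    raw = varint(1) + varint(codec) + multihash
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")

def encode_dag_json(node: dict) -> bytes:
    # Approximation of DAG-JSON's canonical form: sorted keys, no whitespace.
    return json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")

def link(cid: str) -> dict:
    # In DAG-JSON, a link is a map with the single key "/".
    return {"/": cid}

first = {"kind": "note", "name": "v1", "previous": None}
first_cid = cid_v1(DAG_JSON, encode_dag_json(first))
second = {"kind": "note", "name": "v2", "previous": link(first_cid)}
second_block = encode_dag_json(second)
print(second_block.decode())
print(cid_v1(DAG_JSON, second_block))
```

Unlike UnixFS, nothing here is a file or directory: the data model is arbitrary structured records, and links can point at any other block.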
How to store blocks is an implementation detail (a very important one, but still an implementation detail). Since the current block size limits in IPFS are 1-2 MB, any key-value store capable of storing blocks of that size could be used to implement a block store.
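As a minimal sketch of that point, here is a content-addressed block store layered over an arbitrary key-value interface in Python. The `KVStore` protocol, the in-memory backend, and the 2 MiB cap (the upper end of the limits mentioned above) are assumptions for illustration; a real backend could be FlatFS, LevelDB, RocksDB, SQLite, and so on.

```python
import hashlib
from typing import Dict, Optional, Protocol

MAX_BLOCK_SIZE = 2 * 1024 * 1024  # assumed cap, the upper end of the 1-2 MB range

class KVStore(Protocol):
    def get(self, key: bytes) -> Optional[bytes]: ...
    def put(self, key: bytes, value: bytes) -> None: ...

class MemoryKV:
    """Stand-in for any key-value backend."""
    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

def multihash(block: bytes) -> bytes:
    return b"\x12\x20" + hashlib.sha256(block).digest()  # sha2-256 multihash

class BlockStore:
    """Content-addressed layer: each key is the multihash of its value."""
    def __init__(self, kv: KVStore) -> None:
        self.kv = kv

    def put(self, block: bytes) -> bytes:
        if len(block) > MAX_BLOCK_SIZE:
            raise ValueError("block exceeds size limit; chunk it first")
        key = multihash(block)
        self.kv.put(key, block)
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        block = self.kv.get(key)
        # Re-verify on read so a corrupted backend cannot return wrong bytes.
        if block is not None and multihash(block) != key:
            raise OSError("block failed hash verification")
        return block

store = BlockStore(MemoryKV())
key = store.put(b"some block")
print(store.get(key))
```

Swapping `MemoryKV` for any other backend with the same `get`/`put` shape leaves `BlockStore` unchanged, which is why the storage layer can be tailored to each implementation's use case.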
We’re in a phase where opinionated implementations are emerging, and as a community we’re working to let the best ideas bubble up to the top. The best way to think about the landscape of implementations is to find the implementation that comes closest to meeting your needs, and work with that implementation. If none of them come even remotely close, maybe IPFS isn’t the right fit for a given project, or maybe you need to start your own implementation!
Need to call IPFS as a library within a Go project? Kubo is going to be the way to go. Do you not care about UnixFS at all and just want to import IPLD data into Postgres? Then you might be better off consuming crates instead of going “full IPFS”. The point is, every project is different, and we’re trying to discover patterns that work.
Answering your concrete questions from the perspective of an iroh maintainer:
Do all or many implementations also implement UnixFS?
We still think the best way to explain what iroh does & does not support is to just compare it to Kubo, because it’s the reference implementation. We do that here: https://iroh.computer/docs/iroh-and-kubo/
We think supporting UnixFS is a de-facto part of “doing IPFS”, so we support it.
Do most implementations use a translated FlatFS as well?
We don’t, and like @danieln, we think it’s an implementation detail. We think the Kubo diagram you’ve cited leaves a lot to be desired. As an example, iroh builds an index of graph data on ingest for efficient lookups later that don’t have to touch blocks, which we could not accomplish if we honored the blockstore interface. This means that migrating from Kubo to iroh will require exporting and importing. Thankfully, we have CAR files for this.
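The difference between reading blocks and consulting an ingest-time index can be sketched in Python as follows. This is a hypothetical illustration of the idea, not iroh's actual data model: links are recorded once when a block is imported, so later traversals never read or decode blocks. The CIDs are placeholder strings.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

class LinkIndex:
    """Hypothetical ingest-time index: CID -> child CIDs, recorded once on import."""
    def __init__(self) -> None:
        self.children: Dict[str, List[str]] = {}

    def ingest(self, cid: str, links: Iterable[str]) -> None:
        # In a real node, `links` would come from decoding the block
        # (dag-pb, dag-cbor, ...) exactly once, at import time.
        self.children[cid] = list(links)

    def reachable(self, root: str) -> Set[str]:
        """Walk the DAG using only the index: no block reads, no decoding."""
        seen: Set[str] = set()
        queue = deque([root])
        while queue:
            cid = queue.popleft()
            if cid in seen:
                continue
            seen.add(cid)
            queue.extend(self.children.get(cid, []))
        return seen

idx = LinkIndex()
idx.ingest("root", ["a", "b"])
idx.ingest("a", ["leaf1", "leaf2"])
idx.ingest("b", ["leaf2"])  # shared leaf: DAGs deduplicate
print(sorted(idx.reachable("root")))
```

A plain get/put blockstore interface has no place to keep such an index, which is the tension described above.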
Lastly, one small nit about RocksDB, @danieln: RocksDB is a C++ project.