IPLD and IPFS - A Pitch for the Future ⚾

For those of you who don’t know me :wave:, I’ve been working around the IPFS project for almost 4 years now and am one of the maintainers of Kubo (formerly known as go-ipfs). If you’re a newcomer to the IPFS project you’ve likely been bombarded with lots of acronyms and descriptions of how making the web more content addressed can increase its resiliency and performance, such that things that were never possible before all of a sudden are.

Most of the time when users experience IPFS it’s through the lens of a format called UnixFS, a flexible content addressed format for referring to files and directories, which are things we tend to understand and interact with regularly. If you do more digging around IPFS you’ll notice that the guts (data storage, network protocols, etc.) of almost any current IPFS implementation care very little about UnixFS but care a lot about IPLD.

The TLDR on IPLD is that it provides tooling for working with hash-linked (i.e. content addressable) data structures so that not every content addressable format needs to have its own tooling from top to bottom. For example, it’d be really sad if we needed to create new tools and protocols from scratch for every hash-linked data format out there. AND YET this is actually the world we live in without IPLD. Each of Git, BitTorrent, Dat, Arweave, Bitcoin, and Ethereum has its own data formats, discovery mechanisms and transfer protocols that aren’t particularly reusable by any of the others, despite the underlying data (e.g. Git commits) being content addressed.
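To make "hash-linked" concrete, here is a minimal sketch of the idea rather than real IPLD: nodes are stored under the hash of their bytes, and a node links to others by listing their hashes. JSON, hex SHA-256 digests, and the `put`/`get`/`walk` helpers are all assumptions made for illustration; actual IPLD uses codecs and CIDs instead.

```python
import hashlib
import json

def put(store, node):
    """Serialize a node, store it under the SHA-256 of its bytes, return that key."""
    data = json.dumps(node, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = data
    return key

def get(store, key):
    """Load a node and check that its bytes still hash to the key."""
    data = store[key]
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError("block does not match its hash")
    return json.loads(data)

def walk(store, key):
    """Yield every node reachable from key by following the 'links' field."""
    node = get(store, key)
    yield node
    for child in node.get("links", []):
        yield from walk(store, child)

store = {}
leaf_a = put(store, {"data": "hello"})
leaf_b = put(store, {"data": "world"})
root = put(store, {"data": "root", "links": [leaf_a, leaf_b]})
names = [n["data"] for n in walk(store, root)]
```

Note that `walk` knows nothing about what the nodes mean. That is the kind of reuse the paragraph above is after: one traversal tool that works for any format built from hash links.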

Part of the realization behind IPFS is a question: in a world where data is content-addressed, does it really matter where the data came from? It could come from the author, from my locally cached storage, or from any number of machines storing a copy of the data. So couldn’t we make IPFS support tooling for moving around not just UnixFS data but various other formats as well? Couldn’t we start leveraging the best ideas of existing systems to allow multiple ways of discovering, retrieving and serving content-addressed data?
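A small sketch of why the source stops mattering: if the requester knows the hash, it can check whatever bytes come back, so any source will do. The `fetch_verified` helper and the three example sources are hypothetical and stand in for a local cache, an untrusted peer, and the author.

```python
import hashlib

def fetch_verified(digest, sources):
    """Try each source in turn; return the first bytes that hash to digest."""
    for source in sources:
        data = source(digest)
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            return data
    raise LookupError("no source returned matching data")

content = b"koala"
digest = hashlib.sha256(content).hexdigest()

def empty_cache(d):
    return None            # local cache miss

def bad_peer(d):
    return b"not a koala"  # wrong bytes, rejected by the hash check

def honest_peer(d):
    return content

result = fetch_verified(digest, [empty_cache, bad_peer, honest_peer])
```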

UnixFSv2 - Will it save the day?

TLDR: A new data format won’t help us, but we have something better available :tada:.

For as long as I’ve been involved in the IPFS project there have been people who have wanted to change something about UnixFS to make it better: more types of metadata, versioning, signing, encryption, different graph structures, not using protobufs, … For a number of these ways of making UnixFS “better” there are a few ways to do it, and some are better suited to some use cases than others. At the end of the day, picking one means trying to figure out which use case is more likely to drive growth in the IPFS ecosystem.

But why pick a winner here? If we look around we already have a variety of systems that have chosen to do content-addressing of files and folders: Git, BitTorrent-v1, BitTorrent-v2, Dat/Hyperdrive, checksums of files using linear hash functions (e.g. SHA-2), checksums of files with tree hash functions (e.g. Blake3), UnixFS, WNFS, Peergos, … Surely at some point we should realize that there are multiple ways to construct hash-linked filesystems. And yet most programming languages have developer tooling that lets you code against standard filesystems pretty reasonably despite the breadth there too (ext4, btrfs, xfs, zfs, FAT32, NTFS, HFS+, …).

So maybe the idea for the future of working with filesystem data in IPFS isn’t to make the next best format, but to describe a filesystem interface that’s workable with the various content-addressable filesystems we already know exist and modeled off of the interfaces we use regularly to interact with our local file systems.
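As a rough sketch of what such an interface could look like, the example below puts one `read(path)` method in front of two toy layouts. Everything here is an assumption for illustration: `HashLinkedFS`, `NestedTreeFS` and `FlatListFS` are made-up names, the layouts only loosely echo "tree of directories" versus "flat file list" designs, and hashing is left out to keep the focus on the interface.

```python
from typing import Protocol

class HashLinkedFS(Protocol):
    """Hypothetical common interface: read a file by path."""
    def read(self, path: str) -> bytes: ...

class NestedTreeFS:
    """Toy format: directories are nested dicts, files are bytes."""
    def __init__(self, root):
        self.root = root

    def read(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        if not isinstance(node, bytes):
            raise IsADirectoryError(path)
        return node

class FlatListFS:
    """Toy format: a flat list of (path, content) pairs."""
    def __init__(self, entries):
        self.entries = entries

    def read(self, path):
        wanted = path.strip("/")
        for entry_path, content in self.entries:
            if entry_path == wanted:
                return content
        raise FileNotFoundError(path)

def read_text(fs: HashLinkedFS, path: str) -> str:
    """Application code only sees the interface, not the format behind it."""
    return fs.read(path).decode()

tree = NestedTreeFS({"animals": {"koala.txt": b"eucalyptus"}})
flat = FlatListFS([("animals/koala.txt", b"eucalyptus")])
tree_text = read_text(tree, "/animals/koala.txt")
flat_text = read_text(flat, "/animals/koala.txt")
```

The design choice being illustrated is that `read_text` never branches on the format, which is how code written against a local filesystem API already behaves across ext4, NTFS, and the rest.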

Hmmm… an interface for unifying different hash-linked file systems, can I use IPLD for this?

Yes! In fact I claim that if you can’t do this with IPLD tooling that we are missing some important pieces of the story and leaving a lot of value unclaimed.

Let’s take a look at how close we are to being able to use IPLD for this. As an example we need:

  1. A way to describe “this is a BitTorrent directory and I want the file Koala.jpg in that directory”
    1. We have no mechanism inside the IPFS URI (ipfs://) to support anything like this currently
    2. However, IPLD has this mechanism called selectors (https://ipld.io/specs/selectors/) that enable this that we’d need to figure out how to get into a URI.
      1. Selectors, while powerful, are perhaps more than we need for most file system operations. Given that most file system usage is path-based, a simpler path-based syntax might be appropriate (as described in the IPIP “Add IPLD Gateway Specs” by RangerMauve, Pull Request #293 in ipfs/specs)
  2. A way to plug in code relatively painlessly that describes how to interpret a BitTorrent directory as our generic file system interface
    1. The IPLD Codec and ADL abstractions fit this case well
  3. A way to discover the data
    1. Data discovery, or content-routing, is mostly about knowing which bulletin boards to check that say “Alice says she has your data”. An implementation can use some of the mechanisms built with IPFS in mind, like the IPFS Public DHT, Network Indexers, etc., but there’s no reason an implementation couldn’t also consult BitTorrent’s mainline DHT or some trackers, whether directly or by putting those queries behind a Reframe endpoint
  4. A way to download it
    1. This content-addressed file system is made up of the same primitives used by UnixFS and the other content-addressed file systems listed. As a result any transport you use to move around UnixFS data should be fine here such as Bitswap, GraphSync, sending around CAR files via HTTP, USB, etc.
    2. Note: Most IPFS systems will not transmit individual blocks of data larger than 2MiB which means the content-addressed file systems that have blocks that big will be problematic to work with. For BitTorrent this is rarely a problem, but in other systems it can be. However, that “rarely” can be quite important for some users who want to port their existing systems towards IPFS tooling
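Steps 3 and 4 can be sketched in a few lines. This is only an illustration under stated assumptions: the routers are hypothetical functions returning made-up peer names, not real DHT or tracker clients, and `MAX_BLOCK_SIZE` simply encodes the 2MiB figure from the note above.

```python
MAX_BLOCK_SIZE = 2 * 1024 * 1024  # 2 MiB, the limit mentioned in the note above

def find_providers(content_id, routers):
    """Ask every router who has content_id; merge answers, dropping duplicates."""
    seen = []
    for router in routers:
        for provider in router(content_id):
            if provider not in seen:
                seen.append(provider)
    return seen

def oversized_blocks(blocks):
    """Return the ids of blocks too large for most IPFS transports to send."""
    return [block_id for block_id, data in blocks.items()
            if len(data) > MAX_BLOCK_SIZE]

# Hypothetical routers standing in for different "bulletin boards".
def ipfs_dht(content_id):
    return ["peer-alice", "peer-bob"]

def bittorrent_dht(content_id):
    return ["peer-bob", "peer-carol"]

providers = find_providers("example-id", [ipfs_dht, bittorrent_dht])
blocks = {"small": b"x" * 1024, "big": b"x" * (MAX_BLOCK_SIZE + 1)}
too_big = oversized_blocks(blocks)
```

Because every provider's data can be verified against its hash, merging answers from unrelated routing systems like this does not require trusting any one of them.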

I’m sold, what do we need to make this happen?

  1. We need a spec for a URI-compatible mechanism for describing what type of data we are pathing through instead of making assumptions about UnixFS
    1. See Supporting IPLD tooling in URIs
  2. For fuller compatibility with existing systems we need to be able to safely support large blocks
    1. See Supporting Large IPLD Blocks
  3. If we don’t want to rewrite codecs and ADLs in every language under the sun we should support something like WASM along with libraries that have FFI bindings
    1. If you’re interested see ipld/wasm-ipld on GitHub (tools and examples of IPLD codecs and ADLs in WASM callable from hosts) as well as some of the talks and discussions from IPFS Thing 2022 (e.g. the IPFS and WASM Track)
    2. wasm-ipld talks
      1. IPLD Focused: https://www.youtube.com/watch?v=Hc-5WJxBkpI (slides)
      2. WASM Focused: https://www.youtube.com/watch?v=Z6ZLawrc94g (slides)

Those seem like some hefty items, how close are we?

Well, here’s a running demo of a kubo node with support for a proposed IPLD URI scheme loading a picture from inside of a BitTorrent directory (the CID in the URL is a BitTorrent infohash) using codecs and ADLs written in WASM. So, while we as a community still have some decisions, exploration, spec writing, and of course coding and debugging ahead of us, we’re closer than we’ve ever been!

If you’re interested in seeing this happen, come hang out in the IPFS chat channels or make proposals here, and bring your use cases, pull requests, demos, etc. with you!