Supporting IPLD tooling in URIs


If you’ve come from “IPLD and IPFS - A Pitch for the Future :baseball:” you already know some of the benefits of supporting IPLD tooling in URIs. For those of you starting here, a quick rundown:

  1. If you want to render files from hash-linked file systems other than UnixFS, you need something like ADLs to help you interpret the multiblock data structure as a file. ipfs:// is currently hard-coded to try interpreting data as UnixFS.
  2. If you want to work with directories (or file metadata) from hash-linked file systems other than UnixFS, you need ADLs to help interpret the multiblock data structure into the model/interface you’re using to work with directories (or file metadata).
  3. If, rather than files or directories, you are interested in other hash-linked data formats that describe things with JSON-like properties (maps, lists, strings, integers, floats, booleans) and not just bytes, and you want to use URIs for compatibility with browsers or other URI-based tooling, you need URI semantics that understand IPLD data rather than only UnixFS.

For those interested I would recommend taking a look at IPIP: Add IPLD Gateway Specs by RangerMauve · Pull Request #293 · ipfs/specs · GitHub and the linked issues (e.g. Intiial exploration report for IPLD URL Scheme by RangerMauve · Pull Request #195 · ipld/ipld · GitHub) as well as https://discuss.ipfs.io/t/2022-07-13-data-and-ipfs-models/14635/6 and the recorded talk.

ipfs://

Before going more into the weeds of supporting IPLD in URI schemes I’ll convey some of the oral history I’ve heard around ipfs:// and the problems I think the URI scheme has today that we need to deal with or address in some way.

Note: Oral histories can sometimes be a bit shaky and my GitHub archaeology only covered so much. If you have more info/context drop a comment :pray:

When go-ipfs (the first implementation of IPFS, recently renamed to kubo) first existed, there was only one codec and not really any IPLD. That codec was dag-pb, and the only thing that worked with ipfs:// was UnixFS. The resolution code was approximately:

  • Is it dag-pb, and does it structurally seem like UnixFSv1, then it’s UnixFSv1 and we should work with and return the data interpreted that way
  • Otherwise error

There had been a desire for another iteration of UnixFS, i.e. a UnixFSv2, that would be based on something like CBOR rather than the dag-pb format. The idea was that the way ipfs:// would evolve to handle both UnixFSv1 and UnixFSv2 was with some checks that looked vaguely like:

  • Is it dag-pb and does it structurally seem like UnixFSv1, then it’s UnixFSv1
  • If not, is it dag-cbor and does it structurally seem like UnixFSv2, then it’s UnixFSv2
  • Otherwise error
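
To make that intended upgrade path concrete, here is a minimal Python sketch of the checks. It is not go-ipfs code: the Block type, the codec strings, and both structural checks are hypothetical stand-ins, and the UnixFSv2 shape in particular is invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Block:
    codec: str  # codec name taken from the CID, e.g. "dag-pb" or "dag-cbor"
    node: Any   # decoded content of the block


def looks_like_unixfs_v1(node: Any) -> bool:
    # Hypothetical structural check: a dag-pb node with Data and Links fields.
    return isinstance(node, dict) and "Data" in node and "Links" in node


def looks_like_unixfs_v2(node: Any) -> bool:
    # Hypothetical check; this marker field is an assumption made for the sketch.
    return isinstance(node, dict) and node.get("unixfs") == 2


def classify(block: Block) -> str:
    """Decide how ipfs:// would interpret a block under the planned scheme."""
    if block.codec == "dag-pb" and looks_like_unixfs_v1(block.node):
        return "unixfs-v1"
    if block.codec == "dag-cbor" and looks_like_unixfs_v2(block.node):
        return "unixfs-v2"
    raise ValueError(f"unsupported data for ipfs://: codec={block.codec}")
```

The point of the design is that the codec plus a structural check fully determines the interpretation, so nothing in the URI itself has to change.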

However, in the meantime, with the introduction of CIDv1 and an expansion of IPLD, a bug slipped into the ipfs:// resolution code in go-ipfs that allowed traversing any IPLD data model object as long as the final object you landed on was a UnixFSv1 object. This means the resolution code looks more like the following, for each path segment:

  • Is it dag-pb, and does it structurally seem like UnixFSv1, then it’s UnixFSv1
  • If not see if it’s a valid IPLD path segment
  • Otherwise error
  • Note: If the last element in the path is not UnixFSv1 then also error
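
A rough Python sketch of that per-segment behavior, as I understand it, is below. Everything here is an assumption for illustration: the in-memory store, the Link and Block types, and the simplified UnixFSv1 check are hypothetical and do not mirror the actual go-ipfs code.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Link:
    cid: str  # stand-in for a real CID


@dataclass
class Block:
    codec: str
    node: Any


def is_unixfs_v1(block: Block) -> bool:
    # Simplified, hypothetical structural check for UnixFSv1.
    return block.codec == "dag-pb" and isinstance(block.node, dict) and "Links" in block.node


def resolve(store: Dict[str, Block], root_cid: str, path: str) -> Block:
    block = store[root_cid]
    node = block.node
    for seg in (s for s in path.split("/") if s):
        if is_unixfs_v1(block) and node is block.node:
            # UnixFSv1: the segment names one of the node's links.
            match = next((l for l in node["Links"] if l["Name"] == seg), None)
            if match is None:
                raise KeyError(f"no link named {seg!r}")
            nxt = match["Hash"]
        elif isinstance(node, dict) and seg in node:
            nxt = node[seg]          # generic IPLD map key
        elif isinstance(node, list) and seg.isdigit() and int(seg) < len(node):
            nxt = node[int(seg)]     # generic IPLD list index
        else:
            raise KeyError(f"cannot resolve segment {seg!r}")
        if isinstance(nxt, Link):
            block = store[nxt.cid]   # follow the link into the next block
            node = block.node
        else:
            node = nxt               # stay inside the current block
    # The last element must still be UnixFSv1, otherwise error.
    if not (is_unixfs_v1(block) and node is block.node):
        raise ValueError("final node is not UnixFSv1")
    return block


# A dag-cbor wrapper around UnixFSv1 data, the pattern some projects rely on.
STORE = {
    "wrap": Block("dag-cbor", {"site": Link("dir")}),
    "dir": Block("dag-pb", {"Data": b"", "Links": [{"Name": "index.html", "Hash": Link("file")}]}),
    "file": Block("dag-pb", {"Data": b"hello", "Links": []}),
}
```

With this store, resolve(STORE, "wrap", "site/index.html") passes through the dag-cbor wrapper and lands on the UnixFSv1 file, while resolving the wrapper on its own fails.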

Some projects started leveraging this bug to wrap their UnixFS data with dag-cbor wrappers, which makes fixing this bug … problematic. As a result, any prior ideas for upgrading ipfs:// to support new things by asserting that dag-pb implies UnixFSv1 (true basically everywhere) and anything else implies the new scheme have been somewhat thwarted, since we don’t want to break any existing user data.

IPLD tooling in URIs

We’ve now reached a point where not having a consistent way to use URIs to describe non-UnixFS IPLD data is getting in the way. It prevents us from copy-pasting links to BitTorrent data or Git files and having IPFS HTTP Gateway tooling available for them. Similarly, even our CLI tooling lacks some of the nice mechanisms for working with anything other than UnixFS or data contained within a single IPLD block.

Generally speaking we have two pieces that we need in order to figure out how to represent IPLD data in URIs:

  1. Define the semantics that we want for IPLD in URIs (e.g. if/how to support using tools like Schemas, ADLs, Codecs, Selectors, Pathing, etc.)
  2. Figure out how to indicate that we want to use the new semantics instead of the old ones

IPLD Semantics

Likely there are many options here for what could be the correct semantics and how to express them. Some lessons I think could be useful here include:

  1. For a large number of cases using some ADLs/Schemas along with path-like semantics is sufficient and is fairly readable in the context of URIs as compared to more powerful tooling like Selectors
    1. Selectors can be quite useful when trying to describe selecting multiple logical elements in a DAG, or when running other custom structure code is impossible. However, they may be overkill in other contexts.
  2. People need ways to signal which code should be used to work with their data, e.g. these blocks are a big map, this is a BitTorrent directory etc. There are good reasons why these signals might live inside the data structure or outside of it (e.g. in the URI). Even among those who are proponents of putting signals inside the data structure there is no clear “one way” to do it.
    1. This seems to imply that it may be useful to allow signals outside of the data, including signals that tell you how to interpret the signals inside the data
    2. in-data signaling example:
      1. The root of your BitTorrent file is wrapped in an object like { "@type": "BitTorrent-File", "data": <bittorrent-infohash-cid> } and so ipld://<wrapped-object-cid> loads a file
      2. Nice because wrapped-object-cid has a binary format that can be passed around and keeps the type information
      3. Unfortunate because if I wanted to get at data inside of the object other than the file bytes (e.g. the file name inside the BitTorrent infodict) it would be difficult without additional tooling
    3. out-of-band signaling example
      1. You indicate that your CID is a BitTorrent file in the URI so something like ipld://<bittorrent-infohash-cid>/[ADL=BitTorrent-File] loads a file
      2. Nice because it’s fairly obvious what transformations are happening to the data when you look at the URI. Also, it makes it easier to get other pieces of information out of the data like the file name with ipld://<bittorrent-infohash-cid>/name or to visually see transformations on the data like ipld://<bittorrent-infohash-cid>/[ADL=BitTorrent-File]/[Codec=JSON]/foo and extract JSON data from inside of a BitTorrent file
      3. Unfortunate because these transformations are more verbose in text and because that means it’s harder to have self-describing trees of data. The verbosity increases in nested trees of data where multiple transformations will have to be spelled out in the URI where they’d otherwise be contained in the data
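
To compare the two signaling styles side by side, here is a small Python sketch. The bracketed [Key=Value] segment syntax and the "@type"/"data" wrapper are only the illustrative forms used in the examples above, not a specified format, and the function names are hypothetical.

```python
import re
from typing import Any, List, Optional, Tuple

# Matches an out-of-band signal segment such as [ADL=BitTorrent-File].
SIGNAL = re.compile(r"^\[(\w+)=([\w.-]+)\]$")


def parse_ipld_uri(uri: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split an ipld:// URI into its root and an ordered list of steps."""
    prefix = "ipld://"
    if not uri.startswith(prefix):
        raise ValueError("not an ipld:// URI")
    root, *segments = uri[len(prefix):].split("/")
    steps: List[Tuple[str, str]] = []
    for seg in segments:
        if not seg:
            continue  # ignore empty segments, e.g. from a trailing slash
        m = SIGNAL.match(seg)
        if m:
            # Out-of-band signal: a transformation to apply at this point.
            steps.append((m.group(1).lower(), m.group(2)))
        else:
            # Plain path segment to traverse.
            steps.append(("path", seg))
    return root, steps


def in_data_signal(node: Any) -> Optional[Tuple[str, Any]]:
    """Return (type, wrapped data) if the node carries an in-data signal."""
    if isinstance(node, dict) and "@type" in node and "data" in node:
        return node["@type"], node["data"]
    return None
```

For example, parse_ipld_uri("ipld://bafyexample/[ADL=BitTorrent-File]/[Codec=JSON]/foo") yields the root bafyexample followed by an ADL step, a codec step, and a path step, which is exactly the chain of transformations a reader can see in the URI. With in-data signaling the same information only becomes visible once the wrapper block has been loaded and inspected.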

Indicating new semantics

Since the original upgrade plan appears to be problematic it’s time for us to figure out where to hang our new upgrade hook so that we can keep existing data operating as is, while allowing for the new types of semantics we’d like to add support for.

At a high level the two major approaches I have seen so far are:

  1. Have a new URI prefix like ipld:// that has all of our new semantics supported
  2. Take our existing URI prefix ipfs:// and find something that’s reliably illegal in the old semantics to indicate that we should use the new semantics. Examples include:
    1. ipfs:///ipld/bafyfoobar/...
    2. ipfs://bafyfoobar.ipld/... (requires insisting that no multibases with . in them are valid for use in ipfs://)
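
As a sketch of what these hooks would mean for a resolver, the Python function below decides which semantics apply from the shape of the URI alone. It accepts all three example forms at once purely for illustration (in practice we would presumably pick one), and the function name and return values are hypothetical.

```python
def semantics_for(uri: str) -> str:
    """Return "new" or "legacy" depending on which upgrade hook the URI uses."""
    if uri.startswith("ipld://"):
        return "new"  # option 1: a dedicated scheme
    if uri.startswith("ipfs:///ipld/"):
        return "new"  # option 2.1: empty authority followed by an /ipld/ prefix
    if uri.startswith("ipfs://"):
        authority = uri[len("ipfs://"):].split("/", 1)[0]
        if authority.endswith(".ipld"):
            return "new"  # option 2.2: a suffix that is never a valid CID encoding
        return "legacy"   # everything else keeps today's UnixFS behavior
    raise ValueError(f"unsupported scheme in {uri!r}")
```

Each branch relies on the marker being something the old semantics would never accept, which is why existing ipfs:// links fall through to the legacy path untouched.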

While each has its advantages and disadvantages in terms of usability, adoption, etc., each is a step forward from where we are now.

ipfs:// wasn’t perfect in v1 and whatever comes next is unlikely to be perfect either. Ideally whatever we decide for v2 should have enough flexibility in place for us to perform further upgrades as we discover new use cases for IPLD and IPFS.

SGTM, let’s ship :ship:?

I like that attitude! If you’re interested add some thoughts and comments to IPIP: Add IPLD Gateway Specs by RangerMauve · Pull Request #293 · ipfs/specs · GitHub or make a new proposal in the specs repo (or below).

There are still details to reach agreement on, and there’s good reason to be cautious with these types of changes. However, if we go into this endeavor knowing that whatever we choose will probably have to keep evolving, I don’t think we should let that stand in the way of starting to unlock the potential of representing more than just UnixFS in URIs.
