There are lots of hash-based systems out there, descended from Git, bup, Bitcoin, etc. Is it possible to link directly to content in those systems from IPFS?
It is indeed! Check the dag examples here: https://github.com/ipfs/js-ipfs/tree/master/examples/dag. They show how to traverse some Ethereum blocks.
Answering own question with results of some research. @daviddias please point out anything I got wrong.
Yes, CID (content id) is the name of the format IPFS uses to link to hash-based content.
A CID is a string of bytes that specifies:
(a) the encoding of the cid itself
(b) the content type of the thing being linked to (e.g., image, git object, etc)
(c) the multihash of the thing being linked to
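To make the three parts concrete, here is a minimal sketch in Python that builds and parses a binary CIDv1 by hand. It assumes the multiformats layout (version, content-type code, then multihash, each number stored as an unsigned varint) and the table codes `0x55` (raw) and `0x12` (sha2-256). The function names are made up for illustration; this is not the js-ipfs API.

```python
import base64
import hashlib

# Codes taken from the multicodec table (assumed here, check the table).
RAW = 0x55        # (b) content type: raw bytes
SHA2_256 = 0x12   # hash function used inside the multihash


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Return (value, next_position) for the varint starting at pos."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def make_cid_v1(content: bytes, codec: int = RAW) -> str:
    """Hypothetical helper: build a base32 CIDv1 string for content."""
    digest = hashlib.sha256(content).digest()
    # (c) multihash = hash function code + digest length + digest
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    cid_bytes = encode_varint(1) + encode_varint(codec) + multihash
    # (a) the leading "b" says the string is lower-case base32, no padding
    return "b" + base64.b32encode(cid_bytes).decode().lower().rstrip("=")


def parse_cid_v1(cid: str) -> dict:
    """Hypothetical helper: split a base32 CIDv1 string into its parts."""
    if cid[0] != "b":
        raise ValueError("this sketch only handles base32 CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    data = base64.b32decode(body)
    version, pos = decode_varint(data)
    codec, pos = decode_varint(data, pos)
    hash_fn, pos = decode_varint(data, pos)
    length, pos = decode_varint(data, pos)
    return {
        "version": version,
        "codec": codec,
        "hash_fn": hash_fn,
        "digest": data[pos:pos + length],
    }
```

Calling `parse_cid_v1(make_cid_v1(b"hello"))` gives back version 1, codec `0x55`, hash function `0x12`, and the SHA-256 digest of the content.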
It is possible to add support for new content types by:
- adding a new entry to the list here if necessary for the hash function: https://github.com/multiformats/multicodec/blob/master/table.csv#L35
- adding a new entry to the same table if necessary for the target content type
- implementing a “resolver” that resolves a cid to a block
Two interesting things here:
- The block that the cid resolves to doesn’t even need to be in IPFS. It can be in some external system. The example @daviddias links to above shows linking directly to data in the Ethereum blockchain.
- The “resolver” doesn’t need to be deployed to all IPFS nodes, only those doing the requesting. The resolution happens before talking to other IPFS nodes.
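A rough sketch of the resolver idea, assuming each requesting node simply keeps a table from content-type code to a resolver function. `LocalNode`, `resolve_git`, and the in-memory `EXTERNAL_GIT_STORE` are all hypothetical stand-ins, not how js-ipfs actually structures this; the code `0x78` is assumed to be the multicodec entry for raw git objects.

```python
from typing import Callable, Dict

GIT_RAW = 0x78  # assumed multicodec code for raw git objects

# Hypothetical stand-in for an external system (not IPFS) that stores
# objects keyed by their hash. The entry is git's empty blob.
EXTERNAL_GIT_STORE: Dict[bytes, bytes] = {
    bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"): b"blob 0\x00",
}


def resolve_git(digest: bytes) -> bytes:
    """Hypothetical resolver: fetch a git object from the external store."""
    return EXTERNAL_GIT_STORE[digest]


class LocalNode:
    """Hypothetical requesting node with its own codec -> resolver table."""

    def __init__(self) -> None:
        self._resolvers: Dict[int, Callable[[bytes], bytes]] = {}

    def register(self, codec: int, resolver: Callable[[bytes], bytes]) -> None:
        self._resolvers[codec] = resolver

    def resolve(self, codec: int, digest: bytes) -> bytes:
        # Resolution is a purely local lookup; no other node is involved.
        resolver = self._resolvers.get(codec)
        if resolver is None:
            raise LookupError(f"no resolver registered for codec {codec:#x}")
        return resolver(digest)


# Only the node that wants git content needs the git resolver.
node_with_git = LocalNode()
node_with_git.register(GIT_RAW, resolve_git)
node_without_git = LocalNode()
```

Here `node_with_git` can resolve the git object while `node_without_git` raises `LookupError`, which is the point of the second bullet: the resolver only has to exist where the request is made.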
@aboodman a lot of what you just outlined is possible because the CID implementation uses IPLD. IPLD is a data model that allows you to treat the entire universe of hash-linked data as a single information space. This is why you can have CIDs for non-IPFS content and can build tools like git-remote-ipld, which lets you add git repositories to IPFS and uses the git hash as the IPFS identifier for stuff within that repo; it’s resolving git content over the IPFS protocol using the original git hashes. (See the thread discussing git-remote-ipld.)
Read more about IPLD at https://ipld.io
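As an illustration of "using the git hash as the identifier", here is a small Python sketch that computes a git blob hash and wraps it in a binary CIDv1. It assumes the multicodec codes `0x78` (git-raw) and `0x11` (sha1); the helper names are made up, and this is not the actual git-remote-ipld code.

```python
import hashlib

GIT_RAW = 0x78  # assumed multicodec code for raw git objects
SHA1 = 0x11     # assumed multicodec code for sha1


def git_blob_hash(content: bytes) -> bytes:
    """Hash content the way git hashes a blob: sha1 over header + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).digest()


def git_hash_to_cid_bytes(git_hash: bytes) -> bytes:
    """Hypothetical helper: wrap a 20-byte git hash in a binary CIDv1."""
    # Every value here is below 128, so each varint is a single byte.
    return bytes([1, GIT_RAW, SHA1, len(git_hash)]) + git_hash


def cid_bytes_to_git_hash(cid: bytes) -> str:
    """Hypothetical helper: recover the original git hash (hex) from the CID."""
    if cid[:4] != bytes([1, GIT_RAW, SHA1, 20]):
        raise ValueError("not a CIDv1 for a sha1 git object")
    return cid[4:].hex()


cid = git_hash_to_cid_bytes(git_blob_hash(b"hello world\n"))
print(cid_bytes_to_git_hash(cid))
```

The CID only adds a four-byte prefix in front of the digest, so the original git hash survives unchanged inside it and can be read back out.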
How does IPLD handle deduplication?
Thanks. Is it fair to say that CID is a serialization format for IPLD?
How does IPLD handle deduplication?
I don’t understand this question. IPLD is a data model. Can you explain a scenario where you would want to use IPLD to handle deduplication?
My bad, I misunderstood the concept. It’s only possible to link to foreign data, not from foreign data?
Not quite. CID is an identifier scheme that happens to use IPLD. The IPLD Spec includes info about serialization formats.
IPLD is an abstract data model. It basically says “all Merkle DAGs have basic properties in common, so let’s use those common properties to interlink DAGs across different systems.” This allows you to treat the entire decentralized web – and all DAGs anywhere – as one giant inter-linkable information space. This makes any system that uses IPLD really powerful and dynamic, opening lots of paths for interoperability.
If a system incorporates IPLD into its code, it can link to other hash-linked data in any system. So if two systems both use IPLD, they will be able to bidirectionally reference each other’s data. If only one system uses IPLD and the other does not (e.g., IPFS uses IPLD but git doesn’t), then your IPLD-based system (IPFS) can point into the other system’s data (git), but that other system won’t be able to point back.
The benefits of incorporating IPLD into your code base are huge, and the cost/overhead is minimal. That makes me suspect that we will see all hash-linked systems gradually incorporate IPLD, which will make the decentralized web exponentially more powerful and flexible.
That link is now broken. I assume this is the same example under a new name: https://github.com/ipfs/js-ipfs/tree/master/examples/explore-ethereum-blockchain
Github needs content addressable links!
If I understand correctly, this allows using the other system’s identifiers to address the content, but the content still needs to be served by an IPFS daemon to be accessible, yes?
You are correct, it is the same tutorial, just with a different name.