What is the formula to calculate a CID from a SHA256 hash?

For a new project I am using libIPFS to provide an IPFS interface inside a C++ application. It works well, but it is missing some functions I am used to having. The biggest one: if I have the SHA256 of some data, how do I get its CID?

For example, if I know the SHA256, how do I get to bafkreicr2pggml4j5bjv3hhxi5i5ud4rgnnaqphrfs2mtoub77zfiwbhju?

Sorry, I forgot what the SHA256 for this example was, but I know this is a CID I have gotten in the past from a SHA256.

CIDs are not plain hashes of the data. We first chunk the data and lay it out in a unixfs merkle tree.
This enables parallel incremental verification, streaming and seeking.

If you really want to use your sha256 hashes as-is, there is a PoC of incremental verification here: Supporting Large IPLD Blocks

Note that sha256 is not a merkle tree, so the performance ceiling is worse (hopefully not that bad) and there would be no support for seeking, fetching random file offsets or streaming the file (all things a merkle tree can do). Actually, this would stream the file, but backward.
This would need some work but could be done.

You're half correct. In my particular situation the data was already hashed and the old hashes needed to be used. Raw mode allows the sha256 to be used directly, and the CID given above is for data that was added this way.

Raw nodes are limited to 2MiB. That's why I linked Supporting Large IPLD Blocks, which lets you both reuse your old hashes and support blocks of any size.

2MB is more than enough for JSON data. Either way, I don't see the math for the conversion from SHA256 to CID anywhere in your link.

Does this help you? It’s the specification (and explanation) of CIDv1: GitHub - multiformats/cid: Self-describing content-addressed identifiers for distributed systems

If needed, I can try to dive deeper with you.

From a quick skim, this looks like what I am looking for. Will dive into it in more depth now. Thanks.

Thanks, it turned out to be pretty simple. If anyone is curious:

"b" + base32(0x01551220 + hash + 0b00)
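
For anyone who wants that formula as code, here is a minimal C++ sketch. It assumes the input is the SHA256 as a 64-character hex string and that the data is a single raw block. The function names (`hexNibble`, `base32Lower`, `cidV1RawFromSha256Hex`) are made up for this example and are not part of libIPFS. The four prefix bytes are 0x01 (CIDv1), 0x55 (raw codec), 0x12 (sha2-256 multihash code) and 0x20 (digest length, 32 bytes). The leading "b" is the multibase prefix for lowercase base32 without "=" padding. Prefix plus digest is 36 bytes (288 bits), so the encoder fills the last 5-bit group with two zero bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Convert one hex character to its 4-bit value; throws on invalid input.
static std::uint8_t hexNibble(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Lowercase RFC 4648 base32 without '=' padding.
static std::string base32Lower(const std::vector<std::uint8_t>& data) {
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz234567";
    std::string out;
    std::uint32_t buffer = 0;  // only the low `bits` bits are meaningful
    int bits = 0;
    for (std::uint8_t byte : data) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 5) {
            out.push_back(alphabet[(buffer >> (bits - 5)) & 0x1F]);
            bits -= 5;
        }
    }
    if (bits > 0) {
        // Pad the final group on the right with zero bits.
        out.push_back(alphabet[(buffer << (5 - bits)) & 0x1F]);
    }
    return out;
}

// Build a CIDv1 (raw codec, sha2-256) string from a hex SHA256 digest.
std::string cidV1RawFromSha256Hex(const std::string& hex) {
    if (hex.size() != 64) {
        throw std::invalid_argument("expected 64 hex characters");
    }
    // version, codec, multihash code, digest length
    std::vector<std::uint8_t> bytes = {0x01, 0x55, 0x12, 0x20};
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(static_cast<std::uint8_t>(
            (hexNibble(hex[i]) << 4) | hexNibble(hex[i + 1])));
    }
    assert(bytes.size() == 36);
    return "b" + base32Lower(bytes);
}
```

As a sanity check, the SHA256 of empty input (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) gives bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku.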

This is only valid for single block raw sha256 files. Discord ( ipfs ) also explains how to get other hashes from the table.

For multiblock graphs there is no solution.
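
To tell whether a given CID falls into that single-block raw sha256 case, you can decode it and look at the first four bytes. Below is a rough C++ sketch of that check, assuming a base32 CIDv1 string (multibase prefix "b"). The names `base32LowerDecode` and `sha256HexFromRawCid` are hypothetical and not from any IPFS library. It returns the hex digest when the prefix is 0x01 0x55 0x12 0x20, and an empty string otherwise (for example for a dag-pb CID starting with bafybei).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode lowercase, unpadded RFC 4648 base32.
// Sets `ok` to false if an invalid character is found.
static std::vector<std::uint8_t> base32LowerDecode(const std::string& text, bool& ok) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // only the low `bits` bits are meaningful
    int bits = 0;
    ok = true;
    for (char c : text) {
        std::uint32_t value;
        if (c >= 'a' && c <= 'z') {
            value = static_cast<std::uint32_t>(c - 'a');
        } else if (c >= '2' && c <= '7') {
            value = static_cast<std::uint32_t>(c - '2' + 26);
        } else {
            ok = false;
            return out;
        }
        buffer = (buffer << 5) | value;
        bits += 5;
        if (bits >= 8) {
            out.push_back(static_cast<std::uint8_t>((buffer >> (bits - 8)) & 0xFF));
            bits -= 8;
        }
    }
    return out;  // leftover bits (fewer than 8) are padding and are dropped
}

// Returns the hex SHA256 digest if `cid` is a base32 CIDv1 with the raw codec
// and a sha2-256 multihash; returns an empty string otherwise.
std::string sha256HexFromRawCid(const std::string& cid) {
    if (cid.empty() || cid[0] != 'b') return "";  // 'b' = lowercase base32
    bool ok = false;
    std::vector<std::uint8_t> bytes = base32LowerDecode(cid.substr(1), ok);
    if (!ok || bytes.size() != 36) return "";
    if (bytes[0] != 0x01 || bytes[1] != 0x55 ||   // CIDv1, raw codec
        bytes[2] != 0x12 || bytes[3] != 0x20) {   // sha2-256, 32-byte digest
        return "";
    }
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    for (std::size_t i = 4; i < bytes.size(); ++i) {
        hex.push_back(digits[bytes[i] >> 4]);
        hex.push_back(digits[bytes[i] & 0x0F]);
    }
    return hex;
}
```

Running this on a bafkrei... CID like the one in the first post should give back the original SHA256, as long as it was created from a single raw block.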

That's all that matters when looking up data that was already hashed before IPFS was around.
