A question about CID hash

Note: If CID bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy is CIDv2, then please ignore this question.


Greetings!

I am reading an article about IPFS and have downloaded the file bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy.

Because this CID refers to a single file, I expected it to be the hash of the file itself. Yet, assuming this is a CIDv1, the CID hash and the file's checksum differ:

$ ipfs get bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy
Saving file(s) to bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy
 155.88 KiB / 155.88 KiB [=============================================================] 100.00% 0s
$ sha1sum bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy
4effb299ca044c1efa1279038b33454dd91a8024  bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy
$ sha256sum bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy
1abced61e25c1ddcd5e52e1f7171e7f352293eaf52344713f07dd0f707ca717e  bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy

If this CID is a CIDv1, why do the hashes differ?


Hey there,

  • bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy is a CIDv1. There's no such thing as CIDv2. When in doubt, use the CID Inspector.
  • The CID uses sha2-256 as its hashing algorithm (shown as "name: sha2-256" in the CID Inspector), so sha256sum is the right command to calculate the matching hash.
  • The CID is encoded as a string using base32, whereas the output of sha256sum is in hex (base16).
  • The hashes do in fact match: if you look in the CID Inspector, you will see that the output of sha256sum matches the digest (hex) shown there.
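For anyone who wants to check this without the CID Inspector, here is a minimal bash sketch that decodes the CID string to hex so the digest can be compared with the sha256sum output directly. It assumes bash and GNU coreutils (base32, od, tr), and a CID that starts with the "b" multibase prefix (lower-case base32); the variable names are illustrative only.

```bash
#!/usr/bin/env bash
# Minimal sketch: print the hex digest stored inside a base32 CIDv1.
# Assumes bash, GNU coreutils, and the "b" (lower-case base32) multibase prefix.
set -euo pipefail

cid="bafkreia2xtwwdys4dxonlzjod5yxdz7tkiut5l2sgrdrh4d52d3qpstrpy"

# Strip the multibase prefix "b"; the rest is RFC 4648 base32 without padding.
b32="${cid#b}"

# coreutils base32 expects upper-case letters and "=" padding to a multiple of 8.
b32=$(printf '%s' "$b32" | tr 'a-z' 'A-Z')
while (( ${#b32} % 8 != 0 )); do
  b32+="="
done

# Decode to raw bytes and dump them as one continuous lower-case hex string.
hex=$(printf '%s' "$b32" | base32 -d | od -An -v -tx1 | tr -d ' \n')

# First 4 bytes: 01 = CID version 1, 55 = raw codec,
# 12 = sha2-256 multihash code, 20 = digest length (32 bytes).
prefix="${hex:0:8}"
digest="${hex:8}"

echo "prefix: $prefix"
echo "digest: $digest"
```

The "digest" line is what the CID Inspector shows as the hex digest, and it can be compared with the first column of the sha256sum output above.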

Daniel, could you please write commands that can be executed in a bash console?

I want to analyse CIDs and understand them better, so that I can suggest improvements and ideas for IPFS.

I don't think this is a correct assumption. If the file is larger than the maximum block size, its CID will be the root of a tree made of multiple blocks. I believe the only case where it would work as you assumed is if the file is smaller than the maximum block size and was added with "raw leaves", and even then I'm not sure. I think you'd still have a root node referring to the single raw leaf, in which case you could easily retrieve the CID of that single node and get what you're looking for.
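The single-block case can also be checked in the other direction. Below is a minimal bash sketch that rebuilds a CIDv1 string from a sha256sum digest. It is only expected to reproduce the CID that IPFS reports when the whole file is stored as one raw block; that fits the CID in the question, whose "bafkrei" prefix decodes to the raw codec (0x55). For a file stored as a multi-block DAG, the result will not match the root CID. It assumes bash, sed and GNU coreutils (base32 -w0, tr); raw_cid_from_sha256 is a hypothetical helper name, not an IPFS command.

```bash
#!/usr/bin/env bash
# Minimal sketch: rebuild a CIDv1 (raw codec, sha2-256, base32) from a hex digest.
# Only meaningful if the file was stored as a single raw block.
# raw_cid_from_sha256 is a hypothetical helper, not part of IPFS.
set -euo pipefail

raw_cid_from_sha256() {
  local digest="$1"
  # 01 = CIDv1, 55 = raw codec, 12 = sha2-256, 20 = 32-byte digest length.
  local cid_hex="01551220${digest}"
  # Turn "0155..." into "\x01\x55..." so bash's printf emits the raw bytes.
  local escaped
  escaped=$(printf '%s' "$cid_hex" | sed 's/../\\x&/g')
  # Bytes -> base32 (no line wrap) -> lower case -> drop "=" padding,
  # then prepend the multibase prefix "b" (lower-case base32).
  printf 'b%s\n' "$(printf "$escaped" | base32 -w0 | tr 'A-Z' 'a-z' | tr -d '=')"
}

# The digest printed by sha256sum in the question.
raw_cid_from_sha256 1abced61e25c1ddcd5e52e1f7171e7f352293eaf52344713f07dd0f707ca717e
```

With a local file, the digest can be taken straight from sha256sum, for example raw_cid_from_sha256 "$(sha256sum FILE | cut -d' ' -f1)", where FILE is a placeholder for the downloaded file.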

I suggest taking a look at https://dag.ipfs.tech/ as another tool to see how the Merkle DAG is constructed.