Why does the same photo have different CIDv1s?

These are two pictures that are exactly the same.

https://bafybeieb3fsnggzs35rhxqsfoqaohtykbwxt4tesbrmwm5cstsoc6vmyxu.ipfs.dweb.link/


https://bafkreigr6gb656kzepd6a4mrmgdo4wq6vvpcx7cj6xcumc43rxl5ldf3oe.ipfs.dweb.link/


Why do they have a different hash?

If you visit cid.ipfs.io and enter these two CIDs, you will see that the multicodec of the first CID is ‘dag-pb’ and the multicodec of the second CID is ‘raw’.

You can also see this in the initial part of each CID:
The 1st CID begins with bafy (notice the y, which indicates dag-pb).
The 2nd CID begins with bafk (notice the k, which indicates raw).
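
To make the prefix observation concrete, here is a minimal Python sketch (standard library only) that decodes a base32 CIDv1 and lists its parts, much like cid.ipfs.io does. The helper names `read_varint` and `inspect_cidv1` are made up for this example, and the lookup tables only cover the codes that show up in this thread.

```python
import base64

# Only the multicodec / multihash codes relevant to this thread.
CODECS = {0x55: "raw", 0x70: "dag-pb"}
HASHES = {0x12: "sha2-256"}


def read_varint(buf, pos):
    """Read one unsigned varint from buf; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def inspect_cidv1(cid):
    """Split a base32 CIDv1 string into version, codec, hash and digest."""
    if not cid.startswith("b"):
        raise ValueError("expected a base32 CIDv1 (multibase prefix 'b')")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # b32decode needs padded input
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    digest_len, pos = read_varint(raw, pos)
    digest = raw[pos:pos + digest_len]
    return {
        "version": version,
        "codec": CODECS.get(codec, hex(codec)),
        "hash": HASHES.get(hash_code, hex(hash_code)),
        "digest": digest.hex(),
    }


for cid in (
    "bafybeieb3fsnggzs35rhxqsfoqaohtykbwxt4tesbrmwm5cstsoc6vmyxu",
    "bafkreigr6gb656kzepd6a4mrmgdo4wq6vvpcx7cj6xcumc43rxl5ldf3oe",
):
    print(inspect_cidv1(cid))
```

Running it on the two CIDs above should report the same version and hash function, but a different codec and a different digest.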

Can you please provide more context on how you got these, along with steps to reproduce, so that we can help understand what led to these two CIDs?


Thank you very much for your reply.
Indeed, the multicodecs are different.
I uploaded one copy of the file to Fleek and another to Pinata.
The copy on Fleek could not be opened for hours, while the Pinata CID opens instantly.
I’m wondering how the same file uploaded to the network can have different CIDs and open so differently.

Perhaps the gateway (ipfs.io, dweb.link, etc.) should convert to different multicodecs when receiving CIDs and search in parallel.

This explains why you got different CIDs. When uploading a file to IPFS (or, more precisely, adding the file to an IPFS node), several parameters come into play. Some of them are the chunk size, the type of leaves, the type of tree, and the maximum number of children per node. These factors affect the generated CID. In your case, it seems like the one you received from Fleek (which I suppose is the one with multicodec ‘raw’) might have used “Raw Leaves”, and the one from Pinata might have used “UnixFS Leaves”.
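
As a rough illustration of the leaf-type effect, the sketch below builds both kinds of CID for the same bytes. It assumes a small, non-empty file that fits in a single chunk, sha2-256 hashing, and the usual UnixFS/dag-pb protobuf layout for a one-block file as I understand it; I have not checked what either service actually does, and the function names are made up for this example.

```python
import base64
import hashlib


def varint(n):
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def cidv1(codec, block):
    """Base32 CIDv1 for a block: version, codec, sha2-256 multihash."""
    digest = hashlib.sha256(block).digest()
    binary = varint(1) + varint(codec) + varint(0x12) + varint(len(digest)) + digest
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def unixfs_file_block(data):
    """Wrap non-empty file bytes in a UnixFS 'file' node inside a dag-pb node."""
    size = varint(len(data))
    # UnixFS message: Type = File (2), Data = file bytes, filesize = length
    unixfs = b"\x08\x02" + b"\x12" + size + data + b"\x18" + size
    # dag-pb node with only its Data field set (no links)
    return b"\x0a" + varint(len(unixfs)) + unixfs


content = b"hello world\n"
raw_cid = cidv1(0x55, content)                         # "raw leaves": hash the bytes as-is
wrapped_cid = cidv1(0x70, unixfs_file_block(content))  # "UnixFS leaves": hash the wrapper
print(raw_cid)
print(wrapped_cid)
```

The file content is identical in both cases, but the hashed block is not, so the two CIDs differ in both the codec and the digest.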

The difference in loading time comes from differences in performance between pinning service providers and is not related to the multicodec of a CID.

That is simply not possible, because a change in multicodec also means a change in the underlying data, which ultimately results in a different hash. When the hashes are different, the two pieces of data are not the same. With UnixFS leaves, the block carries some additional information that is not present with raw leaves. That means the underlying data has changed, and the CID with it. So just changing the multicodec part of the CID won’t work, because the resulting CID would no longer match the data.

But I can assure you that if two CIDs do point to the same thing, IPFS will find the data either way.

To test it:
Here’s your CID:
bafybeieb3fsnggzs35rhxqsfoqaohtykbwxt4tesbrmwm5cstsoc6vmyxu

Here’s the CID version 0 of the same CID:
QmX5XZps24BizVepAGSVbS38ytHXtLmAiSZ2iZbSvcQ5cQ

When you enter both CIDs in cid.ipfs.io, you can see that the “Digest” is the same. So regardless of which version you use, even though the CIDs look different and have different versions, IPFS can find the data, because they have the same content.

Extra:
When I tried to convert the second CID to CID version 0, I wasn’t able to, because CIDv0 only supports ‘dag-pb’ nodes.
So now we have more insight into why the CIDs were different: it seems like Fleek uses CIDv0 and Pinata has moved to CIDv1.
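
For anyone who wants to try the conversion locally, here is a small Python sketch, assuming sha2-256 multihashes and base32 CIDv1 strings. The base58btc helpers are hand-rolled and the function names are invented for this example, so treat it as an illustration rather than a reference implementation.

```python
import base64

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data):
    """Encode bytes as base58btc (no multibase prefix)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text):
    """Decode a base58btc string (no multibase prefix) to bytes."""
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def v1_to_v0(cid):
    """CIDv1 (base32, dag-pb, sha2-256) -> CIDv0; other CIDs are rejected."""
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    # version 1, codec dag-pb (0x70), sha2-256 (0x12), 32-byte digest (0x20)
    if cid[0] != "b" or len(raw) != 36 or raw[:4] != b"\x01\x70\x12\x20":
        raise ValueError("CIDv0 can only express dag-pb with sha2-256")
    return b58encode(raw[2:])  # CIDv0 is just the bare multihash


def v0_to_v1(cid):
    """CIDv0 -> base32 CIDv1 with the dag-pb codec."""
    multihash = b58decode(cid)
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a sha2-256 CIDv0")
    binary = b"\x01\x70" + multihash
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


V1_FROM_THREAD = "bafybeieb3fsnggzs35rhxqsfoqaohtykbwxt4tesbrmwm5cstsoc6vmyxu"
V0_FROM_THREAD = "QmX5XZps24BizVepAGSVbS38ytHXtLmAiSZ2iZbSvcQ5cQ"
print(v1_to_v0(V1_FROM_THREAD))
print(v1_to_v0(V1_FROM_THREAD) == V0_FROM_THREAD)
```

Calling `v1_to_v0` on the second CID (the `bafkrei...` one) raises `ValueError`, since its codec is ‘raw’ rather than ‘dag-pb’.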

If you have any further questions feel free to ask.


Very interesting. It turns out that converting CIDv1 to CIDv0 is not always possible, while CIDv0 can be converted to all formats. Why was it decided that CIDv1 is better than CIDv0, and why is there a widespread transition to v1?
A CIDv0 (that is, its 32-byte digest) can be saved in a Solidity bytes32 slot, and now it turns out that I cannot store the CIDv1 values I get from Fleek efficiently in Solidity and will have to save them in expensive string/bytes slots.

CIDv1 doesn’t have to be long. For example, I re-encoded the same CID in a shorter format:

zdj7WeAmwFvsXfKsCxju1Q7Toshc6UybtBJakxZPMz7k3xynL

As you can see, it’s about the same size as a CIDv0.
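
If you want to reproduce this kind of re-encoding, here is a minimal sketch that takes a base32 CIDv1 and writes the same bytes in base58btc (multibase prefix `z`). I am assuming the first CID from this thread as input, and the helper names are invented for this example.

```python
import base64

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data):
    """Encode bytes as base58btc (no multibase prefix)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def cidv1_bytes(cid_b32):
    """Binary form of a base32 CIDv1 string (multibase prefix 'b')."""
    body = cid_b32[1:].upper()
    return base64.b32decode(body + "=" * (-len(body) % 8))


def to_base58btc(cid_b32):
    """Same CID bytes, written with the base58btc multibase prefix 'z'."""
    return "z" + b58encode(cidv1_bytes(cid_b32))


cid = "bafybeieb3fsnggzs35rhxqsfoqaohtykbwxt4tesbrmwm5cstsoc6vmyxu"
short = to_base58btc(cid)
print(short)
print(len(cid), len(short), len(cidv1_bytes(cid)))
```

Only the text encoding changes here. The binary CID stays at 36 bytes: a 4-byte prefix (version, codec, hash function, digest length) followed by the 32-byte sha2-256 digest.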

And to answer your other question, people are migrating to CIDv1 because it offers more options, while CIDv0 is only one specific encoding.
