Migrating to CID V1 while keeping existing CID V0 pins alive?

Erotemic · April 16, 2022, 6:07pm

Overview

I’m wondering if it’s possible to recursively pin new data with a V1 CID such that any users who previously pinned the existing V0 CIDs would still contribute when the V1 CID is requested.

Background & Problem

I’m building a dataset that I’ve pinned to IPFS using CID V0.

The top-level folder of this dataset contains several subfolders, and each subfolder contains several image files. The V0 CIDs of the top-level folder, a subfolder, and a file are:

QmfStoay5rjeHMEDiyuGsreXNHsyiS5kVaexSM2fov216j
QmbZrgM4jCJ8ccU9DLGewPkVBDH6pDVs4vdUUk1jeKyfic
QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr - PXL_20210411_150641385.jpg

I have several users that have pinned this top level folder, which means the subfolders and subitems are also recursively pinned.

When I update the dataset I add a new subfolder containing new images, and then re-add the root folder and get a new root CID that I publish. This means that the CIDs of the previous subfolders don’t change, which means when a user pins the new root CID, they only have to grab the new data and not the entire 17+GB dataset.
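The reason old subfolder CIDs survive an update can be sketched with a toy content-addressed tree. This is only an illustration of the idea, not IPFS's actual UnixFS/dag-pb encoding; `node_id` and the folder and file names are hypothetical.

```python
import hashlib

def node_id(node):
    """Toy content ID: bytes are files, dicts are folders (name -> child)."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    h = hashlib.sha256(b"dir:")
    for name in sorted(node):
        # A folder's ID depends only on its entry names and its children's IDs.
        h.update(f"{name}={node_id(node[name])};".encode())
    return h.hexdigest()

old_root = {"batch1": {"a.jpg": b"image a", "b.jpg": b"image b"}}
# Updating the dataset: same old subfolder, plus one new subfolder.
new_root = {**old_root, "batch2": {"c.jpg": b"image c"}}

# The existing subfolder keeps its ID, so a client that already has it fetches nothing new for it.
print(node_id(old_root["batch1"]) == node_id(new_root["batch1"]))  # True
# The root gets a new ID because its list of entries changed.
print(node_id(old_root) == node_id(new_root))  # False
```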

Stuff I’ve tried

I would like to update my project to use CID version 1, but when this update happens I don’t want clients that pin the new root CID to (1) have to redownload everything and (2) have to store all of the old files twice.

To check if this was the case, I looked at the different CIDs for the file via:

ipfs add -n PXL_20210411_150641385.jpg --cid-version 0
ipfs add -n PXL_20210411_150641385.jpg --cid-version 1

These respectively returned:

QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr
bafybeieuzuypuptalyfrpmmtihjmydtq2ok2fmkslsxpvdthxihmbomkay

I read that V0 CIDs could be converted to V1 CIDs via: ipfs cid base32 <CID>. But when I run this on the V0 CID:

ipfs cid base32 Qmd4PzLWTZiawH1W3VzoAbkyh9hCopjqSVAddYF8PrYBfE

I get:

bafybeig2wvo6ojcq6fjemmqs3ussgcn3m2inbv6ogxrhfsbdkosaeilqpe

which is a V1 CID, but it does not match the V1 CID I got from the ipfs add command. This makes me think that the behavior I want won’t be supported by default, but I was wondering if there is any way to support my use case.
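For context, here is a rough Python sketch of what a V0 to V1 conversion amounts to, as far as I understand the CID format: the sha2-256 multihash is kept as-is, a version byte (0x01) and the dag-pb codec byte (0x70) are prepended, and the result is re-encoded as lowercase base32 with a `b` prefix. The function names are my own, and this only re-labels an existing hash; it does not re-add or re-chunk any data.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode(text):
    """Decode a base58btc string (the encoding used by CIDv0) to bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body

def cidv0_to_cidv1(cid_v0):
    """Re-encode a CIDv0 as a base32 CIDv1 without touching the multihash."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare sha2-256 multihash: 0x12 (sha2-256), 0x20 (32 bytes), digest.
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a CIDv0")
    cid_bytes = b"\x01\x70" + multihash  # CID version 1, dag-pb codec
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # 'b' is the multibase prefix for lowercase base32

# The V0 CID of the image file from this post.
print(cidv0_to_cidv1("QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr"))
```

Since the multihash is untouched, the converted CID still names exactly the same block; a CID produced by a fresh ipfs add with different settings can name a different block.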

To summarize, is there a way to pin a file with a V1 CID such that its existing V0 CID converts to the new V1 CID? Or is there a way to pin a V1 CID such that it can be converted back and forth between V0 and V1?

Is there anything else I can do to migrate to V1 without breaking existing users, or should I just stick to V0?

Jorropo · April 16, 2022, 8:19pm

When you enable CID V1 on ipfs add, it also enables raw leaves, which is a simpler, slightly smaller data format for storing the actual chunks (the root blocks are still the same, though).

So it’s also changing the final hash.
If you pass ipfs add --raw-leaves=false you should get the same CIDv1.

Erotemic · April 20, 2022, 12:33am

That did not seem to work. I got it to work with a simple file, but not with my file.

Simple file

To verify this let’s work with the same file and run the script:

echo "foobar" > foobar
FPATH=foobar

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

Gives:

aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f  foobar
QmRgutAxd8t7oGkSm4wmeuByG6M51wcTso6cubDdQtuEfL
bafkreifoybygix7fh3r3g5rqle3wcnhqldgdg4shzf4k3ulyw3gn7mabt4
bafybeibrypkxbagyiy5dyy5ssi67lioubll2opvolikk6wcccps7kbfmgm
bafybeibrypkxbagyiy5dyy5ssi67lioubll2opvolikk6wcccps7kbfmgm

The last two lines match, so that works.
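The third output line (the bafkrei... one) can also be accounted for. As a sketch, assuming the file fits in a single chunk so that the raw leaf is the whole file: the default V1 CID is then the sha2-256 of the file bytes, wrapped with the raw codec (0x55) instead of dag-pb (0x70). `raw_leaf_cidv1` is a made-up helper name, not an ipfs command.

```python
import base64
import hashlib

def raw_leaf_cidv1(data):
    """CIDv1 of a single raw block: version 1, raw codec, sha2-256 multihash."""
    digest = hashlib.sha256(data).digest()
    multihash = b"\x12\x20" + digest     # 0x12 = sha2-256, 0x20 = 32-byte digest
    cid_bytes = b"\x01\x55" + multihash  # 0x01 = CID version 1, 0x55 = raw codec
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32

data = b"foobar\n"  # the bytes that `echo "foobar" > foobar` writes
print(hashlib.sha256(data).hexdigest())  # compare with the sha256sum line above
print(raw_leaf_cidv1(data))              # compare with the plain --cid-version 1 line above
```

With raw leaves the hash is taken over the file bytes directly, while with --raw-leaves=false it is taken over a dag-pb/UnixFS wrapper around those bytes, which is why the two V1 CIDs for the same file differ.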

My file

So, now let’s try with my file:

FPATH=PXL_20210411_150641385.jpg

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

Gives:

6ba0583cdd93f410951417dc155afac2e98f8ce781e618ff7746866c02a7929d  PXL_20210411_150641385.jpg
QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr
bafybeieuzuypuptalyfrpmmtihjmydtq2ok2fmkslsxpvdthxihmbomkay
bafybeie7j5jvs76z2kjrm3mi3n4cdsphyredz3xrlyq6tqgcgk3lfatfry
bafybeifrahymkr7pie2rh6o7yaky2keqc5apfnlzqhlq444af44fowchd4

The last two lines no longer match. What’s going on?

Simple, but bigger file

So, let’s try with a bigger file. Let’s write 1MB of zeros to a file and use that.

dd if=/dev/zero of=lots_of_zeros count=1 bs=1M
FPATH=lots_of_zeros

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

This gives us:

30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58  lots_of_zeros
QmVkbauSDEaMP4Tkq6Epm9uW75mWm136n81YH8fGtfwdHU
bafybeiggzq4ryi7hscq5hzvzcnk4urnxt3asp37dhgvnjilf7exskximla
bafybeiephzsrbttogsiujiqziwjmhvu3pbggy256l76g2ootaediqz4mnq
bafybeidoeq3sicgbxby4b2h3ju2pd2ekivpgijs5vxv6vbozv6k3dd2ahm

which also does not match.

So, there seems to be something else going on besides --raw-leaves=false, probably having to do with the file or chunk size?

Further exploration

I did a bit more exploration of this, and found that this method will work for files up to 256k, but if you make the files 257k, it will fail:

# Change 257k to 256k and this will work
dd if=/dev/zero of=lots_of_zeros count=1 bs=257k
FPATH=lots_of_zeros

sha256sum "$FPATH"
CID_V0_DEFAULT=$(ipfs add -qn --cid-version 0 "$FPATH")
CID_V1_DEFAULT=$(ipfs add -qn --cid-version 1 "$FPATH")
CID_V0_RLT=$(ipfs add -qn --cid-version 0 --raw-leaves=true "$FPATH")
CID_V0_RLF=$(ipfs add -qn --cid-version 0 --raw-leaves=false "$FPATH")
CID_V1_RLT=$(ipfs add -qn --cid-version 1 --raw-leaves=true "$FPATH")

CID_V1_RLF=$(ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH")
CID_V1_FROM_V0_RLT=$(ipfs cid base32 "$CID_V0_RLT")
CID_V1_FROM_V0_RLF=$(ipfs cid base32 "$CID_V0_RLF")


echo "--raw-leaves=true results"
echo "---"
echo "CID_V0_RLT         = $CID_V0_RLT"
echo "---"
echo "CID_V1_DEFAULT     = $CID_V1_DEFAULT"
echo "CID_V1_RLT         = $CID_V1_RLT"
echo "CID_V1_FROM_V0_RLT = $CID_V1_FROM_V0_RLT"
echo "---"
echo ""
echo ""
echo "--raw-leaves=false results"
echo "---"
echo "CID_V0_DEFAULT     = $CID_V0_DEFAULT"
echo "CID_V0_RLF         = $CID_V0_RLF"
echo "---"
echo "CID_V1_RLF         = $CID_V1_RLF"
echo "CID_V1_FROM_V0_RLF = $CID_V1_FROM_V0_RLF"

Is there another option besides --raw-leaves that changes when you change CID versions?
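The 256k boundary lines up with ipfs add's default fixed-size chunker of 262144 bytes (256 KiB). A small sketch, assuming that default and no custom --chunker option, of how many leaf blocks each test file from this thread would be split into:

```python
DEFAULT_CHUNK_SIZE = 262144  # 256 KiB; assumed default chunk size for `ipfs add`

def leaf_count(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of fixed-size chunks needed for a file (at least one)."""
    return max(1, -(-file_size // chunk_size))  # ceiling division

test_files = [
    ("foobar", 7),
    ("256k zeros", 256 * 1024),
    ("257k zeros", 257 * 1024),
    ("1M zeros", 1024 * 1024),
]
for label, size in test_files:
    n = leaf_count(size)
    layout = "single block" if n == 1 else "root node linking to leaves"
    print(f"{label}: {n} leaf block(s), {layout}")
```

A file with one leaf is a single block, so only the way its CID is written changes. Once there is more than one leaf, a separate root node links to the leaves, and that root is where the mismatch shows up.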

hector · April 20, 2022, 9:35am

The “leaves” of the file will have the same multihash with raw-leaves=false, but they will be addressed using CIDv1s. That means that any non-leaf node in the DAG will be storing CIDv1s for links, which means that the final CIDs for non-leaf nodes will be different.

However, these should be mostly lightweight nodes. The bigger nodes will be the leaves, and those will be already “cached” by clients since they had the CIDv0 version (as long as they are on go-ipfs >=0.12). That will save time in the replication of those DAGs.
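To make the point about links concrete, here is a deliberately simplified model. It is not the real dag-pb protobuf encoding: the toy parent block is just the concatenation of its children's binary CIDs, and all function names are made up. A binary CIDv0 is the bare multihash, while a binary CIDv1 prepends a version byte and a codec byte.

```python
import hashlib

def sha256_multihash(data):
    """sha2-256 multihash: 0x12 (sha2-256), 0x20 (32 bytes), then the digest."""
    return b"\x12\x20" + hashlib.sha256(data).digest()

def link_bytes(multihash, cid_version):
    """Binary form of a link: bare multihash for v0, version + dag-pb codec + multihash for v1."""
    return multihash if cid_version == 0 else b"\x01\x70" + multihash

def toy_parent_multihash(leaf_blocks, cid_version):
    """Hash of a toy parent block that only contains links to its leaves."""
    body = b"".join(link_bytes(sha256_multihash(b), cid_version) for b in leaf_blocks)
    return sha256_multihash(body)

# Two leaf blocks with identical bytes in both cases (as with --raw-leaves=false).
leaves = [b"\x00" * 262144, b"\x00" * 1024]

root_v0 = toy_parent_multihash(leaves, 0)
root_v1 = toy_parent_multihash(leaves, 1)

# Leaf multihashes do not depend on the CID version, but the parent's bytes do.
print(root_v0.hex())
print(root_v1.hex())
print(root_v0 != root_v1)
```

In this model the leaves are byte-for-byte identical and only the small parent block differs, which matches the claim that clients holding the V0 data mostly need to fetch the lightweight non-leaf nodes.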
