Migrating to CID V1 while keeping existing CID V0 pins alive?

Overview

I’m wondering if it’s possible to recursively pin new data with a V1 CID such that any users who previously pinned the existing V0 CIDs would still contribute when the V1 CID is requested.

Background & Problem

I’m building a dataset that I’ve pinned to IPFS using CID V0.

The top-level folder of this dataset contains several subfolders, and each subfolder contains several image files. The V0 CIDs of the top-level folder, a subfolder, and a file are:

  • QmfStoay5rjeHMEDiyuGsreXNHsyiS5kVaexSM2fov216j
  • QmbZrgM4jCJ8ccU9DLGewPkVBDH6pDVs4vdUUk1jeKyfic
  • QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr - PXL_20210411_150641385.jpg

I have several users who have pinned this top-level folder, which means the subfolders and the files inside them are also recursively pinned.

When I update the dataset I add a new subfolder containing new images, and then re-add the root folder and get a new root CID that I publish. This means that the CIDs of the previous subfolders don’t change, which means when a user pins the new root CID, they only have to grab the new data and not the entire 17+GB dataset.

Stuff I’ve tried

I would like to update my project to use CID version 1, but when this update happens I don’t want clients that pin the new root CID to (1) have to redownload everything and (2) have to store all of the old files twice.

To check if this was the case, I looked at the different CIDs for the file via:

  • ipfs add -n PXL_20210411_150641385.jpg --cid-version 0
  • ipfs add -n PXL_20210411_150641385.jpg --cid-version 1

These respectively returned:

  • QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr
  • bafybeieuzuypuptalyfrpmmtihjmydtq2ok2fmkslsxpvdthxihmbomkay

I read that V0 CIDs could be converted to V1 CIDs via: ipfs cid base32 <CID>. But when I run this on the V0 CID:

ipfs cid base32 Qmd4PzLWTZiawH1W3VzoAbkyh9hCopjqSVAddYF8PrYBfE

I get:

bafybeig2wvo6ojcq6fjemmqs3ussgcn3m2inbv6ogxrhfsbdkosaeilqpe

which is a V1 CID, but it does not match the V1 CID I got from the ipfs add command. This makes me think that the behavior I want won’t be supported by default, but I was wondering if there was any way to support my use case.
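
For what it's worth, ipfs cid base32 does not re-add or re-hash anything. As far as I understand it, it only takes the multihash already inside the V0 CID, puts a version byte (0x01) and the dag-pb codec (0x70) in front of it, and prints the result in base32. Here is a minimal Python sketch of that conversion. The helper names (b58decode, cid_v0_to_v1) are my own and not part of any IPFS library, and the sketch only handles sha2-256 V0 CIDs:

```python
import base64

# Bitcoin-style base58 alphabet used by CIDv0
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58 string into bytes (hypothetical helper)."""
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def cid_v0_to_v1(cid_v0: str) -> str:
    """Re-wrap the multihash of a CIDv0 as a base32 CIDv1 (dag-pb)."""
    multihash = b58decode(cid_v0)
    # A CIDv0 is a bare sha2-256 multihash: 0x12 (sha2-256), 0x20 (32 bytes), digest
    if len(multihash) != 34 or multihash[:2] != b"\x12\x20":
        raise ValueError("not a sha2-256 CIDv0")
    cid_bytes = b"\x01\x70" + multihash  # version 1, codec dag-pb
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # 'b' is the multibase prefix for lowercase base32


if __name__ == "__main__":
    print(cid_v0_to_v1("Qmd4PzLWTZiawH1W3VzoAbkyh9hCopjqSVAddYF8PrYBfE"))
```

Because only the wrapper changes, the converted CID can match the output of ipfs add --cid-version 1 only if the re-added root block is byte-for-byte identical to the V0 one.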

To summarize, is there a way to pin a file with a V1 CID, such that its existing V0 CID converts to the new V1 CID? Or is there a way to pin a V1 CID such that it can be converted back and forth between V0 and V1?

Is there anything else I can do to migrate to V1 without breaking existing users, or should I just stick to V0?

When you enable CID V1 on ipfs add, it also enables raw leaves, which is a simpler, slightly smaller data format for storing the actual chunks (the root blocks are still the same, though).

So it’s also changing the final hash.
If you pass ipfs add --raw-leaves=false you should have the same CIDv1.

That did not seem to work. I got it to work with a simple file, but not with the image from my dataset.

Simple file

To verify this, let’s work with a simple file and run the script:

echo "foobar" > foobar
FPATH=foobar

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

Gives:

aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f  foobar
QmRgutAxd8t7oGkSm4wmeuByG6M51wcTso6cubDdQtuEfL
bafkreifoybygix7fh3r3g5rqle3wcnhqldgdg4shzf4k3ulyw3gn7mabt4
bafybeibrypkxbagyiy5dyy5ssi67lioubll2opvolikk6wcccps7kbfmgm
bafybeibrypkxbagyiy5dyy5ssi67lioubll2opvolikk6wcccps7kbfmgm

The last two lines match, so that works.
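
As a side note, the bafkrei... CID on the third line of the output can be reproduced straight from the sha256sum on the first line. With raw leaves, a file that fits in a single chunk is stored as one raw block, so its CID is just the SHA-256 of the file bytes wrapped as a CIDv1 with the raw codec (0x55). A small Python sketch of that (raw_leaf_cid is a name I made up for illustration):

```python
import base64
import hashlib


def raw_leaf_cid(data: bytes) -> str:
    """Build the base32 CIDv1 (raw codec, sha2-256) for a single raw block."""
    digest = hashlib.sha256(data).digest()
    multihash = bytes([0x12, len(digest)]) + digest  # 0x12 = sha2-256
    cid_bytes = bytes([0x01, 0x55]) + multihash      # version 1, codec raw
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    content = b"foobar\n"  # what `echo "foobar" > foobar` writes
    print(hashlib.sha256(content).hexdigest())
    print(raw_leaf_cid(content))
```

With --raw-leaves=false the same bytes are wrapped in a UnixFS/dag-pb node before hashing, which is why the bafybei... CIDs on the last two lines carry a different digest.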


My file

So, now let’s try with my file:

FPATH=PXL_20210411_150641385.jpg

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

Gives:

6ba0583cdd93f410951417dc155afac2e98f8ce781e618ff7746866c02a7929d  PXL_20210411_150641385.jpg
QmaFcb565HM9FV8f41jrfCZcu1CXsZZMXEosjmbgeBhFQr
bafybeieuzuypuptalyfrpmmtihjmydtq2ok2fmkslsxpvdthxihmbomkay
bafybeie7j5jvs76z2kjrm3mi3n4cdsphyredz3xrlyq6tqgcgk3lfatfry
bafybeifrahymkr7pie2rh6o7yaky2keqc5apfnlzqhlq444af44fowchd4

And the last two lines no longer match. What’s going on?


Simple, but bigger file

So, let’s try with a bigger file. Let’s write 1MB of zeros to a file and use that.

dd if=/dev/zero of=lots_of_zeros count=1 bs=1M
FPATH=lots_of_zeros

sha256sum "$FPATH"
ipfs add -qn --cid-version 0 "$FPATH"
ipfs add -qn --cid-version 1 "$FPATH"
ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH"
CID_V0=$(ipfs add -qn --cid-version 0 "$FPATH")
ipfs cid base32 "$CID_V0"

This gives us:

30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58  lots_of_zeros
QmVkbauSDEaMP4Tkq6Epm9uW75mWm136n81YH8fGtfwdHU
bafybeiggzq4ryi7hscq5hzvzcnk4urnxt3asp37dhgvnjilf7exskximla
bafybeiephzsrbttogsiujiqziwjmhvu3pbggy256l76g2ootaediqz4mnq
bafybeidoeq3sicgbxby4b2h3ju2pd2ekivpgijs5vxv6vbozv6k3dd2ahm

which also does not match.

So, there seems to be something else going on besides --raw-leaves=false, probably having to do with the file or chunk size?

Further exploration

I did a bit more exploration of this, and found that this method will work for files up to 256k, but if you make the files 257k, it will fail:

# Change 257k to 256k and this will work
dd if=/dev/zero of=lots_of_zeros count=1 bs=257k
FPATH=lots_of_zeros

sha256sum "$FPATH"
# Defaults for each CID version
CID_V0_DEFAULT=$(ipfs add -qn --cid-version 0 "$FPATH")
CID_V1_DEFAULT=$(ipfs add -qn --cid-version 1 "$FPATH")

# Explicit --raw-leaves setting for each CID version
CID_V0_RLT=$(ipfs add -qn --cid-version 0 --raw-leaves=true "$FPATH")
CID_V0_RLF=$(ipfs add -qn --cid-version 0 --raw-leaves=false "$FPATH")
CID_V1_RLT=$(ipfs add -qn --cid-version 1 --raw-leaves=true "$FPATH")
CID_V1_RLF=$(ipfs add -qn --cid-version 1 --raw-leaves=false "$FPATH")

# V0 root CIDs re-encoded as V1 (same multihash, nothing is re-hashed)
CID_V1_FROM_V0_RLT=$(ipfs cid base32 "$CID_V0_RLT")
CID_V1_FROM_V0_RLF=$(ipfs cid base32 "$CID_V0_RLF")

echo "--raw-leaves=true results"
echo "---"
echo "CID_V0_RLT         = $CID_V0_RLT"
echo "---"
echo "CID_V1_DEFAULT     = $CID_V1_DEFAULT"
echo "CID_V1_RLT         = $CID_V1_RLT"
echo "CID_V1_FROM_V0_RLT = $CID_V1_FROM_V0_RLT"
echo "---"
echo ""
echo ""
echo "--raw-leaves=false results"
echo "---"
echo "CID_V0_DEFAULT     = $CID_V0_DEFAULT"
echo "CID_V0_RLF         = $CID_V0_RLF"
echo "---"
echo "CID_V1_RLF         = $CID_V1_RLF"
echo "CID_V1_FROM_V0_RLF = $CID_V1_FROM_V0_RLF"

Is there another option besides --raw-leaves that changes when you change CID versions?
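
The 256k boundary lines up with the chunk size, assuming ipfs add is using its default fixed-size chunker of 262144 bytes (256 KiB). That default is an assumption about the setup in this thread, since no --chunker flag was passed. A file that fits in one chunk is a single block with no links, while anything larger gets a root node that links to several leaves. A quick Python sketch of the arithmetic for the sizes tried above:

```python
import math

# Assumed default chunk size of `ipfs add` (size-262144); not confirmed in the thread
DEFAULT_CHUNK_SIZE = 256 * 1024


def chunk_count(file_size: int, chunk_size: int = DEFAULT_CHUNK_SIZE) -> int:
    """Number of fixed-size chunks a file of file_size bytes is split into."""
    return max(1, math.ceil(file_size / chunk_size))


if __name__ == "__main__":
    # dd's "k" and "M" suffixes are multiples of 1024
    sizes = {"256k": 256 * 1024, "257k": 257 * 1024, "1M": 1024 * 1024}
    for label, size in sizes.items():
        n = chunk_count(size)
        layout = "single block, no links" if n == 1 else f"root node linking to {n} leaves"
        print(f"{label:>5}: {size:>8} bytes -> {layout}")
```

So the single-block case has nothing that could embed a CID version, whereas the 257k and 1M files both have a root node whose contents depend on how its links are written.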

The “leaves” of the file will have the same multihash with raw-leaves=false, but they will be addressed using CIDv1s. That means that any non-leaf node in the DAG will be storing CIDv1s for links, which means that the final CIDs for non-leaf nodes will be different.
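
To make that concrete, here is a toy Python sketch. It is not real dag-pb or UnixFS encoding: the "parent" is just the concatenation of its link bytes, and the chunk contents are made up. It only shows that a link stored as a binary CIDv1 carries two extra prefix bytes compared to a CIDv0 link, so the parent block's bytes, and therefore its hash, change even though every leaf multihash stays the same.

```python
import hashlib


def multihash(data: bytes) -> bytes:
    """sha2-256 multihash: 0x12 (sha2-256), 0x20 (32 bytes), digest."""
    return b"\x12\x20" + hashlib.sha256(data).digest()


def link_v0(mh: bytes) -> bytes:
    """A binary CIDv0 is the bare multihash."""
    return mh


def link_v1(mh: bytes) -> bytes:
    """A binary CIDv1 adds a version byte and a codec byte (dag-pb here)."""
    return b"\x01\x70" + mh


def toy_parent(links: list) -> bytes:
    """Stand-in for a parent node: concatenated links (NOT real dag-pb)."""
    return b"".join(links)


if __name__ == "__main__":
    leaves = [b"chunk-0", b"chunk-1"]  # made-up leaf contents
    leaf_hashes = [multihash(leaf) for leaf in leaves]

    parent_v0 = toy_parent([link_v0(mh) for mh in leaf_hashes])
    parent_v1 = toy_parent([link_v1(mh) for mh in leaf_hashes])

    print("leaf multihashes shared:",
          all(link_v1(mh)[2:] == link_v0(mh) for mh in leaf_hashes))
    print("parent bytes identical: ", parent_v0 == parent_v1)
    print("parent v0 digest:", hashlib.sha256(parent_v0).hexdigest())
    print("parent v1 digest:", hashlib.sha256(parent_v1).hexdigest())
```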

However, these should be mostly lightweight nodes. The bigger nodes will be the leaves, and those will be already “cached” by clients since they had the CIDv0 version (as long as they are on go-ipfs >=0.12). That will save time in the replication of those DAGs.
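
A rough way to picture the caching is a block store keyed by multihash rather than by full CID, which is how I understand the go-ipfs 0.12 change. The Python sketch below is only an illustration of that idea: the dict-based store and the helper names are hypothetical, and the CIDs are the foobar ones from earlier in this thread.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58 string into bytes (hypothetical helper)."""
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    return b"\x00" * (len(text) - len(text.lstrip("1"))) + body


def cid_to_multihash(cid: str) -> bytes:
    """Extract the multihash from a CIDv0 or a base32 CIDv1."""
    if cid.startswith("Qm") and len(cid) == 46:
        return b58decode(cid)  # CIDv0 is the bare multihash
    if cid.startswith("b"):
        body = cid[1:].upper()
        body += "=" * (-len(body) % 8)  # restore the stripped base32 padding
        raw = base64.b32decode(body)
        if raw[0] != 0x01:
            raise ValueError("not a CIDv1")
        # Assumes the codec fits in one byte (true for dag-pb 0x70 and raw 0x55)
        return raw[2:]
    raise ValueError("unsupported CID encoding")


if __name__ == "__main__":
    store = {}  # toy block store keyed by multihash

    # A client that pinned the V0 CID already holds this block
    v0 = "QmRgutAxd8t7oGkSm4wmeuByG6M51wcTso6cubDdQtuEfL"
    store[cid_to_multihash(v0)] = b"<block bytes>"

    # The same block requested by its V1 form is a cache hit
    v1 = "bafybeibrypkxbagyiy5dyy5ssi67lioubll2opvolikk6wcccps7kbfmgm"
    print("v1 (raw-leaves=false) cached:", cid_to_multihash(v1) in store)

    # The raw-leaves CID has a different digest, so it is a miss
    v1_raw = "bafkreifoybygix7fh3r3g5rqle3wcnhqldgdg4shzf4k3ulyw3gn7mabt4"
    print("v1 (raw leaves) cached:      ", cid_to_multihash(v1_raw) in store)
```

This is also why --raw-leaves=false matters for the migration: only then do the big leaf blocks keep the multihashes that existing V0 pinners already hold.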