The cid calculated by go-car is different from the cid calculated by ipfs because of the different Tsize

ss75710541 · June 15, 2023, 3:59pm

For the same file, go-car produces the CID bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q:

--- bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q
+++ dag-json bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q
@@ -0,23 +0,23 @@
{
	"Data": {
		"/": {
			"bytes": "CAIY+4rtJSCAgOAVIPuKjRA"
		}
	},
	"Links": [
		{
			"Hash": {
				"/": "bafybeif44kxkggwkawybjknxymrqn7tg7z5ib256rphmlagh7v37re7aza"
			},
			"Name": "",
			"Tsize": 45613056
		},
		{
			"Hash": {
				"/": "bafybeihuiy47fvxpa5oillqacv2gqtilq4ar5xjoqed2gik6uju6ym42se"
			},
			"Name": "",
			"Tsize": 33768827
		}
	]
}

# ipfs dag get bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa|python -m json.tool

{
    "Data": {
        "/": {
            "bytes": "CAIY+4rtJSCAgOAVIPuKjRA"
        }
    },
    "Links": [
        {
            "Hash": {
                "/": "bafybeif44kxkggwkawybjknxymrqn7tg7z5ib256rphmlagh7v37re7aza"
            },
            "Name": "",
            "Tsize": 45621766
        },
        {
            "Hash": {
                "/": "bafybeihuiy47fvxpa5oillqacv2gqtilq4ar5xjoqed2gik6uju6ym42se"
            },
            "Name": "",
            "Tsize": 33775287
        }
    ]
}

Jorropo · June 15, 2023, 5:12pm

Here is me decoding your CID produced by Kubo:

$ ipfs block get bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa | protoc --decode=PBNode unixfs.proto
Data {
  Type: File
  filesize: 79381883 // <- this is the sum of the blocksizes + inlined data, idk why this exists you could just do an addition
  blocksizes: 45613056 // <- this is the underlying size of the first link
  blocksizes: 33768827 // <- this is the underlying size of the second link
}
Links {
  Hash: "\001p\022 \274\342\256\243\032\312\005\260\024\251\267\303#\006\376f\376z\200\353\276\213\316\305\200\307\375w\370\223\340\310"
  Name: ""
  Tsize: 45621766
}
Links {
  Hash: "\001p\022 \364F9\362\326\357\007\\\205\256\000\025thM\013\207\001\036\335.\201\007\243!^\242i\3543\232\221"
  Name: ""
  Tsize: 33775287
}

Here is the same thing but for the go-car/cmd/car CID:

$ ipfs block get bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q | protoc --decode=PBNode unixfs.proto
Data {
  Type: File
  filesize: 79381883
  blocksizes: 45613056
  blocksizes: 33768827
}
Links {
  Hash: "\001p\022 \274\342\256\243\032\312\005\260\024\251\267\303#\006\376f\376z\200\353\276\213\316\305\200\307\375w\370\223\340\310"
  Name: ""
  Tsize: 45613056
}
Links {
  Hash: "\001p\022 \364F9\362\326\357\007\\\205\256\000\025thM\013\207\001\036\335.\201\007\243!^\242i\3543\232\221"
  Name: ""
  Tsize: 33768827
}

Sadly, no interesting difference.

So we have to go deeper. protoscope shows us the raw binary representation rather than something that has been run through a schema.
Here is the Kubo CID through protoscope:

$ ipfs block get bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa | protoscope
2: {
  1: {`01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8`}
  2: {}
  3: 45621766
}
2: {
  1: {`01701220f44639f2d6ef075c85ae001574684d0b87011edd2e8107a3215ea269ec339a91`}
  2: {}
  3: 33775287
}
1: {
  1: 2
  3: 79381883
  4: 45613056
  4: 33768827
}

Here is the CID produced by go-car:

$ ipfs block get bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q | protoscope
2: {
  1: {`01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8`}
  2: {}
  3: 45613056
}
2: {
  1: {`01701220f44639f2d6ef075c85ae001574684d0b87011edd2e8107a3215ea269ec339a91`}
  2: {}
  3: 33768827
}
1: {
  1: 2
  3: 79381883
  4: 45613056
  4: 33768827
}

And this is the part where I was wrong. I was thinking it would be a Protobuf encoding issue. Protobuf is not a canonical format: the Python, Go and Java implementations can all produce different encodings that are equivalent (the order of fields can differ, for example).
But this is actually an issue with the Tsize computation within go-car. If I run ipfs block get f01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8 | protoscope (that is, protoscope on the first child of go-car's CID), I see it has 174 blocks of 262144 bytes.
Which is interesting, because 174 * 262144 = 45613056 is exactly the Tsize reported. It should be 174 * 262144 + 8710 = 45621766 (8710 is the size of the child dag-pb block), which is the size recorded by Kubo.
So this is a go-unixfsnode node. When computing Tsize, I guess it does something like sum(node.links[].tsize), where the correct thing (and what Kubo does) is sum(node.links[].tsize) + len(protobufEncode(node)).
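
The arithmetic above can be checked with a small sketch. The `link` type and both function names are hypothetical and exist only for this illustration. They are not the go-unixfsnode or Kubo code, and the "sum of links only" behaviour is the guess stated above, not something confirmed from the go-car source.

```go
package main

import "fmt"

// link is a hypothetical, simplified stand-in for a dag-pb link.
type link struct {
	Tsize uint64
}

// sumOfLinks adds up only the Tsize of the children. This is the guessed
// behaviour that would yield the value seen in the go-car block.
func sumOfLinks(links []link) uint64 {
	var total uint64
	for _, l := range links {
		total += l.Tsize
	}
	return total
}

// cumulativeSize also counts the encoded size of the node itself, which
// matches the value recorded by Kubo.
func cumulativeSize(links []link, encodedLen uint64) uint64 {
	return sumOfLinks(links) + encodedLen
}

func main() {
	// 174 leaves of 262144 bytes each, as seen under the first child.
	leaves := make([]link, 174)
	for i := range leaves {
		leaves[i] = link{Tsize: 262144}
	}
	const childBlockLen = 8710 // size of the child dag-pb block

	fmt.Println(sumOfLinks(leaves))                    // 45613056
	fmt.Println(cumulativeSize(leaves, childBlockLen)) // 45621766
}
```

The same reasoning applies to the second link, where the two reported values differ by 33775287 - 33768827 = 6460 bytes.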

The thing I was trying to get at (but failed to, given that your issue is actually a bug in go-unixfsnode) is that UnixFS is not a repeatable hash function. You can use it to verify an IPLD tree, and it will always give you back the original data (assuming the underlying hash function, here sha256, is secure). But running the same add operation multiple times might result in different CIDs: the size of the chunks can differ, --raw-leaves can be enabled or not, the chunking algorithm can be optimised for video data instead of text files, …

ss75710541 · June 16, 2023, 7:28am

This is my test code:

package main

import (
	"bytes"
	"fmt"
	"github.com/ipfs/boxo/ipld/merkledag"
	unixfs_pb "github.com/ipfs/boxo/ipld/unixfs/pb"
	"github.com/ipfs/go-cid"
	shell "github.com/ipfs/go-ipfs-api"
	"github.com/ipfs/go-ipfs-api/options"
	format "github.com/ipfs/go-ipld-format"
	dagpb "github.com/ipld/go-codec-dagpb"
)

func cidTest() {

	sh := shell.NewShell("127.0.0.1:5001")
	cid1Str := "bafybeif44kxkggwkawybjknxymrqn7tg7z5ib256rphmlagh7v37re7aza"
	cid2Str := "bafybeihuiy47fvxpa5oillqacv2gqtilq4ar5xjoqed2gik6uju6ym42se"
	cid1, _ := cid.Decode("bafybeif44kxkggwkawybjknxymrqn7tg7z5ib256rphmlagh7v37re7aza")
	cid2, _ := cid.Decode("bafybeihuiy47fvxpa5oillqacv2gqtilq4ar5xjoqed2gik6uju6ym42se")

	cid1Stat, _ := sh.ObjectStat(cid1Str)
	cid2Stat, _ := sh.ObjectStat(cid2Str)
	// ipfs dag stat {CID} displays this size
	size1 := uint64(cid1Stat.CumulativeSize)
	size2 := uint64(cid2Stat.CumulativeSize)

	// size without the node's own block (cumulative size minus block size)
	srcSize1 := uint64(cid1Stat.CumulativeSize - cid1Stat.BlockSize)
	srcSize2 := uint64(cid2Stat.CumulativeSize - cid2Stat.BlockSize)

	// sum of the original sizes
	totalsize := uint64(srcSize1 + srcSize2)
	protoNode := new(merkledag.ProtoNode)
	pbFile := new(unixfs_pb.Data)
	pbType := unixfs_pb.Data_File
	pbFile.Type = &pbType

	protoNode.AddRawLink("", &format.Link{Cid: cid1, Size: size1})
	protoNode.AddRawLink("", &format.Link{Cid: cid2, Size: size2})

	pbFile.Filesize = &totalsize

	pbFile.Blocksizes = []uint64{srcSize1, srcSize2}

	data, err := pbFile.XXX_Marshal(nil, false)
	if err != nil {
		panic(err)
	}

	fmt.Printf("pbFile string: %s\n", pbFile.String())

	protoNode.SetData(data)

	dagBuf := bytes.NewBuffer(nil)

	dagpb.Encode(protoNode, dagBuf)

	protoNode.SetCidBuilder(merkledag.V1CidPrefix())

	fmt.Printf("cid: %s\n", protoNode.Cid().String())
	cid, err := sh.DagPutWithOpts(dagBuf, options.Dag.InputCodec("dag-pb"), options.Dag.StoreCodec("dag-pb"), options.Dag.Pin("false"))

	if err != nil {
		panic(err)
	}
	fmt.Println(cid)

}

func main() {
	cidTest()
}

The results show (the same as those calculated by ipfs):

pbFile string: Type:File filesize:79381883 blocksizes:45613056 blocksizes:33768827 
cid: bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa
bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa

Change the code in this place:

	//size1 := uint64(cid1Stat.CumulativeSize)
	//size2 := uint64(cid2Stat.CumulativeSize)

	size1 := uint64(cid1Stat.CumulativeSize - cid1Stat.BlockSize)
	size2 := uint64(cid2Stat.CumulativeSize - cid2Stat.BlockSize)

The result shows (the same as that calculated by go-car):

pbFile string: Type:File filesize:79381883 blocksizes:45613056 blocksizes:33768827 
cid: bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q
bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q

So I think the different Tsize is what changes the resulting CID.
