Here is me decoding your CID produced by Kubo:
$ ipfs block get bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa | protoc --decode=PBNode unixfs.proto
Data {
Type: File
filesize: 79381883 // <- this is the sum of the blocksizes + inlined data, idk why this exists you could just do an addition
blocksizes: 45613056 // <- this is the underlying size of the first link
blocksizes: 33768827 // <- this is the underlying size of the second link
}
Links {
Hash: "\001p\022 \274\342\256\243\032\312\005\260\024\251\267\303#\006\376f\376z\200\353\276\213\316\305\200\307\375w\370\223\340\310"
Name: ""
Tsize: 45621766
}
Links {
Hash: "\001p\022 \364F9\362\326\357\007\\\205\256\000\025thM\013\207\001\036\335.\201\007\243!^\242i\3543\232\221"
Name: ""
Tsize: 33775287
}
Here is the same thing for the go-car/cmd/car CID:
$ ipfs block get bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q | protoc --decode=PBNode unixfs.proto
Data {
Type: File
filesize: 79381883
blocksizes: 45613056
blocksizes: 33768827
}
Links {
Hash: "\001p\022 \274\342\256\243\032\312\005\260\024\251\267\303#\006\376f\376z\200\353\276\213\316\305\200\307\375w\370\223\340\310"
Name: ""
Tsize: 45613056
}
Links {
Hash: "\001p\022 \364F9\362\326\357\007\\\205\256\000\025thM\013\207\001\036\335.\201\007\243!^\242i\3543\232\221"
Name: ""
Tsize: 33768827
}
Sadly, no interesting difference.
So we have to go deeper: protoscope shows us the raw binary representation instead of something that has been run through a schema.
Here is the Kubo CID through protoscope:
$ ipfs block get bafybeibawqusphaqfn7c7b5hsr4wgmxud7qfhnmd2ntco7oqjnvmg5njfa | protoscope
2: {
1: {`01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8`}
2: {}
3: 45621766
}
2: {
1: {`01701220f44639f2d6ef075c85ae001574684d0b87011edd2e8107a3215ea269ec339a91`}
2: {}
3: 33775287
}
1: {
1: 2
3: 79381883
4: 45613056
4: 33768827
}
Here is the CID produced by go-car:
$ ipfs block get bafybeiecchz2dlxyhgdz4v7w3bujemvogkqq6k7wt5rdvihe2fuavfbi4q | protoscope
2: {
1: {`01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8`}
2: {}
3: 45613056
}
2: {
1: {`01701220f44639f2d6ef075c85ae001574684d0b87011edd2e8107a3215ea269ec339a91`}
2: {}
3: 33768827
}
1: {
1: 2
3: 79381883
4: 45613056
4: 33768827
}
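As a side note on reading these dumps, field 1 of each link is a binary CID. Here is a minimal Python sketch that splits it into its parts; it assumes every component fits in a single-byte varint, which happens to be true for this CID but is not true in general.

```python
# Split the binary CID from field 1 of the first link into its components.
# Assumption: each component is a one-byte varint (true here, not in general).
cid_hex = "01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8"
raw = bytes.fromhex(cid_hex)

version, codec, hash_fn, digest_len = raw[0], raw[1], raw[2], raw[3]
digest = raw[4:]

print(version)                  # 1    -> CIDv1
print(hex(codec))               # 0x70 -> dag-pb
print(hex(hash_fn))             # 0x12 -> sha2-256
print(digest_len, len(digest))  # 32 32

# The multibase base16 form is just the hex string with an "f" prefix,
# which gives a CID string that can be passed to ipfs block get.
cid_base16 = "f" + cid_hex
print(cid_base16)
```

So both links point at dag-pb blocks hashed with sha2-256, and the link CIDs are identical in the two dumps; only field 3 (Tsize) differs.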
And this is the part where I was wrong. I was thinking it would be a Protobuf encoding issue. Protobuf is not a repeatable format: protobuf in Python, Go and Java can all produce different encodings that are equivalent (the order of fields can be randomised, for example).
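To illustrate the field-order point, here is a small hand-rolled sketch in Python (no protobuf library involved). The helper names are made up for this example, and the decoder only understands varint fields (wire type 0), which is enough for the Type and filesize fields seen above.

```python
import hashlib

def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def field(number: int, value: int) -> bytes:
    """Encode one varint field: key is (field_number << 3) | wire_type 0."""
    return varint(number << 3) + varint(value)

def read_varint(buf: bytes, i: int):
    result = shift = 0
    while True:
        b = buf[i]
        i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def decode(buf: bytes) -> dict:
    """Decode a message made only of varint fields (toy decoder)."""
    fields, i = {}, 0
    while i < len(buf):
        key, i = read_varint(buf, i)
        value, i = read_varint(buf, i)
        fields.setdefault(key >> 3, []).append(value)
    return fields

# Same two fields (1 = Type, 3 = filesize), written in two different orders.
a = field(1, 2) + field(3, 79381883)
b = field(3, 79381883) + field(1, 2)

print(decode(a) == decode(b))  # True: same message once decoded
print(a == b)                  # False: different bytes on the wire
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
```

Both byte strings decode to the same message, but they hash differently, which is why an encoder has to pick one fixed order if the hash is meant to be stable.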
But this is actually an issue with the TSize computation within go-car. If I run ipfs block get f01701220bce2aea31aca05b014a9b7c32306fe66fe7a80ebbe8bcec580c7fd77f893e0c8 | protoscope (that is, protoscope on the first child of go-car’s CID), I see it has 174 blocks of 262144 bytes.
This is interesting because 174 * 262144 = 45613056, exactly the TSize reported. It should be 174 * 262144 + 8710 = 45621766 (8710 is the size of the dag-pb child block), which is the size recorded by Kubo.
So this is a go-unixfsnode node. When computing tsize, I guess it does something like sum(node.links[].tsize), where the correct thing (and what Kubo does) is sum(node.links[].tsize) + len(protobufEncode(node)).
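The two formulas can be checked against the numbers above with a short Python sketch. This is only a model of the guess, not the actual go-unixfsnode or Kubo code; the Link class and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Link:
    tsize: int  # cumulative size of the subtree behind this link

def tsize_links_only(links):
    """The suspected go-unixfsnode behaviour: sum of the child tsizes only."""
    return sum(link.tsize for link in links)

def tsize_with_block(links, encoded_block_len):
    """What Kubo records: child tsizes plus the encoded size of the node itself."""
    return sum(link.tsize for link in links) + encoded_block_len

# First child of the root: 174 leaf blocks of 262144 bytes each,
# and a dag-pb block that is 8710 bytes once encoded.
links = [Link(tsize=262144) for _ in range(174)]
encoded_block_len = 8710

print(tsize_links_only(links))                     # 45613056 (go-car)
print(tsize_with_block(links, encoded_block_len))  # 45621766 (Kubo)
```

The first result matches the Tsize in the go-car root and the second matches the Tsize in the Kubo root.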
The thing I was trying to get at (but failed, given your issue is actually a bug in go-unixfsnode) is that UnixFS is not a repeatable hash function. You can use it to verify some IPLD tree and it will always give you back the original data (assuming your underlying hash function, here sha256, is secure). But running the same add operation multiple times might result in different CIDs: the size of the chunks can be different, --raw-leaves can be enabled or not, the chunking algorithm used can be optimised for video data instead of text files, …
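Here is a toy Python illustration of the chunk-size point. It is not real UnixFS and does not produce real CIDs: toy_root is a made-up two-level hash (sha256 over the concatenated sha256 digests of the chunks), used only to show that the same bytes give a different root when they are split differently.

```python
import hashlib

def chunks(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def toy_root(data: bytes, chunk_size: int) -> str:
    """Toy stand-in for a UnixFS root: hash of the concatenated leaf hashes."""
    leaf_digests = [hashlib.sha256(c).digest() for c in chunks(data, chunk_size)]
    return hashlib.sha256(b"".join(leaf_digests)).hexdigest()

data = bytes(range(256)) * 4096  # 1 MiB of sample data

root_256k = toy_root(data, 262144)   # four 256 KiB chunks
root_1m = toy_root(data, 1048576)    # a single 1 MiB chunk

print(root_256k == root_1m)                       # False: different roots
print(b"".join(chunks(data, 262144)) == data)     # True: same data either way
print(toy_root(data, 262144) == root_256k)        # True: same parameters, same root
```

With fixed parameters the result is deterministic, and either tree reassembles to the same bytes; it is only the choice of parameters that changes the root.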