Hi,
When reading the same CID using ipfs.cat with and without the optional offset and length parameters I am getting different file sizes. Am I doing something wrong, or is this a bug?
The sample code used is the following.
// Imports and node setup were not shown in the original snippet; assuming ipfs-core and multiformats here
import { create } from 'ipfs-core'
import { CID } from 'multiformats/cid'

const ipfs = await create()

const cid = CID.parse('bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue')
let bufferBytes = 0

// Using ipfs.cat without offset and length parameters
for await (const buf of ipfs.cat(cid, {})) {
  bufferBytes += buf.byteLength
}
console.log(`Bytes received when using ipfs.cat WITHOUT offset and length parameters: ${bufferBytes}`)

// Using ipfs.cat with offset and length parameters, reading 1 MiB slices until an empty slice is returned
bufferBytes = 0
let offset = 0
const length = 1024 * 1024
let sliceBytes = 1

while (sliceBytes > 0) {
  sliceBytes = await readSlice(cid, offset, length)
  bufferBytes += sliceBytes
  offset = bufferBytes
}
console.log(`Bytes received when using ipfs.cat WITH offset and length parameters: ${bufferBytes}`)

// Reads a single slice of the file and returns the number of bytes received
async function readSlice (cid, offset, length) {
  let sliceLen = 0
  for await (const buf of ipfs.cat(cid, { offset, length })) {
    sliceLen += buf.byteLength
  }
  console.log(`Offset: ${offset}, Slice: ${sliceLen}`)
  return sliceLen
}

//const stat = await ipfs.object.stat(cid)
//console.dir(stat, { depth: null })
The output of running the above code is the following:
Bytes received when using ipfs.cat WITHOUT offset and length parameters: 7568164
Offset: 0, Slice: 1048576
Offset: 1048576, Slice: 1048576
Offset: 2097152, Slice: 1048576
Offset: 3145728, Slice: 1048576
Offset: 4194304, Slice: 1048576
Offset: 5242880, Slice: 1048576
Offset: 6291456, Slice: 1048576
Offset: 7340032, Slice: 229380
Offset: 7569412, Slice: 208
Offset: 7569620, Slice: 0
Bytes received when using ipfs.cat WITH offset and length parameters: 7569620
The CID bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue is a ZIP archive. When it is retrieved using the offset and length parameters it is corrupted.
It must be something obvious, but I can't find what the issue is here.
Thanks for the help!
I’m trying to replicate this with Helia and having problems, though I can’t re-import the file behind bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue and get the same CID - what import settings did you use?
Hey achingbrain, thank you for your answer!
The bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue CID is an experiment made by intentionally linking larger chunks (1 MiB instead of 256 KiB). It would be very beneficial for me to understand how ipfs.cat / the exporter can read it correctly without the offset and length parameters provided, whereas with those parameters it cannot.
Thanks for your help!
If you add the file to Kubo with 1M chunks, do you get the same CID?
E.g.
$ ipfs add --chunker=size-1048576 --cid-version=1 --raw-leaves ./foo.zip
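For reference, a rough js-ipfs equivalent of that command might look like this (a sketch, assuming ipfs-core; ./foo.zip is just a placeholder file name):

import { create } from 'ipfs-core'
import fs from 'node:fs'

const ipfs = await create()

// Roughly the same import settings as the Kubo command above
const { cid } = await ipfs.add(fs.createReadStream('./foo.zip'), {
  chunker: 'size-1048576',
  cidVersion: 1,
  rawLeaves: true
})

console.log(cid.toString())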
I ask because looking at the DAG in IPLD Explorer shows that the last node linked to from the root is a raw node. It should be a dag-pb node with a raw node child like the others.
It also seems to have chunk sizes of 1048784, which, when trying to replicate as an import setting, causes Kubo to error out with Error: chunker parameters may not exceed the maximum chunk size of 1048576
If doing a depth-first traversal this data structure should reasonably be handled, but I notice that downloading bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue from the gateway gives me a corrupted zip file, so I’m not sure Kubo handles this properly either.
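If you want to check the root block yourself, something like the following sketch should surface the declared sizes (assuming the root is a dag-pb UnixFS node and an ipfs instance is available):

import { UnixFS } from 'ipfs-unixfs'

const { value: root } = await ipfs.dag.get(cid)
const file = UnixFS.unmarshal(root.Data)

// The per-child sizes declared in the UnixFS metadata
console.log('declared blockSizes:', file.blockSizes)

// The links to the child blocks, with their CIDs and the Tsize of each linked subtree
console.log(root.Links.map(l => ({ cid: l.Hash.toString(), tsize: l.Tsize })))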
Indeed, chunk sizes of 1048784 are the problem here. Amazingly, ipfs.cat without the optional parameters provided can still read it correctly?!
It’s definitely a different CID.
Thanks!
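A simplified illustration of why the two code paths can disagree (this is not the actual ipfs-unixfs-exporter code, just a sketch of the idea): a plain cat streams the leaf bytes depth-first, so only the real leaf contents matter, while a ranged read has to resolve the requested offset against the blockSizes declared in the parent node. If those declared sizes don't match the real leaf lengths, the resolved offsets drift, which would explain both the corruption and the inflated total.

// Locate which child block contains a given byte offset,
// using only the parent's declared blockSizes (bigints in recent ipfs-unixfs)
function findChild (blockSizes, offset) {
  let start = 0
  for (let i = 0; i < blockSizes.length; i++) {
    const end = start + Number(blockSizes[i])
    if (offset < end) {
      return { index: i, offsetInChild: offset - start }
    }
    start = end
  }
  return undefined
}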
This is not a satisfactory answer; CID production is not a deterministic process across different import settings, so getting a different CID is expected in some capacity.
I’m curious how you created your broken CID in the first place; it sounds like some piece of code can generate invalid unixfs objects when fed what are reasonable parameters?
Sure. As achingbrain pointed out, the problem was in accidentally adding block/chunk sizes slightly bigger than 1048576.
So if you do something like the following, with a chunk size bigger than 1048576, you will reproduce the situation I have described above.
...
// assuming UnixFS comes from 'ipfs-unixfs' and createLink/createNode from '@ipld/dag-pb'
const node = new UnixFS({ type: 'file' })
const link = createLink(path, size, cid)
// passing a size slightly bigger than 1048576 here (e.g. 1048784) reproduces the issue
node.addBlockSize(BigInt(size))
links.push(link)
const value = createNode(node.marshal(), links)
...
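For comparison, here is a minimal sketch of the shape that works (assuming createLink/createNode come from @ipld/dag-pb and UnixFS from ipfs-unixfs; buildFileNode and the chunks array are hypothetical names). The key point is that each addBlockSize value matches the actual byte length of the leaf it describes and no chunk exceeds 1048576 bytes:

import { UnixFS } from 'ipfs-unixfs'
import { createLink, createNode } from '@ipld/dag-pb'

// chunks: array of { cid, data } raw leaf blocks, each at most 1048576 bytes (hypothetical input)
function buildFileNode (chunks) {
  const file = new UnixFS({ type: 'file' })
  const links = []

  for (const { cid, data } of chunks) {
    // Declare exactly the number of bytes the leaf really holds,
    // so offset/length reads resolve to the right child blocks
    file.addBlockSize(BigInt(data.byteLength))
    links.push(createLink('', data.byteLength, cid))
  }

  return createNode(file.marshal(), links)
}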