Getting different file sizes when using ipfs.cat with and without "offset" and "length" parameters

Hi,
When reading the same CID using ipfs.cat with and without the optional offset and length parameters, I am getting different file sizes. Am I doing something wrong, or is this a bug?

The sample code used is the following:

import { CID } from 'multiformats/cid'

// "ipfs" is an ipfs-core / ipfs-http-client instance created elsewhere
const cid = CID.parse('bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue')
let bufferBytes = 0

// Using ipfs.cat without offset and length parameters
for await (const buf of ipfs.cat(cid, {})) {
    bufferBytes += buf.byteLength
}
console.log(`Bytes received when using ipfs.cat WITHOUT offset and length parameters: ${bufferBytes}`)

// Using ipfs.cat with offset and length parameters
bufferBytes = 0
let offset = 0
const length = 1024*1024
let sliceBytes = 1
while(sliceBytes > 0) {
    sliceBytes = await readSlice(cid, offset, length)
    bufferBytes += sliceBytes
    offset = bufferBytes
}
console.log(`Bytes received when using ipfs.cat WITH offset and length parameters: ${bufferBytes}`)

async function readSlice(cid, offset, length) {
    let sliceLen = 0
    for await (const buf of ipfs.cat(cid, {offset: offset, length: length})) {
        const bufLen = buf.byteLength
        sliceLen += bufLen
    }
    console.log(`Offset: ${offset}, Slice: ${sliceLen}`)
    return sliceLen
}

//const stat = await ipfs.object.stat(cid)
//console.dir(stat, { depth: null })

The output of running the above code is the following:

Bytes received when using ipfs.cat WITHOUT offset and length parameters: 7568164
Offset: 0, Slice: 1048576
Offset: 1048576, Slice: 1048576
Offset: 2097152, Slice: 1048576
Offset: 3145728, Slice: 1048576
Offset: 4194304, Slice: 1048576
Offset: 5242880, Slice: 1048576
Offset: 6291456, Slice: 1048576
Offset: 7340032, Slice: 229380
Offset: 7569412, Slice: 208
Offset: 7569620, Slice: 0
Bytes received when using ipfs.cat WITH offset and length parameters: 7569620
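For completeness, the size the DAG itself reports could also be checked via the files API (a sketch, not part of the run above):

// Sketch: ask the node what size the DAG claims for this CID.
// files.stat accepts an /ipfs/ path; "size" is the UnixFS-reported file size.
const stat = await ipfs.files.stat(`/ipfs/${cid}`)
console.log(`Reported size: ${stat.size}, cumulative size: ${stat.cumulativeSize}`)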

CID bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue is a ZIP archive. When it is retrieved using the offset and length parameters, it is corrupted.

It must be something obvious, but I can't figure out what the issue is here.

Thanks for the help!


I'm trying to replicate this with Helia and having problems, though: I can't re-import the file behind bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue and get the same CID. What import settings did you use?

Can you use ipfs.io/ipfs/bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue?format=car to get the blocks?
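Something along these lines should work for grabbing the CAR (a minimal sketch, assuming Node 18+ so fetch is available globally):

import { writeFile } from 'node:fs/promises'

// Ask the gateway for the blocks behind the CID as a CAR file and save it to disk
const url = 'https://ipfs.io/ipfs/bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue?format=car'
const res = await fetch(url)

if (!res.ok) {
  throw new Error(`Gateway responded with HTTP ${res.status}`)
}

await writeFile('blocks.car', new Uint8Array(await res.arrayBuffer()))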

Hey achingbrain, thank you for your answer!

The CID bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue comes from an experiment where I intentionally linked larger chunks (1M instead of 256K). It would be very helpful for me to understand how ipfs.cat / the exporter is able to read it correctly when offset and length are not provided, while it cannot when they are.

Thanks for your help!

If you add the file to Kubo with 1M chunks do you get the same CID?

E.g.

$ ipfs add --chunker=size-1048576 --cid-version=1 --raw-leaves ./foo.zip

I ask because looking at the DAG in IPLD Explorer shows that the last node linked to from the root is a raw node. It should be a dag-pb node with a raw node child like the others.

It also seems to have chunk sizes of 1048784, which, when I try to replicate it as an import setting, causes Kubo to error out with: Error: chunker parameters may not exceed the maximum chunk size of 1048576

A depth-first traversal should be able to handle this data structure reasonably, but I notice that downloading bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue from the gateway gives me a corrupted zip file, so I'm not sure Kubo handles this properly either.
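The seeking logic is roughly this (a simplified sketch of the general idea, not the actual ipfs-unixfs-exporter code): the parent dag-pb node records a blockSizes list, and a requested offset is turned into a child index by walking that list.

// Sketch: map a byte offset onto a child block using the blockSizes
// recorded in the parent node. Names here are illustrative only.
function findChildForOffset (blockSizes, offset) {
  let start = 0

  for (let i = 0; i < blockSizes.length; i++) {
    const end = start + blockSizes[i]

    if (offset < end) {
      // the requested byte lives in child i, at (offset - start) within it
      return { child: i, offsetInChild: offset - start }
    }

    start = end
  }

  throw new Error('offset is beyond the recorded file size')
}

If the recorded blockSizes overstate what the children actually contain, this lookup points at the wrong byte, so sliced reads come back shifted (and sum to the recorded total rather than the real one), while a plain sequential cat just concatenates whatever bytes the leaves really hold.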


Indeed, chunk sizes of 1048784 are the problem here. The numbers line up, too: the 1456-byte difference between the two totals is exactly 7 × 208, i.e. 208 extra bytes recorded for each of the seven full blocks. Amazingly, ipfs.cat without the optional parameters can still read it correctly?!

It's definitely a different CID.

Thanks!


This is not a satisfactory answer: CID production is not a fully deterministic process, so getting a different CID is expected to some extent.

I'm curious how you created the broken CID in the first place. It sounds like some piece of code can generate invalid UnixFS objects when fed what look like reasonable parameters?

Sure. As achingbrain pointed out, the problem was in accidentally recording block/chunk sizes slightly bigger than 1048576.
So if you do something like the following, with a chunk size bigger than 1048576, you will reproduce the situation I have described above.

...
// (context elided; "path", "size", "cid" and "links" come from the surrounding import code)
// UnixFS comes from the ipfs-unixfs package, createLink/createNode from @ipld/dag-pb
const node = new UnixFS({ type: 'file' })
// "size" here was slightly larger than 1048576, which is what produced the broken DAG
const link = createLink(path, size, cid)
node.addBlockSize(BigInt(size))
links.push(link)
const value = createNode(node.marshal(), links)
...
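To keep the recorded sizes consistent, the value passed to addBlockSize (and the link Tsize, for raw leaves) has to be the actual byte length of the leaf block rather than a fixed number. A sketch of the corrected bookkeeping, where leafBytes is a hypothetical Uint8Array holding the raw leaf data:

// Record the real length of the leaf instead of a hard-coded chunk size
const actualSize = leafBytes.byteLength
const link = createLink(path, actualSize, cid)
node.addBlockSize(BigInt(actualSize))
links.push(link)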