Getting different file sizes when using ipfs.cat with and without "offset" and "length" parameters

Hi,
When reading the same CID using ipfs.cat with and without the optional offset and length parameters, I am getting different file sizes. Am I doing something wrong, or is this a bug?

The sample code used is the following:

import { CID } from 'multiformats/cid'

// "ipfs" is an ipfs-core / ipfs-http-client instance created elsewhere
const cid = CID.parse('bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue')
let bufferBytes = 0

// Using ipfs.cat without offset and length parameters
for await (const buf of ipfs.cat(cid, {})) {
    bufferBytes += buf.byteLength
}
console.log(`Bytes received when using ipfs.cat WITHOUT offset and length parameters: ${bufferBytes}`)

// Using ipfs.cat with offset and length parameters
bufferBytes = 0
let offset = 0
const length = 1024*1024
let sliceBytes = 1
while(sliceBytes > 0) {
    sliceBytes = await readSlice(cid, offset, length)
    bufferBytes += sliceBytes
    offset = bufferBytes
}
console.log(`Bytes received when using ipfs.cat WITH offset and length parameters: ${bufferBytes}`)

async function readSlice(cid, offset, length) {
    let sliceLen = 0
    for await (const buf of ipfs.cat(cid, {offset: offset, length: length})) {
        const bufLen = buf.byteLength
        sliceLen += bufLen
    }
    console.log(`Offset: ${offset}, Slice: ${sliceLen}`)
    return sliceLen
}

//const stat = await ipfs.object.stat(cid)
//console.dir(stat, { depth: null })

The output of running the above code is the following:

Bytes received when using ipfs.cat WITHOUT offset and length parameters: 7568164
Offset: 0, Slice: 1048576
Offset: 1048576, Slice: 1048576
Offset: 2097152, Slice: 1048576
Offset: 3145728, Slice: 1048576
Offset: 4194304, Slice: 1048576
Offset: 5242880, Slice: 1048576
Offset: 6291456, Slice: 1048576
Offset: 7340032, Slice: 229380
Offset: 7569412, Slice: 208
Offset: 7569620, Slice: 0
Bytes received when using ipfs.cat WITH offset and length parameters: 7569620
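For completeness, the size the DAG itself reports could also be checked via the files API (a sketch, not part of the run above):

// Sketch: ask the node what size the DAG claims for this CID.
// files.stat accepts an /ipfs/ path; "size" is the UnixFS-reported file size.
const stat = await ipfs.files.stat(`/ipfs/${cid}`)
console.log(`Reported size: ${stat.size}, cumulative size: ${stat.cumulativeSize}`)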

CID bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue is a ZIP archive. When it is retrieved using the offset and length parameters, it is corrupted.

It must be something obvious, but I can't figure out what the issue is here.

Thanks for the help!


I'm trying to replicate this with Helia and having problems, though: I can't re-import the file behind bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue and get the same CID. What import settings did you use?

Can you use ipfs.io/ipfs/bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue?format=car to get the blocks?
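Something along these lines should work for grabbing the CAR (a minimal sketch, assuming Node 18+ so fetch is available globally):

import { writeFile } from 'node:fs/promises'

// Ask the gateway for the blocks behind the CID as a CAR file and save it to disk
const url = 'https://ipfs.io/ipfs/bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue?format=car'
const res = await fetch(url)

if (!res.ok) {
  throw new Error(`Gateway responded with HTTP ${res.status}`)
}

await writeFile('blocks.car', new Uint8Array(await res.arrayBuffer()))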

Hey achingbrain, thank you for your answer!

The CID bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue comes from an experiment where I intentionally linked larger chunks (1M instead of 256K). It would be very helpful for me to understand how ipfs.cat / the exporter is able to read it correctly when offset and length are not provided, while it cannot when they are.

Thanks for your help!

If you add the file to Kubo with 1M chunks do you get the same CID?

E.g.

$ ipfs add --chunker=size-1048576 --cid-version=1 --raw-leaves ./foo.zip

I ask because looking at the DAG in IPLD Explorer shows that the last node linked to from the root is a raw node. It should be a dag-pb node with a raw node child like the others.

It also seems to have chunk sizes of 1048784, which, when I try to replicate it as an import setting, causes Kubo to error out with: Error: chunker parameters may not exceed the maximum chunk size of 1048576

A depth-first traversal should be able to handle this data structure reasonably, but I notice that downloading bafybeifcd5ysgcmgsn3l3nebo3wu7jsuf7xumnsfup55dk2ikis3ujsnue from the gateway gives me a corrupted zip file, so I'm not sure Kubo handles this properly either.
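The seeking logic is roughly this (a simplified sketch of the general idea, not the actual ipfs-unixfs-exporter code): the parent dag-pb node records a blockSizes list, and a requested offset is turned into a child index by walking that list.

// Sketch: map a byte offset onto a child block using the blockSizes
// recorded in the parent node. Names here are illustrative only.
function findChildForOffset (blockSizes, offset) {
  let start = 0

  for (let i = 0; i < blockSizes.length; i++) {
    const end = start + blockSizes[i]

    if (offset < end) {
      // the requested byte lives in child i, at (offset - start) within it
      return { child: i, offsetInChild: offset - start }
    }

    start = end
  }

  throw new Error('offset is beyond the recorded file size')
}

If the recorded blockSizes overstate what the children actually contain, this lookup points at the wrong byte, so sliced reads come back shifted (and sum to the recorded total rather than the real one), while a plain sequential cat just concatenates whatever bytes the leaves really hold.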


Indeed, chunk sizes of 1048784 are the problem here. The numbers line up, too: the 1456-byte difference between the two totals is exactly 7 × 208, i.e. 208 extra bytes recorded for each of the seven full blocks. Amazingly, ipfs.cat without the optional parameters can still read it correctly?!

It's definitely a different CID.

Thanks!


This is not a satisfactory answer: CID production is not a fully deterministic process, so getting a different CID is expected to some extent.

I'm curious how you created the broken CID in the first place. It sounds like some piece of code can generate invalid UnixFS objects when fed what look like reasonable parameters?

Sure. As achingbrain pointed out, the problem was in accidentally recording block/chunk sizes slightly bigger than 1048576.
So if you do something like the following, with a chunk size bigger than 1048576, you will reproduce the situation I have described above.

...
// (context elided; "path", "size", "cid" and "links" come from the surrounding import code)
// UnixFS comes from the ipfs-unixfs package, createLink/createNode from @ipld/dag-pb
const node = new UnixFS({ type: 'file' })
// "size" here was slightly larger than 1048576, which is what produced the broken DAG
const link = createLink(path, size, cid)
node.addBlockSize(BigInt(size))
links.push(link)
const value = createNode(node.marshal(), links)
...
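To keep the recorded sizes consistent, the value passed to addBlockSize (and the link Tsize, for raw leaves) has to be the actual byte length of the leaf block rather than a fixed number. A sketch of the corrected bookkeeping, where leafBytes is a hypothetical Uint8Array holding the raw leaf data:

// Record the real length of the leaf instead of a hard-coded chunk size
const actualSize = leafBytes.byteLength
const link = createLink(path, actualSize, cid)
node.addBlockSize(BigInt(actualSize))
links.push(link)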