Confusion about the difference between "chunk" and "block"

Hello,

I am currently reading a lot of the IPFS ecosystem documentation (IPNS, MFS, UnixFS, IPLD, etc.).
I understand the concept of a “chunk”: it is a piece of data that corresponds to a leaf of the Merkle DAG created when you add a file to IPFS.
I read the IPLD docs in the Git repository: GitHub - ipld/docs: All you need to know about IPLD
What I don’t understand is the notion of a “block”. A “block” is first defined in the “From data to Data Structure” section: “So now, with multihash, a single identifier can get us any set of binary data (what we call a “Block”) from anywhere in the world.”
Further down, in the “Block” section, a block is defined as: “The term “block” is used to refer to the pairing of a raw binary and a CID.” Further down still, the “Node” section says that “a block can contain many nodes”.

Can you explain what a “block” really is, and the difference between a chunk and a block?


Let me try!

Chunks are what is produced by the chunking algorithm.
Those chunks are then hashed, and a block is either hash + chunk OR hash + links to other blocks.
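To make the chunk/block split concrete, here is a toy Python sketch. It is not how IPFS actually encodes anything: it uses a hypothetical 4-byte fixed-size chunker, stores each chunk under its sha256 digest (hash + chunk), and adds one parent block whose payload is just the list of leaf digests (hash + links). The names `chunk`, `make_blocks` and `reassemble` are made up for illustration; real IPFS wraps these payloads in dag-pb/UnixFS and addresses them with CIDs rather than bare digests.

```python
import hashlib

# Hypothetical tiny chunk size so the example stays readable; real chunkers
# use much larger (and configurable) sizes.
CHUNK_SIZE = 4


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunking: split the input into consecutive pieces."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_blocks(data: bytes) -> tuple[bytes, dict[bytes, bytes]]:
    """Hash each chunk into a leaf block, then build one parent block of links.

    Toy encoding (not dag-pb/UnixFS): a leaf payload is the chunk itself and
    the parent payload is the concatenation of the leaves' 32-byte digests.
    """
    blocks: dict[bytes, bytes] = {}
    links: list[bytes] = []
    for piece in chunk(data):
        digest = hashlib.sha256(piece).digest()
        blocks[digest] = piece          # leaf block: hash + chunk
        links.append(digest)
    parent = b"".join(links)
    root = hashlib.sha256(parent).digest()
    blocks[root] = parent               # parent block: hash + links
    return root, blocks


def reassemble(root: bytes, blocks: dict[bytes, bytes]) -> bytes:
    """Follow the root's links and concatenate the leaf chunks."""
    parent = blocks[root]
    links = [parent[i:i + 32] for i in range(0, len(parent), 32)]
    return b"".join(blocks[d] for d in links)


root, blocks = make_blocks(b"hello world!")
print(len(blocks), reassemble(root, blocks))  # prints: 4 b'hello world!'
```

Because blocks are keyed by their hash, identical chunks collapse into a single block: with this sketch, b"abcdabcd" yields one leaf block plus the parent.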

Chunks are not super relevant IMO.

Thanks, that’s a little clearer. So a block is a hash + chunk.

A block is the strict unit of data referenced by a CID: the binary payload that, when hashed, produces that same CID. A block is always content-addressed.
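A minimal sketch of "the block is whatever hashes back to the CID", assuming CIDv1 with the `raw` codec (0x55) and a sha2-256 multihash (0x12). It only shows the binary CID layout (version, codec, multihash) and the base32 multibase string form; the helper names are hypothetical. Verification simply rebuilds the CID from the candidate bytes and compares.

```python
import base64
import hashlib

RAW = 0x55        # multicodec code for "raw"
SHA2_256 = 0x12   # multihash code for sha2-256


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used throughout multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at buf[pos]; return (value, next_position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def cid_v1(block: bytes, codec: int = RAW) -> bytes:
    """Binary CIDv1: version, codec, then a sha2-256 multihash of the block."""
    digest = hashlib.sha256(block).digest()
    multihash = varint(SHA2_256) + varint(len(digest)) + digest
    return varint(1) + varint(codec) + multihash


def cid_to_str(cid: bytes) -> str:
    """Multibase base32: 'b' prefix, lowercase, no padding."""
    return "b" + base64.b32encode(cid).decode("ascii").lower().rstrip("=")


def verify(cid: bytes, block: bytes) -> bool:
    """True iff re-hashing the block yields this CID (assumes CIDv1 + sha2-256)."""
    _version, pos = read_varint(cid, 0)
    codec, _ = read_varint(cid, pos)
    return cid_v1(block, codec) == cid


cid = cid_v1(b"hello")
print(cid_to_str(cid)[:7])                            # prints: bafkrei
print(verify(cid, b"hello"), verify(cid, b"hellO"))   # prints: True False
```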

A block is not the full Merkle-DAG referenced by a root CID, nor anything that has been interpreted/parsed (e.g. a unixfs node).

A chunk is just a piece of binary data. It usually becomes a block when it is written to the blockstore, obtaining a CID that can be used later to read it back (as a block).
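A toy in-memory blockstore illustrating the chunk-to-block step. It assumes, as a simplification, that keys are bare sha256 digests instead of full CIDs; `ToyBlockstore` is a hypothetical class, not the go-ipfs blockstore interface.

```python
import hashlib


class ToyBlockstore:
    """Hypothetical in-memory, content-addressed store.

    Keys are raw sha2-256 digests of the stored bytes (a simplification: real
    blockstores key by CID/multihash and persist to disk).
    """

    def __init__(self) -> None:
        self._blocks: dict[bytes, bytes] = {}

    def put(self, chunk: bytes) -> bytes:
        """Writing a chunk turns it into a block; the returned key addresses it."""
        key = hashlib.sha256(chunk).digest()
        self._blocks[key] = chunk
        return key

    def get(self, key: bytes) -> bytes:
        """Read a block back by its key and check it still matches the key."""
        data = self._blocks[key]
        if hashlib.sha256(data).digest() != key:
            raise ValueError("stored block does not match its content address")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ToyBlockstore()
key = store.put(b"some chunk bytes")
store.put(b"some chunk bytes")        # same content, same key: stored once
print(len(store), store.get(key))     # prints: 1 b'some chunk bytes'
```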


OK, I understand, but can you talk about a UnixFS block? Or does a block not store UnixFS data, such as whether this is a directory or a file, etc.?
Does IPFS come in above the block or inside the block?

Usually,

a CID references a block. The block can be parsed as a go-merkledag protobuf.

The go-merkledag protobuf has a binary payload which can be parsed as unixfs protobuf.

The unixfs protobuf, when of File type, would have a payload which is the chunk bytes.

Unless you are using “raw leaves”, in which case the CID references the chunk bytes directly.
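These layers can be peeled apart by hand. The sketch below assumes the field numbers from the dag-pb (`PBNode.Data = 1`) and UnixFS (`Data.Type = 1`, `Data.Data = 2`, `Data.filesize = 3`, type `File = 2`) protobuf definitions, and uses a tiny hand-written protobuf encoder/decoder instead of a real protobuf library, so it only understands varint and length-delimited fields. The bytes it produces should be close to what a non-raw leaf looks like, but treat it as an illustration rather than a byte-exact reference.

```python
UNIXFS_FILE = 2  # UnixFS Data.Type enum value for File


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint (protobuf wire format)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def field_bytes(num: int, value: bytes) -> bytes:
    """Length-delimited field (wire type 2)."""
    return varint((num << 3) | 2) + varint(len(value)) + value


def field_varint(num: int, value: int) -> bytes:
    """Varint field (wire type 0)."""
    return varint(num << 3) + varint(value)


def parse(buf: bytes) -> dict[int, list]:
    """Minimal protobuf decoder: field number -> list of values (wire types 0, 2)."""
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        num, wire = key >> 3, key & 7
        if wire == 0:
            value, pos = read_varint(buf, pos)
        elif wire == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire}")
        fields.setdefault(num, []).append(value)
    return fields


def encode_leaf(chunk: bytes) -> bytes:
    """dag-pb node (no links) whose Data is a UnixFS File message."""
    unixfs = (field_varint(1, UNIXFS_FILE)      # Data.Type = File
              + field_bytes(2, chunk)           # Data.Data = chunk bytes
              + field_varint(3, len(chunk)))    # Data.filesize
    return field_bytes(1, unixfs)               # PBNode.Data


def decode_leaf(block: bytes) -> bytes:
    pbnode = parse(block)            # layer 1: block bytes -> dag-pb node
    unixfs = parse(pbnode[1][0])     # layer 2: PBNode.Data -> UnixFS message
    if unixfs[1][0] != UNIXFS_FILE:
        raise ValueError("not a UnixFS File node")
    return unixfs[2][0]              # layer 3: UnixFS Data.Data -> chunk bytes


chunk = b"hello"
wrapped = encode_leaf(chunk)
print(wrapped.hex())         # prints: 0a0b0802120568656c6c6f1805
print(decode_leaf(wrapped))  # prints: b'hello'
raw_leaf = chunk             # with raw leaves, the block bytes are the chunk
```

With raw leaves there is nothing to peel: the block bytes are the chunk itself, and the CID carries the `raw` codec instead of `dag-pb`.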

Spelunking here: Disk space consumption in IPFS - #4 by hector

A block can store anything. The CID gives hints about how to parse it, though, since it has a multicodec in it (usually dag-pb, which indicates that the block can be interpreted as a go-merkledag protobuf).
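Reading the multicodec out of a CID is what lets a client pick a parser before looking at the block. This sketch handles only CIDv0 (base58 "Qm..." strings, which are implicitly dag-pb) and base32 CIDv1 strings. The small `CODECS` table is a hand-copied subset of the multicodec table, and the `HOW_TO_PARSE` descriptions are informal, hypothetical labels.

```python
import base64

# Subset of the multicodec table, for illustration only.
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}

HOW_TO_PARSE = {
    "raw": "no structure: the block bytes are the data itself",
    "dag-pb": "parse as a dag-pb (go-merkledag) protobuf node",
    "dag-cbor": "parse as DAG-CBOR",
}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def cid_codec(cid_str: str) -> str:
    """Codec name for a CIDv0 or a base32 CIDv1 string (multibase prefix 'b')."""
    if cid_str.startswith("Qm"):
        return "dag-pb"                    # CIDv0 implies dag-pb
    if not cid_str.startswith("b"):
        raise ValueError("this sketch only handles CIDv0 and base32 CIDv1")
    body = cid_str[1:].upper()
    body += "=" * (-len(body) % 8)         # b32decode needs the padding back
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("expected a CIDv1")
    codec, _ = read_varint(raw, pos)
    return CODECS.get(codec, hex(codec))


# Hypothetical CID: dag-pb codec with an all-zero sha2-256 digest.
sample = "b" + base64.b32encode(bytes([1, 0x70, 0x12, 0x20]) + bytes(32)).decode().lower().rstrip("=")
print(cid_codec(sample), "->", HOW_TO_PARSE[cid_codec(sample)])
```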

Chunks are chopped-up pieces of what you’re adding to IPFS. Blocks are the pieces of IPFS. It’s sort of like ordering from Amazon. The stuff you order is like the chunks, and you can split up your order into a number of different chunks; the packages you put your chunks into are the IPFS blocks. If they’re raw blocks, it’s just a single item and nothing else, but Amazon can also throw some extra stuff in there like packing material, advertisements, an invoice, etc. That’s the merkledag. Under this analogy, sometimes you’ll get boxes with nothing but a note saying, “here are the tracking numbers of two other boxes that will either have your stuff or another empty box with more tracking numbers”.


Thanks for your response @hector, that is quite helpful.

Thank you @zacharywhitley for your response!

One more thing @hector: can a block contain just information about its children (e.g. a block for a folder that points to the root CIDs of the files in the folder)? If so, what’s the difference between a Merkle DAG node and a block?

There is no such thing as a “block of a folder”. A better term is “a unixfs node of type Folder”. Or a “dag-pb node”.

These are all things that come from parsing blocks. Blocks themselves are just a piece of content-addressed data. Interpreting that data and figuring out if it has links etc. is done at a higher layer where they are not called “blocks” anymore.

Also, unixfs is a payload that doesn’t have links. The dag-pb node in which it is contained does have links to other nodes. In the case of unixfs folders, those links may actually be the files in the folder. But big folders also use a HAMT, in which case those links just point to other nodes in that data structure and not to the actual files in the folder.
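To illustrate that the links live in the dag-pb layer while the UnixFS payload only says "this is a Directory", here is a sketch of a small, non-sharded folder node, again with a hand-rolled protobuf encoder. It assumes the dag-pb field numbers `PBNode.Links = 2`, `PBNode.Data = 1`, `PBLink.Hash = 1`, `PBLink.Name = 2` and the UnixFS type `Directory = 1`. It omits `PBLink.Tsize`, `raw_cid` is a hypothetical stand-in for the children's real CIDs, and HAMT-sharded folders are not covered.

```python
import hashlib

UNIXFS_DIRECTORY = 1  # UnixFS Data.Type enum value for Directory


def varint(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def field_bytes(num: int, value: bytes) -> bytes:
    return varint((num << 3) | 2) + varint(len(value)) + value


def field_varint(num: int, value: int) -> bytes:
    return varint(num << 3) + varint(value)


def parse(buf: bytes) -> dict[int, list]:
    """Minimal protobuf decoder for wire types 0 (varint) and 2 (bytes)."""
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        num, wire = key >> 3, key & 7
        if wire == 0:
            value, pos = read_varint(buf, pos)
        elif wire == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire}")
        fields.setdefault(num, []).append(value)
    return fields


def raw_cid(data: bytes) -> bytes:
    """Hypothetical stand-in for a child's CID: CIDv1, raw codec, sha2-256."""
    return bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(data).digest()


def encode_dir(entries: dict[str, bytes]) -> bytes:
    """dag-pb node: one PBLink per entry, Data = UnixFS {Type: Directory}."""
    out = b""
    for name in sorted(entries):                     # name-ordered links
        link = field_bytes(1, entries[name]) + field_bytes(2, name.encode())
        out += field_bytes(2, link)                  # PBNode.Links
    unixfs = field_varint(1, UNIXFS_DIRECTORY)       # no links in this payload
    return out + field_bytes(1, unixfs)              # PBNode.Data


def decode_dir(block: bytes) -> dict[str, bytes]:
    node = parse(block)
    unixfs = parse(node[1][0])
    if unixfs[1][0] != UNIXFS_DIRECTORY:
        raise ValueError("not a UnixFS Directory node")
    # The children come from the dag-pb layer, not from the UnixFS payload.
    entries = {}
    for raw_link in node.get(2, []):
        link = parse(raw_link)
        entries[link[2][0].decode()] = link[1][0]
    return entries


folder = encode_dir({"b.txt": raw_cid(b"B"), "a.txt": raw_cid(b"A")})
print(list(decode_dir(folder)))  # prints: ['a.txt', 'b.txt']
```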

Thank you, this answer helped me a lot.