Incomprehension of the difference between "chunk" and "block"

Zesor1 · July 20, 2022, 10:09am

Hello,

I am currently reading a lot of documentation on the IPFS ecosystem (IPNS, MFS, UnixFS, IPLD, etc.).
I understand the concept of a “chunk”: it is a piece of data that corresponds to a leaf of the Merkle DAG created when you add a file to IPFS.
I read the IPLD docs in the Git repository: GitHub - ipld/docs: All you need to know about IPLD
The notion I don’t understand is a “block”. A “block” is first defined here, in the “From data to Data Structure” section: “So now, with multihash, a single identifier can get us any set of binary data (what we call a “Block”) from anywhere in the world.”
Further down, in the “Block” section, a block is defined as: “The term “block” is used to refer to the pairing of a raw binary and a CID.” Further down again, the “Node” section says that “a block can contain many nodes”.

Can you explain what a “block” really is, and the difference between a chunk and a block?

SionoiS · July 20, 2022, 12:23pm

Let me try!

Chunks are what is produced by the chunking algorithm.
Those chunks are then hashed and blocks are hash + chunk OR hash + links to other blocks.

Chunks are not super relevant IMO.
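
A minimal sketch of the idea above, under some loud assumptions: a fixed-size chunker (real implementations offer several chunkers), a plain hex SHA-256 digest standing in for a CID, and a newline-separated list of digests standing in for the "links" block. The names `chunk_fixed`, `add`, and `cat` are hypothetical helpers, not IPFS APIs.

```python
from __future__ import annotations

import hashlib


def chunk_fixed(data: bytes, size: int) -> list[bytes]:
    """Split data into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(payload: bytes) -> str:
    # Stand-in for a CID: plain hex SHA-256, NOT a real multihash or CID.
    return hashlib.sha256(payload).hexdigest()


def add(data: bytes, size: int = 4) -> tuple[str, dict[str, bytes]]:
    """Return (root id, blocks), where blocks maps digest -> payload."""
    blocks: dict[str, bytes] = {}
    leaf_ids = []
    for chunk in chunk_fixed(data, size):
        leaf_id = digest(chunk)
        blocks[leaf_id] = chunk          # leaf block: hash + chunk
        leaf_ids.append(leaf_id)
    # Root block: hash + links to other blocks (toy encoding, one id per line).
    root_payload = "\n".join(leaf_ids).encode()
    root_id = digest(root_payload)
    blocks[root_id] = root_payload
    return root_id, blocks


def cat(root_id: str, blocks: dict[str, bytes]) -> bytes:
    """Follow the links in the root block and concatenate the leaf chunks."""
    leaf_ids = blocks[root_id].decode().split("\n")
    return b"".join(blocks[leaf_id] for leaf_id in leaf_ids)


root, blocks = add(b"hello world!")
```

Here the three 4-byte chunks each become a leaf block, and a fourth block holds only the links to them.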

Zesor1 · July 20, 2022, 12:31pm

Thanks, that’s a little clearer. So a block is a hash + chunk.

hector · July 20, 2022, 12:41pm

A block is the strict unit of data referenced by a CID. A block is the binary payload that when hashed produces the same CID. A block is always content-addressed.

A block is not the full Merkle-DAG referenced by a root CID, nor anything that has been interpreted/parsed (i.e. a unixfs node).

A chunk is just a piece of binary data. It usually becomes a block when it is written to the blockstore, obtaining a CID that can be used to read it later (as a block).
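
To illustrate "the binary payload that when hashed produces the same CID", here is a toy content-addressed store. It is only a sketch: the `Blockstore` class is hypothetical, and the key is a hex SHA-256 digest rather than a real CID.

```python
import hashlib


class Blockstore:
    """Toy in-memory blockstore keyed by the hash of each payload."""

    def __init__(self):
        self._blocks = {}

    def put(self, payload: bytes) -> str:
        # The identifier is derived from the bytes, never chosen by the caller.
        key = hashlib.sha256(payload).hexdigest()
        self._blocks[key] = payload
        return key

    def get(self, key: str) -> bytes:
        payload = self._blocks[key]
        # A reader can always re-hash the payload to check it against the key.
        if hashlib.sha256(payload).hexdigest() != key:
            raise ValueError("block does not match its identifier")
        return payload


store = Blockstore()
chunk = b"some bytes from a file"   # just a chunk so far
key = store.put(chunk)              # now it is a block with an identifier
same = store.get(key)               # read back later, as a block
```

Putting the same bytes twice yields the same key, which is what "content-addressed" means in practice.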

Zesor1 · July 20, 2022, 12:55pm

OK, I understand, but can you talk about a UnixFS block? Or does a block not store UnixFS data, such as whether this is a directory or a file, etc.?
Does IPFS operate above the block or inside the block?

hector · July 20, 2022, 1:09pm

Usually, a CID references a block. The block can be parsed as a go-merkledag protobuf.

The go-merkledag protobuf has a binary payload which can be parsed as unixfs protobuf.

The unixfs protobuf, when of File type, would have a payload which is the chunk bytes.

Unless you are using “raw leaves”, in which case the CID references the chunk bytes directly.
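
The layering can be sketched with a hand-rolled protobuf encoder and parser. This is an illustration, not the go-merkledag or unixfs code: the field numbers (dag-pb `Data` = 1; unixfs `Type` = 1, `Data` = 2, `filesize` = 3; `File` = 2) are written from memory of the dag-pb and UnixFS specs and should be checked against them, links are omitted, and no CID is computed.

```python
from __future__ import annotations


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint at `pos`; return (value, next position)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def bytes_field(number: int, payload: bytes) -> bytes:
    return varint((number << 3) | 2) + varint(len(payload)) + payload


def varint_field(number: int, value: int) -> bytes:
    return varint(number << 3) + varint(value)


def parse(buf: bytes) -> dict[int, list]:
    """Parse varint and length-delimited fields into {field number: [values]}."""
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(number, []).append(value)
    return fields


UNIXFS_FILE = 2  # assumed value of the unixfs File type

chunk = b"hello world\n"
# Innermost layer: unixfs message carrying the chunk bytes.
unixfs = varint_field(1, UNIXFS_FILE) + bytes_field(2, chunk) + varint_field(3, len(chunk))
# Outer layer: dag-pb node whose Data field is the unixfs message. These bytes are the block.
block = bytes_field(1, unixfs)

# Reading goes the other way: block -> dag-pb node -> unixfs -> chunk bytes.
node = parse(block)
inner = parse(node[1][0])
recovered = inner[2][0]
```

With raw leaves, none of this wrapping happens: the block is simply `chunk`.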

Spelunking here: Disk space consumption in IPFS - #4 by hector

A block can store anything. The CID gives hints about how to parse it, though, since it contains a multicodec (usually dag-pb, which indicates the block can be interpreted as a go-merkledag protobuf).
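
As a rough sketch of where that hint lives, the snippet below builds a base32 CIDv1 string by hand and reads the codec back out. Assumptions: only CIDv1 with the `b` (base32) multibase prefix and a SHA-256 multihash is handled, the codec table lists just two entries (`raw` = 0x55, `dag-pb` = 0x70), and `make_cid_v1` / `codec_of` are hypothetical helpers rather than functions from any IPFS library.

```python
import base64
import hashlib

CODECS = {0x55: "raw", 0x70: "dag-pb"}  # small subset of the multicodec table
SHA2_256 = 0x12                         # multihash code for SHA-256


def make_cid_v1(codec: int, payload: bytes) -> str:
    """Build <version><codec><multihash> and encode it as base32 with a 'b' prefix."""
    digest = hashlib.sha256(payload).digest()
    binary = bytes([0x01, codec, SHA2_256, len(digest)]) + digest
    return "b" + base64.b32encode(binary).decode().lower().rstrip("=")


def codec_of(cid: str) -> str:
    """Return the codec name embedded in a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDv1 strings are handled in this sketch")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)      # restore the padding b32decode expects
    binary = base64.b32decode(body)
    version, codec = binary[0], binary[1]  # both fit in one varint byte here
    if version != 1:
        raise ValueError("not a CIDv1")
    return CODECS.get(codec, hex(codec))


wrapped = make_cid_v1(0x70, b"bytes of a dag-pb block")
raw_leaf = make_cid_v1(0x55, b"bytes of a raw chunk")
```

The codec only says how the block is meant to be parsed; the block itself is still just bytes.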

zacharywhitley · July 20, 2022, 1:22pm

Chunks are chopped-up pieces of what you’re adding to IPFS. Blocks are the pieces of IPFS. It’s sort of like ordering from Amazon. The stuff you order is like the chunks, and you can split up your order into a number of different chunks; the packages your chunks are put into are the IPFS blocks. If they’re raw blocks, it’s just a single item and nothing else, but Amazon can also throw some extra stuff in there like packing material, advertisements, an invoice, etc. That’s the merkledag. Under this analogy, sometimes you’ll get boxes with nothing but a note saying, “here are the tracking numbers of two other boxes that will either have your stuff or another empty box with more tracking numbers”.

Zesor1 · July 20, 2022, 1:25pm

Thanks for your response @hector, that is quite helpful.

Zesor1 · July 20, 2022, 1:27pm

Thank you @zacharywhitley for your response!

Zesor1 · July 20, 2022, 1:54pm

One more thing @hector: can a block contain only information about its children (i.e. a block for a folder that points to the root CIDs of the files in the folder)? If so, what is the difference between a Merkle DAG node and a block?

hector · July 20, 2022, 2:23pm

There is no such thing as a “block of a folder”. A better term is “a unixfs node of type Folder”. Or a “dag-pb node”.

These are all things that come from parsing blocks. Blocks themselves are just a piece of content-addressed data. Interpreting that data and figuring out if it has links etc. is done at a higher layer where they are not called “blocks” anymore.

Also, unixfs is a payload that doesn’t have links. The dag-pb node in which it is contained does have links to other nodes. In the case of unixfs folders, those links may actually be the files in the folder. But also big folders use a HAMT, so those links are just pointing to other nodes in a data structure and not the actual files in the folders.

Zesor1 · July 20, 2022, 2:56pm

Thank you, this answer helped me a lot.
