(Trying to answer my own question: I would assume various buffers are sized for small chunks, reads are sized to blocks instead of available memory, etc. – but that doesn’t actually answer it.)
Are you looking for more than the “why block limits” section of my original post, Supporting Large IPLD Blocks?
For example, if you’re wondering why, despite me writing this up 3 years ago (wow), it still hasn’t happened: the TL;DR is funding / prioritization. It’s not a small lift to move this from demoable to having specs and proper integration into most existing tooling (are you looking for more detail here?), and so far the people who make the funding calls haven’t found it high enough ROI.
I think this is still quite important for IPFS ecosystem growth, and I’m planning to fight for it to make it onto the priority list for 2026. If you do too, I can ping you on the 2026 public roadmapping docs as they start existing.
By the way, in case the 3-year duration of this thread gives you despair and you’re looking for some optimism: the integration of HTTP Trustless Gateway retrieval into the IPFS mainnet p2p layer earlier this year (vs. just as a way of verifying responses from gateways that fetch data from the network) is, IMO, useful help in decreasing the lift. Since the HTTP Trustless Gateway API is already set up to handle block-based and graph-based (CAR) retrievals, it gives us a surface for handling the in-between area of verifying large blocks (vs., say, reworking Bitswap or creating a new protocol). There’s still a bunch of work between where we are and supporting large blocks, but at least there’s some progress.
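To make that “surface for verification” concrete: per the Trustless Gateway spec, a client can fetch a single block with `GET /ipfs/<cid>?format=raw` (or `Accept: application/vnd.ipld.raw`) and then check the returned bytes against the CID locally. Here’s a minimal sketch of that local check, assuming a base32 CIDv1 with the raw codec and sha2-256 (a toy, not a full multiformats parser):

```python
import base64
import hashlib

def verify_raw_block(cid: str, block: bytes) -> bool:
    """Trustless verification of a raw block: recompute its hash and compare
    it to the digest embedded in the CID. Only handles CIDv1, base32, raw
    codec (0x55), sha2-256 (0x12) -- a sketch, not a multiformats library.
    """
    if not cid.startswith("b"):
        raise ValueError("expected a base32 CIDv1 (multibase prefix 'b')")
    raw = cid[1:].upper()
    raw += "=" * (-len(raw) % 8)  # restore the base32 padding stripped by CIDs
    data = base64.b32decode(raw)
    # Expected layout: version(0x01) codec(0x55) hashfn(0x12) length(0x20) digest
    if data[:4] != bytes([0x01, 0x55, 0x12, 0x20]):
        raise ValueError("unsupported CID layout for this sketch")
    return data[4:] == hashlib.sha256(block).digest()
```

Today the spec-side limit is what keeps this simple (a whole block fits in memory and is hashed at once); the open design work for large blocks is verifying incrementally instead of buffering the entire thing.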
Well, specifically I am reacting to how much debate there has been in this thread about picking a larger-but-still-conservative value – despair is indeed high, and solutions in that vein would do relatively little for my pain. looks sideways at IPNI
And yes, trustless gateways are indeed something that gives me some hope.
Let me share some experiment results, for anyone who wants to increase the block size for better deduplication.
It seems the block size doesn’t affect the deduplication ratio much.
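For context on how numbers like the ones below can be produced, here is a minimal sketch of the measurement (my reconstruction of the methodology, not the exact script used): split the input into fixed-size blocks, key each block by its SHA-256, store each distinct block once, and count every repeat as a reuse.

```python
import hashlib

def dedup_stats(data: bytes, block_size: int) -> dict:
    """Fixed-size chunking dedup measurement: identical blocks (by SHA-256)
    are stored once; every repeated block counts as a reuse."""
    seen = set()
    stored = 0  # bytes actually kept after dedup
    reused = 0  # number of blocks that matched an already-seen block
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            reused += 1
        else:
            seen.add(digest)
            stored += len(block)
    saved = len(data) - stored
    pct = 100 * saved / len(data) if data else 0.0
    return {"storage": stored, "reused": reused, "saved": saved, "saved_pct": pct}

# Example: 4 KiB of a repeating pattern dedupes down to a single 1 KiB block.
stats = dedup_stats(b"abcd" * 1024, block_size=1024)
# stats == {"storage": 1024, "reused": 3, "saved": 3072, "saved_pct": 75.0}
```

Note this is fixed-offset chunking, so the “# block reuse” column below falls as the block size grows: bigger blocks have fewer chances to be byte-identical.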
Text-based Data
data: GitHub - beenotung/tslib: utils library in TypeScript (including source, built JS files, and node_modules)
total_size: 624,806,286 bytes
Results
| block size* | storage size* | # block reuse | saved (bytes) | saved % |
|---|---|---|---|---|
| 1,024 | 155,365,444 | 483,353 | 469,440,842 | 75.13% |
| 2,048 | 155,835,717 | 256,208 | 468,970,569 | 75.06% |
| 4,096 | 156,015,582 | 146,666 | 468,790,704 | 75.03% |
| 8,192 | 156,167,190 | 93,234 | 468,639,096 | 75.01% |
| 16,384 | 156,281,878 | 67,585 | 468,524,408 | 74.99% |
| 32,768 | 156,331,030 | 55,339 | 468,475,256 | 74.98% |
| 65,536 | 156,429,334 | 49,467 | 468,376,952 | 74.96% |
| 131,072 | 156,757,014 | 46,727 | 468,049,272 | 74.91% |
| 262,144 | 157,019,158 | 45,461 | 467,787,128 | 74.87% |
| 1,048,576 | 157,805,590 | 44,584 | 467,000,696 | 74.74% |
| 4,194,304 | 160,951,318 | 44,420 | 463,854,968 | 74.24% |
| 10,485,760 | 158,854,166 | 44,386 | 465,952,120 | 74.58% |
Binary Data
data: backup app images of various versions of Cursor (an IDE forked from VSCode)
total_size: 2,822,251,957 bytes
| block size* | storage size* | # block reuse | saved (bytes) | saved % |
|---|---|---|---|---|
| 1,024 | 2,069,388,021 | 735,226 | 752,863,936 | 26.68% |
| 2,048 | 2,069,908,597 | 367,358 | 752,343,360 | 26.66% |
| 4,096 | 2,070,078,581 | 183,641 | 752,173,376 | 26.65% |
| 8,192 | 2,070,413,557 | 91,778 | 751,838,400 | 26.64% |
| 16,384 | 2,070,569,205 | 45,880 | 751,682,752 | 26.63% |
| 32,768 | 2,070,847,733 | 22,932 | 751,404,224 | 26.62% |
| 65,536 | 2,071,602,613 | 11,454 | 750,649,344 | 26.60% |
| 131,072 | 2,072,782,261 | 5,718 | 749,469,696 | 26.56% |
| 262,144 | 2,074,617,269 | 2,852 | 747,634,688 | 26.49% |
| 1,048,576 | 2,087,200,181 | 701 | 735,051,776 | 26.04% |
| 4,194,304 | 2,125,997,493 | 166 | 696,254,464 | 24.67% |
| 10,485,760 | 2,182,620,597 | 61 | 639,631,360 | 22.66% |
Remark: * block size and storage size are in bytes.
I know that the deduplication ratio is not the only factor to consider (e.g. compatibility with BitTorrent); just sharing the results for your reference.
Hi all, I’m a little late to this discussion, but may I suggest a (maybe temporary) solution:
- The IPFS daemon administrator may set the Bitswap block size limit. This limit may even be (as proposed in this thread) per IPLD type;
- The administrator may also allow users of the API to override it;
- IPLD/IPFS/IPNS URLs may accept a maximum block size: `ipfs://max5G@SomeSHA256OfAnUbuntuISOFromTheWebsite`;
This model allows each node to define its own “DDoSable” threshold. Nodes may also make exceptions for blocks / IPNS dirs / entire DAGs that are important to their users. Having a system like this creates the grounds for a consensus-based trust list of nodes. Once the concept of multiple block sizes is normalized and some problems are identified, more efficient management methods can be developed.
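To illustrate the proposed URL form, here is a toy parser for the `max<N><unit>@` prefix. Note that both the syntax and the `parse_max_block` helper are this thread’s invention, not part of any existing spec:

```python
import re

# Unit multipliers for the hypothetical "max<N><unit>@" prefix.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "": 1}

def parse_max_block(url: str):
    """Return (max_block_size_in_bytes_or_None, cid) from an ipfs:// URL
    using the proposed syntax, e.g. ipfs://max5G@<cid>."""
    m = re.match(r"ipfs://(?:max(\d+)([KMG]?)@)?(.+)$", url)
    if not m:
        raise ValueError("not an ipfs:// URL")
    num, unit, cid = m.groups()
    limit = int(num) * _UNITS[unit] if num else None
    return limit, cid

limit, cid = parse_max_block("ipfs://max5G@SomeSHA256OfAnUbuntuISOFromTheWebsite")
# limit == 5 * 1024**3; a plain ipfs://<cid> URL yields limit == None,
# i.e. "fall back to the node's configured default".
```

One design question this surfaces: the limit lives in the URL, not in the CID, so two URLs naming the same content can disagree about it; the node’s own configured cap would still need to win.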
Possible shortcomings:
- Segregation of nodes based on maximum block limit;
- Nodes should hint their block size limit when talking to other nodes, so transactions can be rejected right away;
- Each node may have to keep track of the limits of every peer it’s connected to (and which blocks they hold) to avoid wasteful connections;
- Extra node overhead managing an “optimal” network and making large data accessible.
Thanks for sharing.
Regarding small blocks as a means of deduplication, I recently looked into this and shared why I think it’s not the right trade-off once you consider the cost of announcements, CID determinism, and traversing DAGs over the network. Here’s the post where I elaborate on this:
As for deduplication more broadly, I’d check out content-defined chunking (CDC) and the following paper: Analysis and Comparison of Deduplication Strategies in IPFS
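For a feel of what CDC does differently from fixed-offset chunking: boundaries are chosen from the content itself (a rolling hash hitting a condition), so inserting a few bytes early in a file only shifts the chunks near the edit instead of re-aligning every subsequent block. A toy chunker, illustrative only (real implementations such as FastCDC use gear hashes and normalized chunking):

```python
def cdc_chunks(data: bytes, mask: int = (1 << 12) - 1,
               min_size: int = 256, max_size: int = 8192) -> list:
    """Toy content-defined chunker: cut wherever a simple rolling hash of
    recent bytes satisfies (hash & mask) == 0, bounded by min/max sizes."""
    chunks, start, h = [], 0, 0
    for i in range(len(data)):
        h = ((h << 1) + data[i]) & 0xFFFFFFFF  # old bytes shift out after ~32 steps
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder (may be < min_size)
    return chunks
```

With a 12-bit mask the average chunk lands around 4 KiB, so most chunks (and hence their CIDs) survive an edit unchanged – which is why CDC tends to beat plain fixed-size blocks for deduplicating versioned data like the backup-image tables above.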