What is better? Large containers or large sets of files?

probonopd · December 4, 2017, 12:56am

I think this was never fully answered.

Let’s say we have Ubuntu, Xubuntu, Edubuntu, Kubuntu, … Live ISOs. The files are stored inside each ISO in a compressed squashfs filesystem. It is safe to assume that a large portion, if not the majority, of the individual files inside the squashfs filesystems are identical. But since the files may be arranged differently within each squashfs filesystem, the fixed-size chunks may all differ between the Live ISOs.

Wouldn’t it be most efficient for deduplication and re-use of already downloaded parts if chunks were not made by using predefined block sizes, but with some knowledge of the squashfs filesystem?

Or could we have IPFS chunk the squashfs based on the individual files that make up a Linux Live ISO? In this case, the file libc.so.6 from Ubuntu, Xubuntu, Edubuntu, Kubuntu, … would get the same hash (because it is always the same file) and could be shared across all of the mentioned Live ISOs.
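To make the difference concrete, here is a rough Python sketch. It is only a toy model: each "image" is a plain concatenation of uncompressed file contents (squashfs compression and metadata are ignored), the file contents come from a hypothetical `blob` helper, and the 1024-byte block size is arbitrary rather than anything IPFS actually uses.

```python
import hashlib

def blob(name: str, size: int) -> bytes:
    """Deterministic stand-in content for a file (hypothetical helper)."""
    return hashlib.shake_256(name.encode()).digest(size)

def fixed_block_hashes(image: bytes, block_size: int = 1024) -> set[str]:
    """Hash every fixed-size block of an image."""
    return {hashlib.sha256(image[i:i + block_size]).hexdigest()
            for i in range(0, len(image), block_size)}

def per_file_hashes(files: list[bytes]) -> set[str]:
    """Hash every file as a whole, independent of where it sits in the image."""
    return {hashlib.sha256(content).hexdigest() for content in files}

# Two toy "images" that share libc.so.6 and bash, arranged differently.
libc = blob("libc.so.6", 5000)
bash = blob("bash", 3000)
ubuntu_files = [libc, bash, blob("ubuntu-wallpaper", 2000)]
kubuntu_files = [blob("kubuntu-wallpaper", 1500), bash, libc]

ubuntu_image = b"".join(ubuntu_files)
kubuntu_image = b"".join(kubuntu_files)

shared_blocks = fixed_block_hashes(ubuntu_image) & fixed_block_hashes(kubuntu_image)
shared_files = per_file_hashes(ubuntu_files) & per_file_hashes(kubuntu_files)

print(len(shared_blocks))  # 0: the shared files sit at different block offsets
print(len(shared_files))   # 2: libc.so.6 and bash hash identically
```

In this toy case, 8000 of the roughly 10000 bytes in each image are identical, yet fixed-size blocks find nothing to share.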

In other words, wouldn’t we need content-aware chunking mechanisms rather than fixed block sizes?
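One generic form of this is content-defined chunking, where chunk boundaries are chosen from a rolling hash over the data instead of from fixed offsets. The sketch below is a minimal gear-style rolling hash I put together for illustration; it is not IPFS's actual chunker, it knows nothing about squashfs, and the `GEAR` table, the `bits` parameter, and the test payload are all assumptions.

```python
import hashlib

# 256 pseudo-random 32-bit values, one per possible byte value (assumed table).
_raw = hashlib.shake_256(b"gear-table").digest(256 * 4)
GEAR = [int.from_bytes(_raw[i:i + 4], "big") for i in range(0, len(_raw), 4)]

def content_defined_chunks(data: bytes, bits: int = 10) -> list[bytes]:
    """Cut after every byte where the top `bits` bits of the rolling hash are zero.

    The 32-bit hash only depends on the last 32 bytes seen, so a boundary
    is a property of the nearby content, not of the absolute offset.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h >> (32 - bits) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

# The same payload, once as-is and once shifted by a 7-byte prefix.
payload = hashlib.shake_256(b"squashfs-payload").digest(50_000)
shifted = b"7 bytes" + payload

original = content_defined_chunks(payload)
moved = content_defined_chunks(shifted)
shared = set(original) & set(moved)

print(len(original), len(moved), len(shared))
```

Because the boundaries follow the content, the chunkers resynchronize shortly after the inserted prefix, and the later chunks come out byte-for-byte identical in both versions. With fixed block sizes, a 7-byte shift would change every block.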

The answer to this is also relevant to IPFS for AppImage: Distribution of Linux applications
