What is better? Large containers or large sets of files?

From @githubber314159 on Sun Apr 16 2017 18:43:18 GMT+0000 (UTC)

Hello everyone! I am wondering whether it’s better to share one big dump (e.g. the latest English Wikipedia dump as one file) or many small files which together constitute the dump.

Arguments for big dumps: performance, since there is only one hash in the network.

Arguments for small files: deduplication. If gnome-4.5.tar.gz is already in the network, it wouldn’t be deduplicated when a Linux ISO containing that file is introduced as one big file. With small files, only new additions need new storage space.

This is similar to the dichotomy between Debian’s way of linking packages where possible (reducing both disk space and RAM requirements) and Docker images.

What do you think?


Copied from original issue: https://github.com/ipfs/faq/issues/250

From @hsanjuan on Sun Apr 23 2017 14:28:47 GMT+0000 (UTC)

> one hash in the network.

IPFS will chunk big files, so you will have many hashes anyway.
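As a rough illustration of that point, here is a minimal sketch of fixed-size chunking. It assumes a 256 KiB chunk size and plain SHA-256 digests; real IPFS builds a Merkle DAG with multihash-based identifiers, so the function name `chunk_hashes` and the hash format here are only stand-ins.

```python
import hashlib

def chunk_hashes(data: bytes, chunk_size: int = 256 * 1024) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

# A single 1,000,000-byte "file" still ends up as several hashed chunks.
big_file = bytes(1_000_000)
hashes = chunk_hashes(big_file)
print(len(hashes))       # 4 chunks: three full ones and one shorter tail
print(len(set(hashes)))  # 2 distinct hashes, because the three full chunks are identical
```

Identical chunks hash to the same value, so even within one file repeated chunks only need to be stored once.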

I think this was never fully answered.

Let’s say we have Ubuntu, Xubuntu, Edubuntu, Kubuntu,… Live ISOs. Files are stored inside the ISO in a compressed squashfs filesystem. It is safe to assume that a large portion, if not the majority, of the individual files inside the squashfs filesystem are identical. But since the files may be arranged differently in each squashfs filesystem, the fixed-block-size chunks may all be different between the different Live ISOs.

Wouldn’t it be most efficient for deduplication and re-use of already downloaded parts if chunks were not made by using predefined block sizes, but with some knowledge of the squashfs filesystem?

Or, could we have IPFS chunk the squashfs based on the individual files that make up a Linux Live ISO? In this case, the file libc.so.6 from Ubuntu, Xubuntu, Edubuntu, Kubuntu,… would get the same hash (because it is always the same file) and could be shared across all of the mentioned Live ISOs.
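To make the per-file idea concrete, here is a minimal sketch. The two "images" are hypothetical dictionaries mapping paths to placeholder contents, and `file_hashes` is an illustrative helper, not anything IPFS provides; the point is only that identical files hash identically regardless of where they sit in the image.

```python
import hashlib

def file_hashes(image: dict[str, bytes]) -> dict[str, str]:
    """Map each path in an unpacked image to the SHA-256 digest of its content."""
    return {path: hashlib.sha256(content).hexdigest() for path, content in image.items()}

# Placeholder contents standing in for real files; the file order differs on purpose.
ubuntu = {"lib/libc.so.6": b"libc bytes", "usr/bin/gnome-shell": b"gnome bytes"}
kubuntu = {"usr/bin/plasmashell": b"kde bytes", "lib/libc.so.6": b"libc bytes"}

u, k = file_hashes(ubuntu), file_hashes(kubuntu)
shared = set(u.values()) & set(k.values())
stored = set(u.values()) | set(k.values())

print(u["lib/libc.so.6"] == k["lib/libc.so.6"])  # True: same content, same hash
print(len(shared), len(stored))                  # 1 shared block, 3 stored in total
```

Four files across the two images need only three stored blocks, because libc.so.6 is shared.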

In other words, wouldn’t we need content-aware chunking mechanisms rather than fixed block sizes?
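One generic form of this is content-defined chunking, where a chunk boundary is chosen from the bytes themselves rather than from a fixed offset. The sketch below is a toy version under stated assumptions: it uses `zlib.crc32` over a 16-byte window instead of a proper rolling hash such as Rabin fingerprinting, and the `WINDOW` and `MASK` values are arbitrary. It knows nothing about squashfs, and it would not see through squashfs compression; it only shows why boundaries derived from content survive a shift in offsets.

```python
import hashlib
import random
import zlib

WINDOW = 16   # number of preceding bytes that decide whether a boundary falls here
MASK = 0x3F   # cut when the low 6 bits of the window hash are zero (~64-byte chunks)

def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_defined_chunks(data: bytes) -> list[bytes]:
    """Cut wherever the hash of the preceding WINDOW bytes matches the mask."""
    chunks, start = [], 0
    for i in range(WINDOW, len(data)):
        if zlib.crc32(data[i - WINDOW:i]) & MASK == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

def digests(chunks: list[bytes]) -> set[str]:
    return {hashlib.sha256(c).hexdigest() for c in chunks}

rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"\x00" + original  # same content, moved by one byte

shared_fixed = digests(fixed_chunks(original)) & digests(fixed_chunks(shifted))
cdc_original = digests(content_defined_chunks(original))
shared_cdc = cdc_original & digests(content_defined_chunks(shifted))

print("fixed-size chunks shared:", len(shared_fixed))
print("content-defined chunks shared:", len(shared_cdc), "of", len(cdc_original))
```

With fixed offsets, a one-byte shift changes every chunk. With content-defined boundaries, only the chunk containing the inserted byte differs and the rest can be reused. For what it's worth, `ipfs add` has a `--chunker` option that includes a rabin mode along these lines.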

The answer to this is also relevant to IPFS for AppImage: Distribution of Linux applications