Request for Comments – Data Compression with IPFS Support

I’m considering developing an application for IPFS, mainly for personal use, and I’d like to gather feedback to see if it could be useful to others as well. That’s the reason for this post, and any honest opinion is welcome! I’d also appreciate knowing whether this idea has been explored before.

Introduction

Dictionary-based compression methods work by generating a dictionary that maps short codes to the most frequent terms or patterns in the input data. The compressed file is then written using these short codes, referencing the patterns stored just once in the dictionary.

Example

original:
The quick brown fox jumps over the lazy dog. The quick brown fox is fast.

dictionary:

ID Dictionary Entry
0 The quick brown fox
10 jumps over
110 the lazy dog
1110 is fast

compressed:
0 10 110. 0 1110.
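The substitution in the example above can be sketched in a few lines of Python. This is a toy illustration only: it replaces phrases with string codes directly, whereas a real dictionary coder would emit bit-level codes and handle collisions with literal text.

```python
# Toy dictionary-based compression matching the example above.
# The dictionary maps short codes to frequent phrases; the
# compressed text references those codes instead of the phrases.

DICTIONARY = {
    "0": "The quick brown fox",
    "10": "jumps over",
    "110": "the lazy dog",
    "1110": "is fast",
}

def compress(text: str, dictionary: dict[str, str]) -> str:
    # Replace longer phrases first so shorter ones don't clobber them.
    for code, phrase in sorted(dictionary.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(phrase, code)
    return text

def decompress(compressed: str, dictionary: dict[str, str]) -> str:
    # Replace longer codes first to avoid partial-code matches.
    for code, phrase in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
        compressed = compressed.replace(code, phrase)
    return compressed

original = "The quick brown fox jumps over the lazy dog. The quick brown fox is fast."
compressed = compress(original, DICTIONARY)
print(compressed)  # 0 10 110. 0 1110.
assert decompress(compressed, DICTIONARY) == original
```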

Proposal

A system that compresses files using dictionary methods and stores the dictionary in IPFS (making it a static dictionary). The compressed file is stored locally. This approach has the following advantages:

  1. Dictionaries should not be trained on private data; they should be built from public datasets. It’s then possible to store large, comprehensive shared dictionaries tailored to different data types. I believe compression ratios can be significantly improved compared to locally trained dictionaries.
  2. Dictionary downloads tend to be fast and parallelizable. Large dictionaries can be split into smaller chunks, allowing faster, more efficient, and more scalable retrieval when decompressing large files, especially compared to centralized sources.
  3. On their own, dictionaries reveal nothing about the data they compress. The actual information remains in the locally stored compressed file, keeping it private. This is particularly useful in scenarios involving private data, regulatory compliance, or even as an alternative to encryption in certain use cases.
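Point 2 above can be sketched with a minimal content-addressed chunk store. This is a simulation only: the in-memory `store` dict and SHA-256 hex digests stand in for the IPFS network and its CIDs; a real implementation would pin the chunks via the IPFS API and could fetch them in parallel from many peers.

```python
import hashlib

# Sketch: a large shared dictionary split into chunks, each stored and
# retrieved by content hash, mimicking how IPFS addresses blocks by CID.
# The in-memory `store` is a stand-in for the IPFS network.

CHUNK_SIZE = 16  # tiny for the example; real chunks would be far larger

def split_and_store(dictionary_blob: bytes, store: dict[str, bytes]) -> list[str]:
    """Split the dictionary into chunks, store each under its hash,
    and return the ordered list of chunk IDs (a manifest)."""
    manifest = []
    for i in range(0, len(dictionary_blob), CHUNK_SIZE):
        chunk = dictionary_blob[i:i + CHUNK_SIZE]
        cid = hashlib.sha256(chunk).hexdigest()  # stand-in for an IPFS CID
        store[cid] = chunk
        manifest.append(cid)
    return manifest

def fetch_dictionary(manifest: list[str], store: dict[str, bytes]) -> bytes:
    # Chunks are independent, so fetches could run in parallel against
    # many peers; here we simply read them back in order.
    return b"".join(store[cid] for cid in manifest)

store: dict[str, bytes] = {}
blob = b"The quick brown fox|jumps over|the lazy dog|is fast"
manifest = split_and_store(blob, store)
assert fetch_dictionary(manifest, store) == blob
```

Because chunks are addressed by their content, identical chunks shared between dictionaries are deduplicated automatically, which is one of the properties that makes IPFS a natural fit here.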