Hello, I have an idea for a chunking algorithm. It should work better than Rabin fingerprinting and is highly configurable. The algorithm builds a content-dependent Merkle tree and can be used for deduplication and compression. I've implemented it in my content-addressable storage (CAS), and it works well and efficiently. Is there any interest in implementing it for IPFS? Where should I start the discussion, if anywhere?
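To give a concrete picture of the general idea, here is a minimal Rust sketch of content-defined chunking plus a Merkle tree over the resulting chunk hashes. This is not the CDT algorithm itself, just the family of techniques it belongs to: the window size, boundary mask, and use of `DefaultHasher` are placeholder choices for illustration, not parameters from my implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder parameters for the demo, not taken from CDT/blockset.
const WINDOW: usize = 16; // rolling window size in bytes
const BOUNDARY_MASK: u64 = 0x3F; // ~1/64 boundary probability

fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

// Split `data` into content-defined chunks: a chunk ends wherever the
// hash of the trailing window satisfies the boundary condition, so
// boundaries depend on content, not on absolute offsets.
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for i in 0..data.len() {
        // Only test once a full window fits inside the current chunk.
        if i + 1 >= start + WINDOW {
            let window = &data[i + 1 - WINDOW..=i];
            if hash_bytes(window) & BOUNDARY_MASK == 0 {
                chunks.push(&data[start..=i]);
                start = i + 1;
            }
        }
    }
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}

// Fold a level of hashes pairwise until a single Merkle root remains.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    if level.is_empty() {
        return hash_bytes(&[]);
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| {
                let mut h = DefaultHasher::new();
                pair.hash(&mut h);
                h.finish()
            })
            .collect();
    }
    level[0]
}

fn main() {
    let data = b"example content for content-defined chunking...".repeat(8);
    let chunks = chunk(&data);
    let hashes: Vec<u64> = chunks.iter().map(|c| hash_bytes(c)).collect();
    println!("chunks: {}, root: {:016x}", chunks.len(), merkle_root(hashes));
}
```

Because the boundaries come from the content, inserting bytes near the start of a file only changes the chunks around the edit, so most chunk hashes (and most subtrees) are shared with the old version, which is what makes this useful for deduplication.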
Is there any place where we can learn how it works?
I have an article about CDT on Medium: "Content-Dependent Hash Tree" by Sergey Shandar. I've also implemented my own open-source CAS using one implementation of the algorithm: blockset (GitHub: datablockset/blockset), a command-line application that can store and retrieve data blocks using a content-dependent tree (CDT) hash function as a universal address for the blocks.