Is there currently an implementation of IPFS that includes erasure coding such as Reed-Solomon? (I have scanned the issues and the forum; in previous years the answer was no.)
By the way, I am a senior student and my supervisor would like me to merge Reed-Solomon into one implementation of IPFS (such as Kubo) to address the problem of single-node cold data storage. Is this achievable?
This is not how IPFS is used in the wild today: storing files is voluntary, and nodes don’t randomly start storing other people’s files.
If someone wants a file to be stored, they store the blocks themselves.
I’m not sure how Reed-Solomon would handle verifying the hashes, or why it would be useful here.
The reason I want to use Reed-Solomon is that IPFS may not be a suitable storage method in some corner cases, for example when someone uses IPFS to store a file for years, the hard drive holding the data breaks down, and because nobody ever pulled the file there was no copy anywhere else, so the data is lost forever.
You are correct that the best way to avoid such a scenario is to make a replica locally. However, if IPFS supported erasure coding, creating that redundancy would be easier.
With erasure coding, users can get a large amount of fault tolerance for less space. If I designate three nodes to store the data, traditional replication requires 300% of the original disk space, whereas with Reed-Solomon roughly half of that storage can be saved for a similar level of protection.
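To make the space comparison concrete, here is a minimal sketch in Go; the shard counts (4 data + 2 parity) are just illustrative assumptions, not anything IPFS prescribes:

```go
package main

import "fmt"

func main() {
	const dataGiB = 100.0

	// Traditional replication on 3 nodes: 3 full copies of the data.
	replicas := 3.0
	fmt.Printf("3x replication: %.0f GiB stored, survives %d lost copies\n",
		dataGiB*replicas, int(replicas)-1)

	// Reed-Solomon with k=4 data shards and m=2 parity shards:
	// storage cost is (k+m)/k of the original size, and any m shards
	// can be lost without losing data.
	k, m := 4.0, 2.0
	fmt.Printf("RS(%d,%d): %.0f GiB stored, survives %d lost shards\n",
		int(k), int(m), dataGiB*(k+m)/k, int(m))
}
```

Both setups survive two simultaneous failures, but the Reed-Solomon layout stores 150 GiB instead of 300 GiB.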
I don’t think this project makes a lot of sense directly in Kubo.
IPFS is not that great as an everything file storage solution.
The main point is content addressing (using hashes as addresses); this allows us to decouple content from data transfer and storage.
Underneath this, clients don’t need to care about how their files are stored and transferred, because they rely on and verify the hashes; where the bytes came from simply doesn’t matter.
This allows other layers to provide their own solutions which support content addressing without having to update the trust model.
TL;DR: if I were you, I would work on a new daemon that serves data over the IPIP-402 trustless gateway and Bitswap so that Kubo and Helia can download data from you, and then build your erasure-coding network between your own nodes.
The IPFS storage unit is the block, which is stored in a key-value store keyed by CID. IIRC, erasure coding in the context of IPFS could split a block into multiple parts to obtain redundancy, i.e. we would get the equivalent of storing the block replicated 5x while only needing 4x the space (or similar, correct me if wrong).
On a single machine, replication would normally be done at the filesystem layer (i.e. RAID)… you could think of introducing replication at the ipfs-datastore layer, but you would need to make that datastore aware of the different disks or block storage available to the machine, as otherwise it makes little sense. I suppose software Reed-Solomon implementations for local storage already exist, but I don’t know how practical they are outside cold storage (RAID 6 uses it for parity blocks, I think). But if such things already exist, there’s no point in doing it for IPFS blocks and having the IPFS layer worry about something that can be handled better at the filesystem layer.
Another option is to do it at a layer above IPFS, i.e. what you linked. Having IPFS Cluster implement erasure coding is an idea that has been around for a long time. Apart from other issues, the practical problem is that if you split a block into multiple parts and put the parts on multiple machines, each machine has to reconstruct the block every time it needs to provide it, fetching the parts from other machines. This is not very practical and requires accounting etc. in a distributed system (Cluster), which is much more painful than on a local machine. That makes Cluster unsuitable for erasure coding right now.
Nevertheless, I think it is interesting to explore how erasure coding and IPFS could relate to each other. I.e. given an IPLD DAG, what transformation is necessary to obtain a new DAG (or DAGs) which represents the same information as the original DAG but Reed-Solomon encoded, and what would be the process to extract a block present in the original DAG from the Reed-Solomon one(s), given its CID? How would we carry the index etc.… does this make sense at all?
If these questions are answered, then building a system that integrates Reed-Solomon into IPFS in some way would be much easier, as IPFS already knows how to deal with DAGs, traverse them, and move them around.
Thanks @hector ,
I really appreciate your response and apologize for the delayed reply (I’ve spent the time building a foundational understanding of IPFS and IPFS Cluster).
Yeah, you’re right. We can split the file into several original data shards and then calculate parity shards; if some data shards are damaged, we can use the parity shards to recover the original data shards.
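For reference, a minimal sketch of that encode/recover cycle with the klauspost/reedsolomon Go library; the 4+2 shard layout and the sample data are just assumptions for illustration:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	const dataShards, parityShards = 4, 2

	enc, err := reedsolomon.New(dataShards, parityShards)
	if err != nil {
		log.Fatal(err)
	}

	original := []byte("some cold data that nobody has pulled for years...")

	// Split the file into 4 data shards, then compute the 2 parity shards.
	shards, err := enc.Split(original)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing two shards (e.g. a broken disk on one node).
	shards[0], shards[5] = nil, nil

	// Reconstruct the missing shards from the surviving ones.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	// Join the data shards back into the original byte stream.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(original)); err != nil {
		log.Fatal(err)
	}
	fmt.Println(bytes.Equal(out.Bytes(), original)) // true
}
```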
Sure, but maybe I can build an erasure-coding plugin for IPFS Cluster.
Yes, but in a private cluster, duplicate storage is usually there for fault tolerance. Sometimes we don’t want every machine to store the whole data; sharding the data and spreading it out is useful for cold-storage scenarios. We can still access the file through a specific machine, while the others exist just for recovery. Anyway, I’d like to give it a try!
I suppose the structure (metadata) of a Reed-Solomon DAG may not be much different from the shard DAG in this issue, because we just need to calculate the parity shards and store them, then use them to reconstruct a data shard when it is lost.
However, there are also some problems that need to be solved:
IPFS Cluster does not support sharding right now.
IIRC, IPFS Cluster doesn’t support sharding because IPFS can’t pin a DAG to a specific depth, so we can’t build the folder into a DAG link structure? Maybe we could compress the folder into a tar file and have IPFS Cluster decompress it?
IPFS Cluster sends data to all peers at once; we should try to organize the shards and send them to different peers by setting replication.factor to 0.
ipfs-cluster-ctl doesn’t have a command like get. If the whole file isn’t available in local storage, we need to fetch data shards from peers, combine them, and possibly recover from parity shards.
P.S. Why doesn’t IPFS Cluster implement ipfs-cluster-ctl get? Is it because we can just use ipfs get?
Workflow
add:
Run ipfs-cluster-ctl add file with a new [-ec] (erasure coding) flag, which sends an API request to the IPFS Cluster daemon.
The Cluster daemon receives the request and uses Reed-Solomon to split the file into data shards and parity shards, recording metadata and creating a total of data+parity+1 ipld.Nodes (data and parity being the numbers of data and parity shards, with the extra 1 being the Merkle DAG of the file).
Set the pin mode so that the different ipld.Nodes are sent to different peers.
Send the API requests to the IPFS daemon.
get:
Run ipfs-cluster-ctl get with the file’s CID, which sends an API request to the IPFS Cluster daemon.
Check the metadata; if the pin uses erasure coding, ask the IPFS daemon for the CID.
If IPFS doesn’t have the file locally, ask peers for the data shards listed in the metadata.
If a data shard is broken, ask peers for the parity shards listed in the metadata.
Reconstruct the data shards and send the recovered shards back to peers. (A rough Go sketch of this recovery path follows below.)
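A minimal sketch of that get/recover path, purely to pin down the flow being described; the ShardFetcher and Decoder interfaces and the shard-CID lists are hypothetical placeholders, not existing IPFS Cluster APIs:

```go
package ecsketch

import (
	"context"
	"errors"

	"github.com/ipfs/go-cid"
)

// ShardFetcher stands in for "ask the local IPFS daemon, then peers,
// for a shard's block" (hypothetical interface).
type ShardFetcher interface {
	BlockGet(ctx context.Context, c cid.Cid) ([]byte, error)
}

// Decoder stands in for the Reed-Solomon reconstruction step,
// e.g. a klauspost/reedsolomon Encoder (hypothetical interface).
type Decoder interface {
	Reconstruct(shards [][]byte) error
}

// GetWithRecovery fetches the data shards listed in the pin metadata;
// if some are missing, it pulls parity shards and reconstructs them.
func GetWithRecovery(ctx context.Context, dataCIDs, parityCIDs []cid.Cid, f ShardFetcher, dec Decoder) ([][]byte, error) {
	shards := make([][]byte, len(dataCIDs)+len(parityCIDs))
	missing := 0

	// Step 1: try the data shards first.
	for i, c := range dataCIDs {
		if b, err := f.BlockGet(ctx, c); err == nil {
			shards[i] = b
		} else {
			missing++
		}
	}
	if missing == 0 {
		return shards[:len(dataCIDs)], nil
	}
	if missing > len(parityCIDs) {
		return nil, errors.New("too many data shards lost to reconstruct")
	}

	// Step 2: some data shards are broken or unreachable; fetch parity shards.
	for i, c := range parityCIDs {
		if b, err := f.BlockGet(ctx, c); err == nil {
			shards[len(dataCIDs)+i] = b
		}
	}

	// Step 3: reconstruct the missing data shards in place.
	if err := dec.Reconstruct(shards); err != nil {
		return nil, err
	}
	return shards[:len(dataCIDs)], nil
}
```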
Sharding is “unsupported” because Kubo does not have custom-depth-pinning. Since there are links between DAG-Nodes that are in different shards, all the shards would get recursively pinned everywhere. There is no way to tell Kubo to pin Nodes in a shard while keeping those nodes in their original form.
In fact, I should maybe say there WAS no way, as now the CID of those blocks could be written using the Raw codec and thus prevent traversal, since blocks are now only addressed by their multihashes.
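To illustrate that point, the same multihash can be re-addressed under the Raw codec, so traversal code treats the block as opaque bytes and does not try to follow links; a small sketch:

```go
package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// Hash some example block bytes.
	sum, err := mh.Sum([]byte("example dag-pb block bytes"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	// The same multihash addressed as dag-pb (traversable)...
	dagpb := cid.NewCidV1(cid.DagProtobuf, sum)
	// ...and as raw (opaque, so nothing tries to decode links).
	raw := cid.NewCidV1(cid.Raw, sum)

	fmt.Println(dagpb)
	fmt.Println(raw)
}
```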
IPFS Cluster sends data to all peers at once; we should try to organize the shards and send them to different peers by setting replication.factor to 0.
You would need to find a way to do it that fits in. For example, copy the sharding code but store metadata on the root node of the meta-DAG saying how many of the linked shards correspond to data shards; the rest would be checksum shards.
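For example, the meta-DAG root’s extra bookkeeping could look something like this hypothetical struct (the field names are made up; only the idea of “the first N links are data shards” comes from the suggestion above):

```go
package ecsketch

import "github.com/ipfs/go-cid"

// ECClusterDAGRoot is a hypothetical shape for the meta-DAG root:
// it keeps the existing list of shard links, plus a count saying how
// many of them are data shards; the remaining links are checksum
// (parity) shards.
type ECClusterDAGRoot struct {
	Shards     []cid.Cid // shard roots, data shards first, then parity
	DataShards int       // the first DataShards links are data shards
	ShardSize  uint64    // uniform shard size needed by the decoder
}
```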
ipfs-cluster-ctl doesn’t have a command like get. If the whole file isn’t available in local storage, we need to fetch data shards from peers, combine them, and possibly recover from parity shards. P.S. Why doesn’t IPFS Cluster implement ipfs-cluster-ctl get? Is it because we can just use ipfs get?
I misremembered and it seems you don’t need to re-chunk dag-nodes and can store them in the same way as now, but the additional checksums would provide the possibility of reconstructing the data.
Based on that, the main need would be for an ipfs-cluster-ctl reconstruct command, which would be like a get but would specifically use the checksums to recover any non-fetchable nodes. I think this is what you were thinking.
For implementation, I would start by adding a new adder module, similar to the sharding one but one that creates extra checksum shards. We can work from there to wire it in (step 2) and figure out the API (step 3).
Your analysis is very clear!
My work is in progress, but I’ve hit some problems close to what you described.
I found that, but there are several questions.
When the sharding DAG service puts the data into IPFS, it ingests blocks and records the size of the raw data, but in the end the pinned data is bigger than the original data (red label in the image below)? Yet when I use ipfs get to fetch the file, the file size equals the original.
Why does the CID only equal lastCid when the data is wrapped in a directory during sharding (blue label in the image below)?
The file is read through a multipart.Reader, so perhaps we can only read the data once, then use reedsolomon.Encode to generate the parity shards (the checksums you mentioned) and obtain the data blocks via DAGService.Add(). So I used that method and generate parity shards while traversing the DAG. But here comes the trouble: after I generate the parity shards, I can’t append them to the multipart.Reader. Where can I store the parity shards?
My implementation right now directly pins each parity shard as a new file (it’s really simple but naive; maybe reusing the sharding code is better) and records the metadata by creating a new ClusterDAGPin, linking the parity ClusterDAG from the ClusterDAG.
That sounds great, I will follow your advice! BTW, I have already implemented a new get method in ipfs-cluster, never mind that. I just want to ask one question: I simply use BlockGet to implement DAGService.Get, and I can get the whole file by its root CID, but I cannot get a shard’s raw data by the shard CID; it shows the error below. Maybe I’m using the wrong codec decoder?
cannot parse 1st data shard(bafyreigia5dxdgizm2uhl577pkpu5afszghtqhsxacq34zhf3oirs7ixti) to rawdata: protobuf: (PBNode) invalid wireType, expected 2, got 3
Content is usually wrapped in unixfs blocks, so there is some metadata that makes the sum of the sizes of the DAG blocks larger than the original content.
I think this is an internal quirk. The way ipfs-adding works means that an IPLD-node representing a folder wrapping the added files is always created and written to the DAG-service.
However, if things are not meant to be wrapped in a directory, we return the first (and only) child of the wrapping folder as root. Still, the wrap-directory-cid was sent to the DAG-service as the last node and does not correspond to the desired root. I think the warning can be ignored.
Maybe that’s not a good choice: if the size of the parity shards is the same as the block size, more metadata would be stored, making the reconstruction process slower and more complex.
I’m not sure how to “add those to the meta-dag”. Are you suggesting adding the CIDs of the parity blocks as metadata of the meta-DAG? If so, how can we ensure that the parity blocks are not GC’d?
Therefore, I’m currently sending blocks to the ReedSolomon module and, after accumulating a sufficient number of data shards, encoding them and sending the parity shards back to the adder. (The size of each parity shard is the same as that of the largest data shard.)
You suggested making a new adder module to handle the parity shards, but I don’t know how to handle data and parity shards with one DAGService (I can’t easily append the parity shards’ data to the MultiReader and change the layout, and I need to ensure that the parity shards are always the same size), so I added a dagService2 (single/dag_service) to the adder in order to pin the parity shards.
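This is roughly what that accumulate-then-encode step could look like, assuming klauspost/reedsolomon; the ShardBuffer type and its padding policy are my own assumptions about the adder, not existing Cluster code:

```go
package ecsketch

import "github.com/klauspost/reedsolomon"

// ShardBuffer (hypothetical) buffers incoming data shards until it has
// a full batch, pads them to equal length, and emits the parity shards.
type ShardBuffer struct {
	enc          reedsolomon.Encoder
	dataShards   int
	parityShards int
	pending      [][]byte
}

func NewShardBuffer(data, parity int) (*ShardBuffer, error) {
	enc, err := reedsolomon.New(data, parity)
	if err != nil {
		return nil, err
	}
	return &ShardBuffer{enc: enc, dataShards: data, parityShards: parity}, nil
}

// Add buffers one data shard. Once enough shards have accumulated, it
// returns the parity shards for that batch; otherwise it returns nil.
func (sb *ShardBuffer) Add(shard []byte) ([][]byte, error) {
	sb.pending = append(sb.pending, shard)
	if len(sb.pending) < sb.dataShards {
		return nil, nil
	}

	// Pad all data shards to the size of the largest one, matching the
	// note above that parity shards match the largest data shard.
	maxLen := 0
	for _, s := range sb.pending {
		if len(s) > maxLen {
			maxLen = len(s)
		}
	}
	shards := make([][]byte, sb.dataShards+sb.parityShards)
	for i, s := range sb.pending {
		padded := make([]byte, maxLen)
		copy(padded, s)
		shards[i] = padded
	}
	for i := sb.dataShards; i < len(shards); i++ {
		shards[i] = make([]byte, maxLen)
	}

	// Fill the parity shards for this batch and reset the buffer.
	if err := sb.enc.Encode(shards); err != nil {
		return nil, err
	}
	sb.pending = nil
	return shards[sb.dataShards:], nil
}
```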
Yeah, I noticed that and used multicodec to decode it, but it still errors. Maybe it’s because there are different types of ipld.Node in nd.Links()? But it’s okay, I avoided this problem by skipping the CIDs starting with “Qm”.
Actually, I have implemented a simple version.
It uses ReedSolomon to encode files and recover them even if some shards are lost.
When adding a file with erasure coding, most of the sharding logic remains unchanged. However, ReedSolomon generates some redundant parity shards, and each parity shard is pinned as a single file. Each data and parity shard is allocated to only one peer. (The allocation may need to be optimized based on user needs.)
Thanks to the bitswap mechanism of IPFS, we can easily use ipfs block get to retrieve peers’ blocks.
For recovery, we simply scan the pinset, identify erasure-coded files, and retrieve the shards’ blocks via IPFSConnector.BlockGet, then use ReedSolomon to reconstruct the data and re-pin it using the previous metadata.
P.S. Is it possible that this feature could be added to IPFS Cluster in the future? If so, I am very willing to continue improving it!