How to modify small sections of a large file stored in IPFS?

AFAIK, it is impossible to modify a file stored in IPFS in place.
Technically, we can re-upload the file to IPFS to simulate the “modify” operation.

But the problem arises whenever we want to modify a small section of a large file, say 10 GB. We have to re-upload the whole thing and waste a lot of computation on duplicate blocks just for a few newly added ones. Although data deduplication saves huge amounts of space on duplicate blocks, we still suffer from the time wasted calculating the hashes of those duplicate blocks.
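To illustrate the cost, here is a toy sketch (not how go-ipfs is actually implemented; the block store, chunk helper, and `add` function are all made up): a content-addressed store deduplicates storage for free, but a re-upload still hashes every chunk.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # go-ipfs default chunk size

def chunk(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

# Toy content-addressed block store: duplicate blocks cost no extra space,
# but every chunk still has to be hashed on each "re-upload".
store = {}

def add(data):
    hashed = 0
    for block in chunk(data):
        digest = hashlib.sha256(block).hexdigest()
        hashed += 1
        if digest not in store:   # deduplication: store only new blocks
            store[digest] = block
    return hashed

original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # 4-chunk "file"
add(original)                                    # stores 4 blocks
modified = original[:CHUNK_SIZE] + b"x" + original[CHUNK_SIZE + 1:]
rehashed = add(modified)                         # stores 1 new block, hashes all 4
```

Only one new block lands in the store, but all four chunks were hashed again, which is exactly the wasted work described above.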

And there is an awesome application called Peergos.

As Peergos describes on their official page (

Peergos can handle arbitrarily large files efficiently. Our maximum file size is far bigger than any other storage provider we are aware of (assuming you have enough space on the server). We can stream large files like videos and start playing immediately, or quickly skip through to a later part. Despite being end-to-end encrypted, we can efficiently modify small sections of large files.

They claimed that Peergos can modify small sections of large files efficiently.

Does anyone have any idea how they implement it, or any good idea for solving the problem described above?

It seems that I found the document which tells how they handle this.

Below is the link to the page describing how Peergos modifies small sections of a large file.

IPFS chunks your files into blocks and builds a Merkle-DAG to content-address them (

I don’t know what Peergos does, but “editing” a file efficiently would involve keeping track of which blocks you modified, and adjusting all the DAG nodes/branches affected up to the root.
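A minimal sketch of that idea (assuming a plain binary Merkle tree with a power-of-two number of leaves; the real IPFS DAG layout differs): editing one chunk only requires rehashing the leaf plus the O(log n) inner nodes on its path to the root, while every sibling subtree is reused untouched.

```python
import hashlib

def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build(chunks):
    """Build a binary Merkle tree; returns the levels, leaves first.
    Assumption: len(chunks) is a power of two, so every node has a sibling."""
    level = [sha(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def update(levels, index, new_chunk):
    """Replace one leaf and rehash only the path up to the root."""
    rehashed = 1
    levels[0][index] = sha(new_chunk)
    for d in range(1, len(levels)):
        index //= 2
        left = levels[d - 1][2 * index]
        right = levels[d - 1][2 * index + 1]
        levels[d][index] = sha(left + right)
        rehashed += 1
    return rehashed

chunks = [bytes([i]) * 1024 for i in range(8)]   # an 8-chunk "file"
levels = build(chunks)
old_root = levels[-1][0]
cost = update(levels, 5, b"edited chunk")        # 1 leaf + 3 inner nodes
new_root = levels[-1][0]
```

For the 8-chunk file the edit costs 4 hashes instead of 8, and the gap widens logarithmically as the file grows.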


Hi hector,

Thanks for your fast response.

If I understand correctly, it is not yet possible to do such a modification in go-ipfs, right?
At the current stage we can only re-upload the file, unless we implement it ourselves with the provided low-level APIs. Which means that we still have to walk through the whole file and calculate its final hash value.

Well, you can mount IPFS-MFS as a filesystem and you can modify files there as you wish. Other than that, using the HTTP APIs, it would be moderately painful. Writing a program in Go to write some contents given a root hash and an offset might be the saner way of doing things.

Peergos says they’re chunking in 5 MB blocks. The challenge with any fixed-size chunking algorithm is that if you insert (for example) one single byte at the very front of the file, then the hash of every chunk changes (all of them are shifted by a byte). That destroys the ability to reuse chunks and has the same effect as duplicating the storage of the entire file (because you get all new chunks).
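The shift effect is easy to demonstrate with a toy where 5-byte chunks stand in for 5 MB ones: after prepending a single byte, not one chunk hash survives.

```python
import hashlib

def fixed_chunks(data, size=5):
    """Cut at fixed offsets, like a default (non-rolling) chunker."""
    return [data[i:i + size] for i in range(0, len(data), size)]

data = b"abcdefghijklmnopqrstuvwxy"           # 25 bytes -> five 5-byte chunks
before = {hashlib.sha256(c).hexdigest() for c in fixed_chunks(data)}
after = {hashlib.sha256(c).hexdigest() for c in fixed_chunks(b"!" + data)}
shared = before & after                        # every boundary shifted by one
```

`shared` comes out empty: the one-byte insertion turned a 25-byte file into 26 bytes of entirely "new" blocks.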

I think I read somewhere that rsync has an intelligent way of choosing its chunks: instead of simply cutting at every fixed chunk-size boundary, it uses the data itself to choose block boundaries, precisely because of the known need to reuse blocks.
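For reference, the mechanism that makes this affordable in rsync is a weak rolling checksum that can be slid one byte at a time in O(1) instead of rehashing the whole window. A sketch of the classic (a, b) checksum pair (simplified; the function names are mine):

```python
def weak_sum(block, mod=65521):
    """rsync-style weak checksum: returns the (a, b) pair for a block."""
    a = b = 0
    for byte in block:
        a = (a + byte) % mod
        b = (b + a) % mod
    return a, b

def roll(a, b, out_byte, in_byte, n, mod=65521):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % mod
    b = (b - n * out_byte + a) % mod
    return a, b

data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_sum(data[0:n])
rolled = []
for i in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[i - 1], data[i + n - 1], n)
    rolled.append((a, b))

# Recomputing each window from scratch gives the same checksums.
direct = [weak_sum(data[i:i + n]) for i in range(1, len(data) - n + 1)]
```

Because the checksum can be evaluated at every byte offset cheaply, the receiver can find matching blocks anywhere in the file, not just at fixed boundaries.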

So the key question is: can IPFS do this kind of intelligent chunk selection? Because if so, it could theoretically be efficient at modifying large files, and could also accomplish the equivalent of rsync, where only small data transfers are required to sync large files/folders.

IPFS can use Rabin and buzhash chunkers (the --chunker option in ipfs add). These are meant to do exactly that: magically find the right block boundaries to increase deduplication when possible. Of course, depending on the input, they will work better or worse. When using them, blocks will not be of a fixed size as they are by default.
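The principle behind those chunkers can be shown with a deliberately degenerate content-defined chunker that cuts after every newline byte. Real Rabin/buzhash chunkers cut where a rolling hash over a sliding window matches a pattern, but the key property is the same: boundaries come from the data, not from fixed offsets, so they resynchronize after an insertion.

```python
def cdc_chunks(data):
    """Degenerate content-defined chunker: boundary after every b'\n'."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte == 0x0A:              # cut wherever the data says to cut
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = b"alpha\nbravo\ncharlie\ndelta\n"
before = set(cdc_chunks(data))
after = set(cdc_chunks(b"inserted\n" + data))
shared = before & after               # all four original chunks are reused
```

Unlike the fixed-size case, the insertion produces exactly one new chunk and every original chunk deduplicates. With `ipfs add --chunker=rabin` or `--chunker=buzhash` the boundary rule is a rolling-hash condition instead of a newline, but the dedup effect after an edit is the same.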

That’s awesome news to hear, Hector! Thanks for clarifying. I was recently researching how MFS could be made to ‘simulate’ or ‘accomplish’ something approaching the efficiency of rsync, so it’s good to know this might work well.

How do I do that? I could find nothing about it anywhere, other than that it is currently unimplemented:

I was wrong: ipfs mount mounts a read-only filesystem only, so interfacing with the ipfs files commands might be the best option.

Thanks. By the way, I asked about the write-support status, and here was its author @djdv’s answer: