It seems js-ipld-git is now archived. What are the plans for it?

Git uses delta-encoding internally. Does IPFS simply put .git
on IPLD like importing files, or does it use its own IPLD git format?

Besides delta-encoding, a more general content/context-aware encoding would be handy for any blocks that involve revisions. In any case, applications should not have to care about storage: say, when a block is modified, the new block is stored and exchanged internally as an ipld-delta-encoded-block, while what applications see doesn't change. At what layer would this be implemented? And what about encryption then?
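To make the idea concrete, here is a rough sketch of what such an ipld-delta-encoded-block could look like. All type and field names here are hypothetical, not an existing IPLD spec:

```go
package main

import (
	"bytes"
	"fmt"
)

// DeltaOp is one copy/insert operation, git-packfile style.
type DeltaOp struct {
	Copy bool   // true: copy Len bytes of the base at Off; false: insert Data
	Off  int
	Len  int
	Data []byte
}

// DeltaBlock is a hypothetical internal representation of a modified
// block: a reference to the base block plus a list of delta operations.
type DeltaBlock struct {
	Base string // CID of the base block
	Ops  []DeltaOp
}

// Resolve rebuilds the full block, so everything above the storage layer
// keeps seeing ordinary blocks; the delta encoding stays internal.
func (d *DeltaBlock) Resolve(fetch func(cid string) ([]byte, error)) ([]byte, error) {
	base, err := fetch(d.Base)
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	for _, op := range d.Ops {
		if op.Copy {
			out.Write(base[op.Off : op.Off+op.Len])
		} else {
			out.Write(op.Data)
		}
	}
	return out.Bytes(), nil
}

func main() {
	// Toy fetch: pretend the base block is already in the local repo.
	fetch := func(cid string) ([]byte, error) { return []byte("hello world"), nil }
	d := DeltaBlock{
		Base: "bafy...base", // placeholder CID
		Ops: []DeltaOp{
			{Copy: true, Off: 0, Len: 6}, // keep "hello "
			{Data: []byte("IPLD")},       // replace "world"
		},
	}
	full, _ := d.Resolve(fetch)
	fmt.Printf("%s\n", full) // hello IPLD
}
```

The point is that `Resolve` would happen below the application layer, so consumers still address and receive the full block by its own CID.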
I read through ipfs/notes, and this seems to have been asked several times.
From a GitHub issue opened 26 Feb 2015:
Consider the following design space: some nodes on the network are used for storage; they store and serve compressed data to optimize network throughput and storage efficiency. Other nodes are used for computation with the data and are space constrained (e.g. they must minimize SSD usage), so they store only uncompressed files, but as long as their data is unmodified, they can also seed it to nearby nodes.
To support both compressed and uncompressed seeders, to present compressed data to applications as uncompressed, or to store and serve already-seeded uncompressed data in compressed form, it seems worthwhile to support data compression below the application level, in IPFS. This suggests storage and transmission of IPFSObjects that carry an additional field specifying the decompression method, and whose hash is computed as if the data were not compressed and the field were not present.
In a way, this resembles squashfs, which deflates data and then presents it as a normal read-only filesystem.
Another option is to introduce a separate IPFSCompressedObject, advertise it under both its own hash and the uncompressed hash, and when another node requests it under the uncompressed hash, negotiate either to serve the compressed object or to decompress it prior to transmission.
What do you think about this problem? Can the solution be simplified?
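For what it's worth, a minimal sketch of the layout that issue proposes might look like this. The field names are mine, and the real merkledag objects are protobuf-encoded, so this only illustrates the idea of hashing the uncompressed payload:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"io"
)

// StoredObject's hash is always computed over the uncompressed payload,
// so compressed and uncompressed copies advertise the same identity.
// All names here are illustrative, not the actual IPFS object format.
type StoredObject struct {
	Hash     [32]byte // sha256 of the *uncompressed* data
	Encoding string   // "" (identity) or e.g. "gzip"
	Payload  []byte   // possibly compressed bytes
}

func store(data []byte, compress bool) StoredObject {
	obj := StoredObject{Hash: sha256.Sum256(data), Payload: data}
	if compress {
		var buf bytes.Buffer
		zw := gzip.NewWriter(&buf)
		zw.Write(data)
		zw.Close()
		obj.Encoding, obj.Payload = "gzip", buf.Bytes()
	}
	return obj
}

// load decompresses if needed and verifies against the identity hash,
// so applications never observe whether the stored form was compressed.
func load(obj StoredObject) ([]byte, error) {
	data := obj.Payload
	if obj.Encoding == "gzip" {
		zr, err := gzip.NewReader(bytes.NewReader(obj.Payload))
		if err != nil {
			return nil, err
		}
		defer zr.Close()
		if data, err = io.ReadAll(zr); err != nil {
			return nil, err
		}
	}
	if sha256.Sum256(data) != obj.Hash {
		return nil, fmt.Errorf("hash mismatch")
	}
	return data, nil
}

func main() {
	obj := store([]byte("some file contents"), true)
	data, err := load(obj)
	fmt.Println(string(data), err)
}
```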
From a GitHub issue opened 30 Oct 2019:
* Motivation:
When a file is updated and re-synced, reduce block duplication on nodes all over the world, decrease communication cost (only the updated blocks are downloaded), and save storage (only the updated sections of the file are stored as new blocks).
-----------------
* Problem:
For example, there is a file.tar.gz (~100 GB), containing a data.txt file, stored in my IPFS repo, pulled from another node (node-a).

I open data.txt, add a single character at a random location in the file (the beginning, the middle, or the end), compress it again as file.tar.gz, and store it in my IPFS repo. The update itself is only a few kilobytes.

[[*]](https://discuss.ipfs.tech/t/does-ipfs-provide-block-level-file-copying-feature/6388/4?u=avatar-lavventura) When I delete a single character at the beginning of the file, the hashes of all the subsequent 124 KB blocks are altered, which leads to the complete file being downloaded again.

As a result, when node-a wants to re-fetch the updated tar.gz file, a re-sync takes place and the whole file is downloaded all over again, leaving duplicated blocks (~100 GB in this example) even though the change is only a few kilobytes. **Iteratively, this duplication spreads across all peers, which is very inefficient and consumes a large amount of storage and communication over time.**
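For what it's worth, part of this is already addressed by content-defined chunking: `ipfs add --chunker=rabin` (or `--chunker=buzhash`) picks block boundaries from a rolling hash of the content, so an insert near the beginning only disturbs the chunks around the edit instead of shifting every boundary, as it would with a fixed-size chunker. A toy Gear-hash chunker shows the effect; this illustrates the technique, not kubo's actual implementation:

```go
package main

import (
	"fmt"
	"math/rand"
)

var gear [256]uint64

func init() {
	rng := rand.New(rand.NewSource(1)) // fixed seed: table must be deterministic
	for i := range gear {
		gear[i] = rng.Uint64()
	}
}

// chunk cuts where the rolling hash matches a bit pattern, so boundaries
// depend on nearby content rather than on absolute file offsets.
func chunk(data []byte) [][]byte {
	const mask = (1 << 10) - 1 // ~1 KiB average chunk size in this toy
	var chunks [][]byte
	var h uint64
	start := 0
	for i, b := range data {
		h = (h << 1) + gear[b]
		if h&mask == 0 {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	base := make([]byte, 64*1024)
	rand.New(rand.NewSource(42)).Read(base)

	// Insert one byte at the front, as in the example above.
	edited := append([]byte{'X'}, base...)

	// Count chunks that survive the edit unchanged (compared by content
	// here; IPFS would compare their hashes, which is equivalent).
	a, b := chunk(base), chunk(edited)
	seen := map[string]bool{}
	for _, c := range a {
		seen[string(c)] = true
	}
	same := 0
	for _, c := range b {
		if seen[string(c)] {
			same++
		}
	}
	fmt.Printf("chunks: %d vs %d, unchanged after 1-byte insert: %d\n",
		len(a), len(b), same)
}
```

Running this, almost every chunk of the edited file matches a chunk of the original, so only the few blocks around the edit would need to be stored and transferred. Note that this still does not help with the tar.gz example, since gzip compression scrambles the whole stream after an edit.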
-----------
* Solution:
Other clouds try to solve this problem using [Block-Level File Copying](https://superuser.com/a/1368955). In their case, as in IPFS, a blocklist is used for block-level file copying; so when a file is updated (a character is added at the beginning of the file), Dropbox or OneDrive will re-upload the whole file, since the first block's hash changes, which in turn changes the hashes of all subsequent blocks. This doesn't solve the problem.

**=>** I believe a better solution is the approach that [git-diff](https://git-scm.com/docs/git-diff) uses between commits of a file. It would upload only the changed (diff) parts of the file, which is a few kilobytes in the example I gave, and the diffed blocks would be merged when other nodes pull that file. So the communication cost is only a few kilobytes, and the amount of data added to storage is only a few kilobytes as well.

I know it would be difficult to re-design IPFS, but this could be done as a wrapper solution that combines `IPFS` and `git` (see the sketch below), which users could adopt for very large files based on their needs.
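A sketch of what such a wrapper could look like: publish only a patch against the previous version, and let nodes that already hold the base reconstruct the file locally. The file names are placeholders, and `diff`/`patch` stand in for a binary-safe delta tool like bsdiff, which a real tar.gz would need:

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// publishUpdate diffs the new version against the previous one and adds
// only the patch to IPFS, returning the patch's CID.
func publishUpdate(oldPath, newPath string) (string, error) {
	patch, err := exec.Command("diff", "-u", oldPath, newPath).Output()
	if err != nil {
		// `diff` exits with status 1 when the files differ, which is
		// the expected case here; anything else is a real failure.
		if ee, ok := err.(*exec.ExitError); !ok || ee.ExitCode() != 1 {
			return "", err
		}
	}
	add := exec.Command("ipfs", "add", "-Q") // -Q prints only the final CID
	add.Stdin = bytes.NewReader(patch)
	cid, err := add.Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(cid)), nil
}

// applyUpdate fetches the patch by CID and applies it to the local base
// copy, reconstructing the new version without re-downloading the file.
func applyUpdate(patchCID, basePath string) error {
	patch, err := exec.Command("ipfs", "cat", patchCID).Output()
	if err != nil {
		return err
	}
	apply := exec.Command("patch", basePath)
	apply.Stdin = bytes.NewReader(patch)
	return apply.Run()
}

func main() {
	cid, err := publishUpdate("data.v1.txt", "data.v2.txt")
	fmt.Println(cid, err)
}
```

Only the few-kilobyte patch crosses the network; the ~100 GB base is fetched once and reused for every subsequent version.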
----------
This problem is not considered a priority by the IPFS team, but I believe it should be:
> IPFS team is considering adding that eventually, but it’s not a priority.
------
Please see the discussions I have already opened, and feel free to add your ideas to them:
=> [Does IPFS provide block-level file copying feature?](https://discuss.ipfs.tech/t/does-ipfs-provide-block-level-file-copying-feature/6388)
=> [Efficiency of IPFS for sharing updated file](https://stackoverflow.com/a/52246029/2402577)