I have multiple hashes representing important content, and I would like to distill all of this information to a single hash, which I can then easily ask my friends to pin for redundancy. It is possible to pin multiple hashes using ipfs pin add hash1 hash2 ..., but this does not result in a single hash for all content. Right now, I can achieve what I want by the following:
$ mkdir tmp
$ cd tmp
$ ipfs get hash1
$ ipfs get hash2
[...]
$ cd ..
$ ipfs add -r tmp
$ rm -rf tmp
This results in the hash of a directory that contains each hash of interest as a subdirectory (the subdirectory names are exactly the hashes of interest).
However, this process seems wasteful and roundabout. Imagine that I have thousands of hashes representing gigabytes of content. Each time one of the hashes changes, there is a lot of data that needs to move around following the above procedure. Is there a more direct way to start with a list of hashes and obtain the hash for a directory which includes those hashes as its entries, without any data even needing to leave the datastore?
Thank you for the reply. While this is new information to me, it does not solve my problem, as it assumes I already have a top level folder hash. What I have asked for is a method of efficiently generating this hash, without needing to check all relevant content out of the datastore.
In addition to the assumptions I mentioned, let's assume that the multiple hashes I am pinning have a high degree of overlapping content among them (i.e., descendants with identical hashes), making the procedure I suggested above especially wasteful in terms of disk space and resources.
The last line contains the top level folder hash. Each of the files in the folder has the same hash… because they are all zero-length empty files.
So, to put the folder under a single easy location, you can use IPNS. Publish the top level folder hash to its own key… and if new files are added later… simply re-publish the folder's new hash.
Or… to make it even easier, you can use the "files" command and then simply put all new files into the "files" folder…
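For example, roughly (a sketch only; "/myset" is an illustrative MFS path, and hash1/hash2 stand in for your existing CIDs):
# create an MFS folder and link the existing content into it
$ ipfs files mkdir /myset
$ ipfs files cp /ipfs/hash1 /myset/hash1
$ ipfs files cp /ipfs/hash2 /myset/hash2
# print the root CID of the folder
$ ipfs files stat --hash /myset
As far as I know, 'files cp' only adds references to blocks already in the datastore, so no content is copied.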
This is precisely the "solution" I already mentioned in the first post, but it does not address the shortcomings I outlined in that post and in posts #3 and #4 in this thread. In principle, the process of generating this hash should only involve the computational cost of calculating the hash of the new directory listing. However, the method given above requires disk space and computational time proportional to the entire de-duplicated content stored recursively, which is intractable for my problem.
Are you asking for a method to generate a CID without adding any data to the IPFS datastore?
Does your use of the word "datastore" refer to the files in your filesystem rather than the IPFS datastore?
It seems logical that the computational work to generate CIDs external to IPFS would be the same as simply adding the data to the IPFS datastore. So, is your problem that you don't have the drive space for external copies of the added data? Or do you want to pre-generate a list of CIDs before adding the files?
It's unclear exactly what you are attempting, and why.
I have a collection of hashes of content in the ipfs datastore, representing gigabytes of content, rivaling the disk space I have available to me. I would like to create a single ipfs directory containing all of this content as subdirectories. If I understand the design of ipfs (and if it is anything like the way git stores directories), generating this single hash and pinning it should not require making an additional copy of the data outside the ipfs datastore and then hashing it again in its entirety. It should be a very cheap operation, computationally.
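To illustrate what I have in mind, something along these lines (a sketch, assuming 'ipfs object patch' works the way I read the docs; hashes.txt is a hypothetical file listing my CIDs one per line):
# start from the well-known empty unixfs directory
$ DIR=$(ipfs object new unixfs-dir)
# link each hash into the directory under its own name; each call should only
# rewrite the directory node itself, not touch the content beneath it
$ while read h; do DIR=$(ipfs object patch add-link "$DIR" "$h" "$h"); done < hashes.txt
# pin the resulting root (recursive by default) and share it
$ ipfs pin add "$DIR"
$ echo "$DIR"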
-f, --flush bool - Flush target and ancestors after write. Default: true.
DESCRIPTION
Files is an API for manipulating IPFS objects as if they were a Unix
filesystem.
The files facility interacts with MFS (Mutable File System). MFS acts as a
single, dynamic filesystem mount. MFS has a root CID that is transparently
updated when a change happens (and can be checked with 'ipfs files stat /').
All files and folders within MFS are respected and will not be cleaned up
during garbage collections. MFS is independent from the list of pinned items
('ipfs pin ls'). Calls to 'ipfs pin add' and 'ipfs pin rm' will add and remove
pins independently of MFS. If MFS content that was additionally pinned is
removed by calling 'ipfs files rm', it will still remain pinned.
Content added with 'ipfs add' (which by default also becomes pinned) is not
added to MFS. Any content can be put into MFS with the command 'ipfs files cp
/ipfs/<cid> /some/path/'.
NOTE:
Most of the subcommands of 'ipfs files' accept the '--flush' flag. It defaults
to true. Use caution when setting this flag to false. It will improve
performance for large numbers of file operations, but it does so at the cost
of consistency guarantees. If the daemon is unexpectedly killed before running
'ipfs files flush' on the files in question, then data may be lost. This also
applies to running 'ipfs repo gc' concurrently with '--flush=false'
operations.
SUBCOMMANDS
ipfs files chcid [<path>] - Change the CID version or hash function of the root node of a given path.
ipfs files cp <source> <dest> - Copy files into MFS.
For more information about each command, use:
'ipfs files <subcmd> --help'
I would use IPNS for this… just as I suggested in my first post.
Once you generate your key and publish the top level directory CID to it, all you need to do is distribute the IPNS key hash. If you add more files to the directory, simply publish the new CID to the same key. In this way, the newest version of your file set will always be available at that same IPNS location.
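Something like this (a sketch; "myset" is an illustrative key name, and the CID placeholders stand in for your real folder hashes):
# one-time: create a key and publish the current folder CID to it
$ ipfs key gen --type=rsa --size=2048 myset
$ ipfs name publish --key=myset /ipfs/<top-level-folder-CID>
# later, after the folder changes, publish the new CID to the same key
$ ipfs name publish --key=myset /ipfs/<new-folder-CID>
Your friends then pin or resolve the IPNS name instead of chasing each new CID.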