I have multiple hashes representing important content, and I would like to distill all of this information into a single hash, which I can then easily ask my friends to pin for redundancy. It is possible to pin multiple hashes using ipfs pin add hash1 hash2 ..., but this does not result in a single hash for all content. Right now, I can achieve what I want with the following:
$ mkdir tmp
$ cd tmp
$ ipfs get hash1
$ ipfs get hash2
$ cd ..
$ ipfs add -r tmp
$ rm -rf tmp
This results in a hash to a directory that contains all hashes of interest as subdirectories (the subdirectory names are exactly the hashes of interest).
However, this process seems wasteful and roundabout. Imagine that I have thousands of hashes representing gigabytes of content. Each time one of the hashes changes, there is a lot of data that needs to move around following the above procedure. Is there a more direct way to start with a list of hashes and obtain the hash for a directory which includes those hashes as its entries, without any data even needing to leave the datastore?
Thank you for the reply. While this is new information to me, it does not solve my problem, as it assumes I already have a top level folder hash. What I have asked for is a method of efficiently generating this hash, without needing to check all relevant content out of the datastore.
In addition to the assumptions I mentioned, let’s assume that the multiple hashes I am pinning have a high degree of overlapping content among them (i.e., descendants with identical hashes), making the procedure I suggested above especially wasteful in terms of disk space and resources.
This is precisely the “solution” I already mentioned in the first post, but it does not address the shortcomings I outlined in that post and in posts #3 and #4 in this thread. In principle, the process of generating this hash should only involve the computational cost of calculating the hash of the new directory listing. However, the method given above requires disk space and computational time proportional to the entire de-duplicated content stored recursively, which is intractable for my problem.
Are you asking for a method to generate a CID without adding any data to the IPFS datastore?
Does your use of the word “datastore” refer to the files in your filesystem rather than the IPFS datastore?
It seems logical that the computational work to generate CIDs external to IPFS would be the same as simply adding the data to the IPFS datastore. So, is your problem that you don’t have the drive space for external copies of the added data? Or do you want to pre-generate a list of CIDs before adding the files?
It’s unclear exactly what you are attempting and why.
I have a collection of hashes of content in the ipfs datastore, representing gigabytes of content, rivaling the disk space I have available to me. I would like to create a single ipfs directory containing all of this content as subdirectories. If I understand the design of ipfs (and if it is anything like the way git stores directories), generating this single hash and pinning it should not require making an additional copy of the data outside the ipfs datastore and then hashing it again in its entirety. It should be a very cheap operation, computationally.
-f, --flush bool - Flush target and ancestors after write. Default: true.
Files is an API for manipulating IPFS objects as if they were a Unix filesystem.
The files facility interacts with MFS (Mutable File System). MFS acts as a
single, dynamic filesystem mount. MFS has a root CID that is transparently
updated when a change happens (and can be checked with “ipfs files stat /”).
All files and folders within MFS are respected and will not be cleaned up
during garbage collections. MFS is independent from the list of pinned items
(“ipfs pin ls”). Calls to “ipfs pin add” and “ipfs pin rm” will add and remove
pins independently of MFS. If MFS content that was additionally pinned is
removed by calling “ipfs files rm”, it will still remain pinned.
Content added with “ipfs add” (which by default also becomes pinned), is not
added to MFS. Any content can be put into MFS with the command “ipfs files cp
Most of the subcommands of ‘ipfs files’ accept the ‘--flush’ flag. It defaults
to true. Use caution when setting this flag to false. It will improve
performance for large numbers of file operations, but it does so at the cost
of consistency guarantees. If the daemon is unexpectedly killed before running
‘ipfs files flush’ on the files in question, then data may be lost. This also
applies to running ‘ipfs repo gc’ concurrently with ‘--flush=false’.
ipfs files chcid  - Change the cid version or hash function of the root node of a given path.
ipfs files cp
For more information about each command, use:
‘ipfs files --help’
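Putting the help text above together, here is a minimal sketch of building the combined directory entirely inside MFS, so that only new directory nodes are written and no content leaves the datastore (hash1, hash2, and the /bundle path are placeholders; requires a running ipfs daemon):

```shell
# Create an MFS directory and link the existing CIDs into it.
# "ipfs files cp" adds references only -- the underlying blocks
# are not duplicated, so shared descendants stay de-duplicated.
ipfs files mkdir /bundle
ipfs files cp /ipfs/hash1 /bundle/hash1
ipfs files cp /ipfs/hash2 /bundle/hash2

# Print the CID of the combined directory.
ipfs files stat --hash /bundle

# Optionally pin that root recursively, so friends can pin the same CID.
ipfs pin add "$(ipfs files stat --hash /bundle)"
```

The cost of this is proportional to the size of the new directory listing, not to the content it references, which matches the expectation stated earlier in the thread.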
I would use IPNS for this, just as in my first post.
Once you generate your key and publish the top-level directory CID to it, all you need to do is distribute the IPNS key hash. If you add more files to the directory, simply publish the new CID to the same key. In this way, the newest version of your file set will always be available at that same IPNS location.