On my journey to turn my FTP server into a P2P FTP server, many questions arise.
Today’s question:
I use the experimental filestore capability to avoid duplicating data. Just saying.
I have added a HUGE directory hierarchy of about 10 TB with --nocopy.
Now I want to add a 1KB file IN that hierarchy.
The file’s hash must be generated, and all of its ancestor directories must regenerate their hashes.
I don’t want to rehash the complete 10 TB hierarchy.
Now, let’s say that this file is added at this path:
What commands must I run in the console?
My primary problem is how to generate/add a directory’s hash without its child entries (files and directories).
You can link a hash into a directory as a sub-file with object patch, like this.
If you’re adding “file_added.txt” to root_folder/folder1/, where $DIR_HASH is root_folder’s current hash, and $FILE_ADDED_HASH is the hash of “file_added.txt”…
$ ipfs object patch $DIR_HASH add-link folder1/file_added.txt $FILE_ADDED_HASH
It’ll give you the new root_folder hash.
Assuming you’re correct @jaidedau I’m pretty confused by this. Are you saying this will give the same hash (root_folder hash) as if file_added.txt had always been there from the beginning? Or does it just give some arbitrary different hash that happens to point to all the same files?
Also it seems really bizarre that “object patch” is the way you add a file to a folder. Seems like there should just be some option on the “add” operation itself where you’d specify you want to add to some existing folder. Pretty much every time I “learn more” about how IPFS handles folders I get even more confused than I was the day before. lol.
Briefly looking at the documentation of ipfs object patch add-link … I am reminded of ln -s … (a symbolic link vs. a copy of the file in the directory structure).
So, if I understand correctly, there is no need to update all of ancestor directories. Only the root directory.
Until now, I thought that EVERY directory contains data (filenames, their hashes, sizes, etc.), but if I understand correctly, only the root directory stores that information for the complete hierarchy.
Am I correct?
If not, why must I update only the root directory and not all ancestors?
Good question Nick, I would’ve assumed if there was some merkle tree at the core of all this, there would be a merkle node somewhere in that tree representing each folder in a hierarchy of folders.
I guess maybe when you add a folder to IPFS it’s just taking all the path names relative to that folder, and gathering them recursively deep, but then just taking the results of that recursion and treating that as a simple flat “list” of files. So maybe regardless of how deep the directory structure is, it’s just always treated as if it was a bucket of files without really any inherent ‘structure’ except for what you can glean from analyzing the slashes in the file names of the contained files.
The only ‘downside’ to this, if there is one, is that after adding a large directory structure, you don’t end up being able to reference any of the sub-folders by their own CID. I might be wrong.
EDIT: This post was proven to be slightly incorrect because each subfolder does have its own CID determined by what’s under it, and I clarified it better in my next post (below).
Folders in IPFS are just merkledag objects. The reason you only need to give the hash of the root directory is that object patch automatically updates the necessary subdirectories. Each directory holds only the information on its direct children. In IPFS, a directory hierarchy is a tree.
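To make the "each directory holds only its direct children" point concrete, here is a toy model in Python. It is only a sketch: it hashes a JSON encoding with SHA-256, which is an assumption for illustration and not how IPFS actually encodes or addresses directory objects, and the names store, put_dir, file_hash and add_link are hypothetical helpers, not IPFS APIs.

```python
import hashlib
import json

# Toy content-addressed store: directory hash -> {child name: child hash}.
store = {}

def put_dir(links):
    """Store a directory (name -> child hash) and return its hash."""
    encoded = json.dumps(links, sort_keys=True).encode()
    digest = hashlib.sha256(encoded).hexdigest()
    store[digest] = dict(links)
    return digest

def file_hash(data):
    """Hash of a file's bytes (files are leaves in this toy model)."""
    return hashlib.sha256(data).hexdigest()

def add_link(root, path, child):
    """Return the hash of a new root in which `path` points to `child`.

    Only the directories along `path` are re-encoded; everything else is
    reused by hash. Intermediate directories must already exist.
    """
    name, _, rest = path.partition("/")
    links = dict(store[root])
    if rest:
        links[name] = add_link(links[name], rest, child)
    else:
        links[name] = child
    return put_dir(links)

# Build a/ with a subdirectory b/, then link a new file into b/ via the root.
mit = file_hash(b"MIT license text")
gpl = file_hash(b"GPL license text")
b = put_dir({"mit.txt": mit})
a = put_dir({"b": b, "gpl.txt": gpl})
cc = file_hash(b"CC BY-SA text")
new_a = add_link(a, "b/cc-by-sa-3.0.txt", cc)

# Both directories on the path get new hashes; the untouched file does not.
print(new_a != a, store[new_a]["b"] != b, store[new_a]["gpl.txt"] == gpl)
```

In this model only two small link tables are re-hashed (b/ and a/), however large the files under them are, and the old root hash still resolves to the old tree.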
For example, here is a directory I call “a.”
added QmPvg1Y3p22Z2DArajBdrEqU8TdpyFHADkdN6Yr5G2jCNx a/b/mit.txt
added QmR8Rnk5QdXgrXRqmgMLmG5PuHZEjujfa3rfVhPV99TLY7 a/gpl.txt
added QmeWH5Rkv69fUBuazFM23iNXvi2da1FnQCR6arxCg2n6QM a/b
added QmVLbmqbWmbfTP1ts2rX8GiXkUE71zSqcjKsRHPquZnApf a
Notice the subdirectory’s hash (QmeWH5Rkv69fUBuazFM23iNXvi2da1FnQCR6arxCg2n6QM).
If I add a new file to the merkledag of b/ with object patch, the hash of both b/ and a/ will change.
added QmPpupqJGeDwMAPgAvZw2QUjZAWgNu58ALhRTF9j6er5qU cc-by-sa-3.0.txt
The new file will be put in a/b/ as “cc-by-sa-3.0.txt”, and has the hash QmPpupqJGeDwMAPgAvZw2QUjZAWgNu58ALhRTF9j6er5qU.
ipfs object patch QmVLbmqbWmbfTP1ts2rX8GiXkUE71zSqcjKsRHPquZnApf add-link b/cc-by-sa-3.0.txt QmPpupqJGeDwMAPgAvZw2QUjZAWgNu58ALhRTF9j6er5qU
This returns the new root hash, QmXzxgjr7RZrGX2MEVcPvTF1h7za7eJ9htBj5Kf1miECjf.
The contents of this new root are:
QmPpupqJGeDwMAPgAvZw2QUjZAWgNu58ALhRTF9j6er5qU 15607 b/cc-by-sa-3.0.txt
QmPvg1Y3p22Z2DArajBdrEqU8TdpyFHADkdN6Yr5G2jCNx 1070 b/mit.txt
Qmd6qmfeLAtc5ebemL8hxfLo84HqU1qtEytVi9i2JwNB4x - b/
QmR8Rnk5QdXgrXRqmgMLmG5PuHZEjujfa3rfVhPV99TLY7 7652 gpl.txt
The hash of b/ changed from QmeWH5Rkv69fUBuazFM23iNXvi2da1FnQCR6arxCg2n6QM to Qmd6qmfeLAtc5ebemL8hxfLo84HqU1qtEytVi9i2JwNB4x, and the hash of a/, the root, changed too.
So adding a file to the folder in the above example was a two step process:
- Add the file itself, to get its own CID
- Use the CID to ‘patch’ the file into the ‘folder structure’ ROOT as a link, because directory structures are apparently stored as links (trees of links really)
In other words, adding the file to IPFS vs. putting that file in some specific folder are two completely independent operations?
One final obvious question arises:
If we did the patch directly onto ‘/b’ instead of ‘/a’ (in the example @jaidedau gave), would that also mean that ‘/a’ would AUTOMATICALLY get updated too (w/ a new hash) representing the folder that contains the new file? I’m guessing “no” is the answer.
Probably the patch is only able to update things recursively underneath that level of the folder, right? So what I said earlier is “partially” true, meaning once you upload a folder structure, it’s kind of its own ‘bucket’ (at the root), and when you add files anywhere under a data structure you still always need to patch the ROOT (bucket), because the patch will only affect things ‘underneath’ (recursively) itself rather than at higher levels up. I’m saying all that as a question. …to ask if I’ve got it right yet?
You got it completely, yea!
A patch can only affect the objects under it, and if you patch ‘b/’ instead of ‘/a’, ‘/a’ isn’t automatically updated. A hash can only refer to one object, so the old hash of ‘/a’ can never have different subdirectories, or be updated.
object patch has gotta give you a new one.
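Why patching ‘b/’ alone leaves ‘/a’ untouched can be shown with a few lines of Python. This is a sketch under the same kind of assumption as any toy model: dir_hash is a hypothetical stand-in (SHA-256 over JSON), not the real IPFS object encoding, and the "hash-of-…" strings are placeholders rather than real CIDs.

```python
import hashlib
import json

def dir_hash(links):
    """A directory's hash is derived only from its (name, child hash) pairs."""
    encoded = json.dumps(links, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

# Directory b/ and its parent a/, written as link tables.
b_links = {"mit.txt": "hash-of-mit"}
b_old = dir_hash(b_links)
a_links = {"b": b_old, "gpl.txt": "hash-of-gpl"}
a_old = dir_hash(a_links)

# Patching b/ directly produces a new object for b/ ...
b_new = dir_hash({**b_links, "cc-by-sa-3.0.txt": "hash-of-cc"})

# ... but a/ still stores the old link, so its hash cannot change.
assert a_links["b"] == b_old
assert dir_hash(a_links) == a_old

# A root that contains the new file only exists once a/ is re-linked too.
a_new = dir_hash({**a_links, "b": b_new})
print(a_new != a_old)
```

Patching from the root simply performs that last re-linking step for every directory on the path in one go.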
You can use MFS if you need the root-folder hash to auto-update after modifications. It is also easier to work with when adding, removing or moving files around.
ipfs files stat <path> will give the current root hash of the DAG at that path.
What does “root hash auto updated” mean?
I have a filestore directory tree added with “add --nocopy”.
Does auto updated mean that when I delete, add or modify a file or directory in the filestore, all of these IPFS links will be updated automatically? (OK, not all, only the ones needed.)
The documentation about file systems states that I must inform IPFS of those changes, which is what jaidedau’s solution above does.
I am thinking of writing a Python script that finds the differences from the previous scan and patches them in.
How would MFS be better for that?
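The scan-and-compare half of such a script might look roughly like the sketch below. It assumes that a changed size or modification time is a good enough signal that a file changed (a heuristic, not a content check), and the function names scan and diff_scans are made up for this example. It does not talk to IPFS at all; it only reports which paths would need to be re-added or unlinked.

```python
import os

def scan(root):
    """Map each file's path (relative to root) to its (size, mtime_ns)."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            snapshot[rel] = (st.st_size, st.st_mtime_ns)
    return snapshot

def diff_scans(previous, current):
    """Return (added, removed, modified) as sorted lists of relative paths."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    modified = sorted(
        path
        for path in current.keys() & previous.keys()
        if current[path] != previous[path]
    )
    return added, removed, modified
```

A caller would keep the previous snapshot on disk (for example as JSON), call scan again, and then handle each path from diff_scans: added and modified files get added with --nocopy and linked into the tree, removed ones get unlinked.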
If you’re using the “object patch” way of updating a folder (non-MFS folder), you must always patch the top level root itself, and you’ll therefore get an entire new root CID as a result. You can’t just patch some sub-folder and expect any new root CID to automatically come into existence.
But MFS is different, and more like a real file system, so that parent folders ARE always automatically aware of things happening in child folders.
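For comparison, the MFS route could be scripted along these lines. This is a dry-run sketch only: mfs_link_commands is a hypothetical helper that builds the argument lists for ipfs files mkdir, ipfs files cp and ipfs files stat without running them, and it assumes the existing tree has already been brought into MFS under the path you pass in.

```python
def mfs_link_commands(file_cid, mfs_path):
    """Return argv lists that link `file_cid` at `mfs_path` in MFS and
    then print the updated MFS root hash."""
    if not mfs_path.startswith("/") or mfs_path.endswith("/"):
        raise ValueError("mfs_path must be an absolute file path")
    parent = mfs_path.rsplit("/", 1)[0]
    commands = []
    if parent:  # skip mkdir when the file sits directly under the MFS root
        commands.append(["ipfs", "files", "mkdir", "-p", parent])
    # Link the already-added object into the MFS tree by its hash.
    commands.append(["ipfs", "files", "cp", "/ipfs/" + file_cid, mfs_path])
    # Parent directories are updated by MFS itself; just read the new root.
    commands.append(["ipfs", "files", "stat", "--hash", "/"])
    return commands
```

Each list could be handed to subprocess.run(cmd, check=True). Unlike the object patch approach, there is no root hash to carry from one call to the next, because MFS tracks the current root for you.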