Listing all of the files in the local node (and a few other basic questions)

Hey, so I’ve got some pretty basic beginner questions here. I understand the basic concepts of IPFS: it’s a content-addressed peer-to-peer network where all the data is cut into the smallest chunks and placed hierarchically in a giant merkle tree, and the rest. But I have some questions about how to actually use ipfs effectively.

So, say I have a directory baz containing files foo and bar and I want to add all of the files in baz to my ipfs node, my first instinct would be to type ipfs add -r baz which would give me CIDs for both files and the directory (sort of) itself. I had expected to be able to access them by the webui’s ‘Files’ page but they weren’t there; obviously, I can access them through the CIDs directly but what happens if I lose the CIDs? Are the files just “lost” in the merkle tree? So I guess my first real question is: when I add files to my node using ‘ipfs add’ where do they go? Obviously, they go somewhere in the ~/.ipfs directory so what I’m really asking is: How do you list, browse, find files in your local node without their CID?

Next, I’ve learned that in the hypothetical ipfs add -r baz scenario, this would create an item somewhere in my local merkle trees that links to both foo and bar but wouldn’t contain (or maybe just not show) any actual metadata on the directory itself; in effect, the “baz” name would be lost along with any other information about the directory. The same thing happens with files when not using the -w flag in the add command: the item it creates, and gives you a CID for, won’t have the filename or any other metadata, just the raw contents. Using the -w flag, ipfs add will actually give you a CID to an item representing the hierarchical data you’d expect with a “file” on a file system. So my second major question is: when you forget to add the ‘-w’ flag, does it still create the metadata item and just not tell you (if so, how do you find the metadata item after the fact), or does it just completely ignore any surrounding context and create a lone item somewhere in the abyss of the merkle tree, not linked to anything else? (I know that in a merkle tree every item is supposed to have a single parent; more on that in my third major question.) And, in either case, WHY ON EARTH ISN’T ACCOUNTING FOR METADATA WITH THE FILE/DIRECTORY THE DEFAULT? I understand the raw file contents are treated as an immutable buffer and, since any file can have any name, a metadata record like the kind created by -w could potentially refer to multiple different files with the same name. Still, in most cases having the metadata record is useful, practically essential (this is the Inter-Planetary File System, after all), and in the few situations in which you don’t care about any of the actual file details, you can just use the raw-content item CID anyway.

Next, so every item in a merkle tree has a parent, right? So shouldn’t I, for a file added as part of a directory, be able to “ascend” to, or at least get the CID for, its parent node? Like if I have a raw-content CID that refers directly to an immutable data buffer (a “leaf” in the merkle tree), I should be able to get to its parent file-metadata, or directory, item if it has one. The way I’d think to do this would be by using some path-like notation such as ipfs <cat|ls|refs> <CID>/.. or ipfs <cat|ls|refs> <CID>/parent, but neither of those works. I’m also not able to get its parent CID from the ipfs links <CID> or ipfs refs <CID> commands; those commands just give child-item CIDs, not the parent CID. So, for a file item, how do I access its parent or any directory/file-info nodes which have it as a child item?

And one more question, I’ve since discovered that the ‘Files’ page of the web UI is specifically for files added with the ipfs files add command: what makes files added using ‘files add’ different from files added using ‘add -w’? and why is this distinction necessary or helpful? and, again, are files added with ipfs add without the -w flag just lost in the maze of directories in the .ipfs folder?

Woo, so that’s quite a bit; I’m sorry for any headaches my potential misunderstanding of IPFS may have caused, but these are all questions I have from experimenting with IPFS that I wasn’t able to find a clear answer for in the docs.

There’s no giant merkle tree. More like many small ones.

This confuses many people. The Files tab lists files in what we call the MFS (Mutable File System). This is a “potentially-gigantic Merkle Tree” where you can put things. If you use the webUI they will get put there directly, but if you use ipfs add, you need to additionally do ipfs files cp /ipfs/<cid> /whatever/mfs/location.

In order to list content that is not on MFS you have several options. If you pinned it (the default for add), then use ipfs pin ls --type=recursive. If you did not pin it (i.e. you were browsing something using your local gateway), you can do ipfs refs local to list all blocks in your datastore (though this will also show chunks from files, etc.). There is no other way to know what you have, unless you know what you are looking for. For example, you could check that you have the ipfs website by doing something like ipfs object stat /ipns/ipfs.io. This is why WebUI and IPFS Desktop put things on MFS directly, so that users can browse them more easily (and can give them names).

The file name is not “file metadata”. The file name is directory data (in IPFS and in the most common filesystems at least). IPFS is content addressed and therefore you get a Content Identifier when you add a file. If you add -w (wrap), you produce an additional piece of content which is a directory, and that directory can then include the file name (which is just a pointer to the CID of the file).

There are not supposed to be cycles, but items can have multiple parents (that’s how deduplication works when your file is in two different folders).
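To make the last two points concrete, here is a toy model in Python. It is not how IPFS actually encodes blocks or computes CIDs; `node_id`, `file_node` and `dir_node` are made-up helpers that simply hash bytes with SHA-256. It only illustrates that a name lives in the directory entry rather than in the file, and that the same file node can be linked from several directories.

```python
import hashlib
import json


def node_id(data: bytes) -> str:
    """Toy content identifier: the SHA-256 hex digest of the node's bytes."""
    return hashlib.sha256(data).hexdigest()


def file_node(content: bytes) -> str:
    """A file is identified by its content only; no name is involved."""
    return node_id(content)


def dir_node(entries: dict) -> str:
    """A directory is itself content: a sorted list of (name, child id) links."""
    encoded = json.dumps(sorted(entries.items())).encode()
    return node_id(encoded)


foo = file_node(b"hello")
same_bytes_other_name = file_node(b"hello")

# Two directories link to the same file under different names.
baz = dir_node({"foo": foo})
qux = dir_node({"renamed.txt": same_bytes_other_name})

print(foo == same_bytes_other_name)  # True: one file node, two parents
print(baz == qux)                    # False: the names are part of the directories
```

In this model, asking a file node for “its” name or “its” parent has no single answer, which mirrors the situation described above.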

As said, it is not metadata, it’s another file (of type directory). But in the case of MFS, this is the default (everything in MFS is effectively inside a folder).

This would require some side-indexing to keep track of which items are the parents of which. Each content has a unique CID. You need to know that CID to build the parent (so that the parent can point to it). If you were to store the parent back with the original content, the CID for it would change. Currently IPFS does not track parents, although this is discussed (mostly in the context of efficient garbage collection).
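A small sketch of that dependency problem, again as a toy model rather than IPFS’s real encoding: `toy_id` is a hypothetical helper that hashes a JSON encoding of a node with SHA-256.

```python
import hashlib
import json


def toy_id(node: dict) -> str:
    """Toy content identifier: SHA-256 over a canonical JSON encoding."""
    encoded = json.dumps(node, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


child = {"data": "hello"}
child_id = toy_id(child)

# The parent links to the child by id, so the child's id must exist first.
parent = {"links": {"foo": child_id}}
parent_id = toy_id(parent)

# Writing the parent's id into the child produces a different child id,
# so the link stored in the parent no longer points at this new node.
child_with_parent = {"data": "hello", "parent": parent_id}
new_child_id = toy_id(child_with_parent)

print(new_child_id != child_id)                # True
print(parent["links"]["foo"] == new_child_id)  # False
```

Fixing the parent’s link would change the parent’s id in turn, and so on, which is why any parent tracking has to live in an index outside the content itself.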

Using ipfs add has less overhead than using files. In many use cases, you can just work with add and not use MFS at all. But as said, UIs have prioritized showing MFS. This is very confusing, and there are plans to improve all the documentation around this, which I hope will prevent some of these headaches.


Thanks for the answers; they helped a lot. I do have some follow up questions:

When items are added to the MFS they’re still “public”, right? Like it still creates a CID for the raw content of each file (a leaf node) which could be used to access it from anywhere by anyone, right? I assume there’s no way to make items “private” to specific machines or password protect items without just manually encrypting data yourself.

So using ipfs pin ls and ipfs refs local both just give a list of CIDs without any information telling you what each item actually is (ipfs pin ls does also give the pin type: direct, indirect, recursive). Is there a convenient way to get more information about each item, along with its CID, like its IPLD type, or otherwise more Unix ls-like behavior, like sorting directory nodes to show their child items in a hierarchical structure?

There’s really no public interface for finding an item’s parents? Well, that’s unintuitive… I get now why a list of parents can’t be cached with an item (that would change the CID and create a dependency problem), but having a command for generating a momentary list of an item’s parents would still be quite helpful.

Glad to hear it.

Always public. ipfs files stat /path shows you the CID (I suspect UIs have a way to show this too).

Correct.

Nothing out of the box that I can think of. Things like IPFS Cluster allow attaching metadata to pins. There is also a proposal to be able to attach metadata to pinned items in IPFS (like name, etc), so this will happen sooner or later.

I guess you could get around this with ipfs refs -e -r <cid>. If you run that for every pinned item and then grep for the CIDs that you are interested in, you might find which items are the parents.
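That idea could be scripted roughly like this. It is only a sketch, assuming the edge output of ipfs refs -e prints one edge per line in the form `<from> -> <to>`; `parents_from_edges` is a made-up helper name, and the CIDs in the sample are placeholders, not real ones.

```python
from collections import defaultdict


def parents_from_edges(lines):
    """Build a child -> set-of-parents index from '<from> -> <to>' lines."""
    parents = defaultdict(set)
    for line in lines:
        line = line.strip()
        if " -> " not in line:
            continue  # skip blank or unexpected lines
        src, dst = line.split(" -> ", 1)
        parents[dst].add(src)
    return parents


# Placeholder edges standing in for the combined output of
# `ipfs refs -e -r <cid>` run over every pinned root.
sample = [
    "QmDirA -> QmFoo",
    "QmDirA -> QmBar",
    "QmDirB -> QmFoo",
]

index = parents_from_edges(sample)
print(sorted(index["QmFoo"]))  # ['QmDirA', 'QmDirB']
```

The index only knows about content reachable from the roots you fed in, so a parent that is not pinned (or not walked) will not show up.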
