MFS implementation questions

Hi,

I am struggling to understand MFS. I have 3 questions:

  • Where is MFS content stored? I read in the docs that MFS content is not stored in the same place as regular content (when IPFS content is deleted, MFS content is preserved). Is it still in the datastore?
  • How do I obtain the CID of MFS content?
  • Who uses MFS, and why? I understand the idea of manipulating IPFS content like Unix files with readable names, but I have not seen any projects that use it. Does it really bring something new, or is it only meant for basic use?

MFS blocks are stored in the blockstore.
The CID of the MFS root (your first block) is stored in the datastore.

With the default config, blocks are stored in flatfs and the root is stored in leveldb.

With badger, both are stored in badger.
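To make the split between blocks and the root pointer concrete, here is a toy model in Python. It is only a sketch under loud assumptions: plain dicts stand in for the blockstore and the datastore, SHA-256 over JSON stands in for real CIDs and dag-pb, the key name "mfs_root" is made up, and the gc function ignores pins entirely. It only illustrates why blocks linked from the MFS root can survive a cleanup that removes other unreferenced blocks.

```python
import hashlib
import json

def put_block(blockstore, node):
    """Serialise a node, store it under its hash, and return that hash."""
    data = json.dumps(node, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    blockstore[key] = data
    return key

def reachable(blockstore, root):
    """Collect every block key reachable from root by following links."""
    seen, stack = set(), [root]
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        node = json.loads(blockstore[key])
        stack.extend(node.get("links", {}).values())
    return seen

def gc(blockstore, datastore):
    """Toy cleanup: delete every block not reachable from the stored root."""
    keep = reachable(blockstore, datastore["mfs_root"])  # hypothetical key name
    for key in list(blockstore):
        if key not in keep:
            del blockstore[key]

blockstore, datastore = {}, {}
readme = put_block(blockstore, {"data": "hello"})
orphan = put_block(blockstore, {"data": "added but never linked"})
# Only the root's hash lives in the "datastore"; the root block itself
# sits in the "blockstore" next to everything else.
datastore["mfs_root"] = put_block(blockstore, {"links": {"readme.txt": readme}})

gc(blockstore, datastore)
print(readme in blockstore, orphan in blockstore)
```

In this model the file linked from the root stays and the unlinked block is dropped, even though both were stored the same way.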

To get the CID of an MFS path, run ipfs files stat on it. For the root:

ipfs files stat /

People who want an easy way to mutate files.
I know of projects that use it. It’s not new, but it does have big performance issues.

I run /ipns/void-mirror.the-brannons.com, an IPFS mirror of the Void
Linux package repository. It’s a large dataset: over 1.1 TB in more
than 360000 files. Several of the directories contain tens of thousands
of files.

At first, I naively used MFS to manage it, but performance wasn’t
acceptable. Now I keep a database mapping filenames to CIDs.
Every time I rsync from Void’s mirrors, I update the database, add
missing content, and then reconstruct the directory structure manually.

I build my unixfs directory nodes as huge single blocks. Then I copy
them to MFS and force sharding by a write followed by an rm. That way I
only do one mutation per directory.
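The "database of filenames to CIDs, then rebuild the directories" approach can be sketched roughly as below. This is not the poster's actual code: the paths and "cid-…" strings are placeholders, SHA-256 over JSON stands in for real unixfs directory nodes, and build_tree is a hypothetical helper name. The point is only that each directory is assembled and hashed once, deepest first, instead of being mutated once per file.

```python
import hashlib
import json
from collections import defaultdict

def node_id(node):
    """Hash a directory node (toy stand-in for a real CID)."""
    data = json.dumps(node, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def depth(directory):
    """Depth of a directory path; the root is the empty string."""
    return 0 if directory == "" else directory.count("/") + 1

def build_tree(path_to_cid):
    """Return (root_id, directory_count) for a flat {path: cid} mapping."""
    children = defaultdict(dict)
    children[""]  # make sure the root directory exists
    for path, cid in path_to_cid.items():
        parts = path.strip("/").split("/")
        # Register every intermediate directory in its parent (id filled in later).
        for i in range(1, len(parts)):
            children["/".join(parts[:i - 1])][parts[i - 1]] = None
        children["/".join(parts[:-1])][parts[-1]] = cid

    dir_ids = {}
    # Deepest directories first, so a child's id exists before its parent is hashed.
    for directory in sorted(children, key=depth, reverse=True):
        links = {}
        for name, cid in children[directory].items():
            if cid is None:
                sub = f"{directory}/{name}" if directory else name
                cid = dir_ids[sub]
            links[name] = cid
        dir_ids[directory] = node_id({"links": links})
    return dir_ids[""], len(dir_ids)

# Placeholder paths and CIDs, for illustration only.
database = {
    "current/a.xbps": "cid-a",
    "current/b.xbps": "cid-b",
    "current/musl/c.xbps": "cid-c",
}
root, directories = build_tree(database)
print(directories)
```

After an rsync, only the database rows change; rerunning the build gives a new root, and each directory is still hashed exactly once.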

Thank you @Jorropo for your explanation and @teiresias for your example.

Sounds like, “MFS has performance issues so I had to write my own MFS”. I like IPFS, but it’s somewhat disappointing how often this problem comes up in different areas. It would be helpful if someone could post some numbers to quantify “it has big performance issues”. Someone is going to come across this post, say “Can’t rely on MFS, it has performance issues”, and repeat that in other places. Then it just goes on the pile along with IPNS and badgerfs.

I’ll add that it solves a big problem: without it, your IPFS store becomes a data junk drawer. It’s like one gigantic memory leak. If you add something to your store, you’d better remember that CID, or else it’s in there forever. It’s a huge haystack of CIDs. Sure, you could list pinned CIDs, and maybe with some work you can locate the root ones, but what then? Who knows what it is. Maybe it’s important. Now you’re stuck with either an ever-growing collection of random, forgotten stuff eating up your hard drive, or you periodically clean out that junk drawer. But what good is something you have to periodically wipe clean? You’d have to repopulate it from somewhere, so now you’re maintaining a second copy.

FYI, I did this exact same thing with linux2ipfs (GitHub: Jorropo/linux2ipfs), a small pipeline and extreme-performance oriented IPFS implementation to upload files and deltas to pinning services very fast.

MFS is slow and it’s complicated to fix.
It probably needs a complete rewrite.
Right now it doesn’t have any internal structure (except the root CID key), so everything is done by traversing the DAG, parsing the blocks, modifying the dag-pb unixfs (which isn’t efficient to modify at all, because it’s optimized for P2P reads) and then reserialising it.

It’s as if your spreadsheet software worked by parsing HTML inside a PDF file, modifying a single cell, and then exporting back to PDF.
And if you do multiple edits, every one of them has to load, modify and export.
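A toy count of that load, modify, export cycle, as a rough sketch rather than a benchmark. Assumptions: dicts of links hashed with SHA-256 over JSON stand in for dag-pb unixfs directories, and write and store are hypothetical helpers, not MFS internals. It compares adding 100 files one at a time under a/b/ with building the same three directories once.

```python
import hashlib
import json

stats = {"stored": 0}  # number of directory nodes serialised and hashed

def store(node):
    """Serialise and hash one directory node, counting the work."""
    stats["stored"] += 1
    data = json.dumps(node, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()

def write(blocks, root, path, leaf_cid):
    """Set path to leaf_cid and return the new root id.

    Every directory on the path is loaded, copied, changed and re-stored.
    """
    name, rest = path[0], path[1:]
    node = dict(blocks[root]) if root else {}  # "load and parse"
    if rest:
        node[name] = write(blocks, node.get(name), rest, leaf_cid)
    else:
        node[name] = leaf_cid
    new_id = store(node)                       # "reserialise"
    blocks[new_id] = node
    return new_id

# One mutation per file: 100 writes into a/b/.
blocks, root = {}, None
for i in range(100):
    root = write(blocks, root, ["a", "b", f"file{i}"], f"cid-{i}")
one_by_one = stats["stored"]

# Same result built in one pass: each directory is stored exactly once.
stats["stored"] = 0
b = store({f"file{i}": f"cid-{i}" for i in range(100)})
a = store({"b": b})
batch_root = store({"a": a})
batch = stats["stored"]

print(one_by_one, batch, root == batch_root)  # prints: 300 3 True
```

The node count understates the gap: in the one-by-one case the a/b directory is also reserialised in full each time while it keeps growing, so the bytes processed grow roughly quadratically with the number of files.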