How can we efficiently remove unpinned objects from a large datastore?

Olga · September 22, 2023, 7:57pm

We are currently exploring the process of removing unpinned objects from our IPFS datastore.

While garbage collection (GC) is a potential solution, we face the challenge of dealing with a substantial amount of data, totaling over 70 TB. This makes the GC process time-consuming and not ideal for our needs.

An alternative approach we’ve considered is using the ‘block rm’ method to remove specific blocks. However, due to the sheer volume of our pinned files, it becomes impractical to efficiently check if a particular block is indirectly pinned.
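
To illustrate why that check gets expensive, here is a minimal sketch in Go of an indirect-pin lookup over a toy in-memory DAG. It is not Kubo's implementation: the `links` map and the `reachable` and `indirectlyPinned` helpers are hypothetical names invented for this example. In the worst case the lookup walks every block under every pinned root, so its cost grows with the total amount of pinned data.

```go
package main

import "fmt"

// reachable reports whether target can be reached from root by following
// links. seen records blocks already explored so each is visited only once.
func reachable(links map[string][]string, root, target string, seen map[string]bool) bool {
	if root == target {
		return true
	}
	if seen[root] {
		return false
	}
	seen[root] = true
	for _, child := range links[root] {
		if reachable(links, child, target, seen) {
			return true
		}
	}
	return false
}

// indirectlyPinned reports whether target is a descendant of any pinned root.
// Hypothetical helper: a real node would read links from its blockstore.
func indirectlyPinned(links map[string][]string, pins []string, target string) bool {
	seen := map[string]bool{}
	for _, pin := range pins {
		for _, child := range links[pin] {
			if reachable(links, child, target, seen) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Toy DAG: A -> B, C; C -> D; X -> Y. Only A is pinned.
	links := map[string][]string{"A": {"B", "C"}, "C": {"D"}, "X": {"Y"}}
	pins := []string{"A"}
	fmt.Println(indirectlyPinned(links, pins, "D")) // true
	fmt.Println(indirectlyPinned(links, pins, "Y")) // false
}
```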

Furthermore, both methods seem to interfere with the pinning process. If we can’t efficiently remove unpinned objects, the pinning process will be delayed until the removal process is finished.

Is there a way to effectively remove unpinned objects while still allowing the pinning of new files or minimizing the time required for removal?

Jorropo · September 22, 2023, 8:47pm

We know this is an issue and would like to fix it; however, we haven’t had time to rewrite the GC from a full flush to a refcount.

Either:

You commit a Go engineer for a month, give or take (really more like 1 week to 3 months, depending on unknown unknowns). With a bit of help from us, they should be able to get a refcount GC working. (That means there will be one expensive migration that rescans the data; from that point on, however, the GC will be incremental.)
You spin up a new cluster and gradually migrate the data over; once the data is migrated, you nuke the old node. This can be less efficient than running GC; however, as you noted, running the GC locks the pinning state and prevents adding more pins, whereas this migration doesn’t block the new nodes in the cluster.

There are probably other quick wins but I don’t think anything is gonna help you for 70TB. Refcounting sounds like the best (but biggest) change.
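
For a rough idea of what "incremental" means here, below is a minimal sketch of reference counting over a toy in-memory block store. It is only an illustration under stated assumptions, not Kubo's code or a design anyone has agreed on: `Store`, `Put`, `Pin`, `Unpin` and `Has` are hypothetical names. Each block counts its direct pins plus its live parents, so an unpin only touches the blocks under that one root instead of rescanning the whole datastore.

```go
package main

import "fmt"

// Store is a toy block store with per-block reference counts.
// All names are hypothetical; this is not Kubo's datastore API.
type Store struct {
	links map[string][]string // block ID -> child block IDs
	refs  map[string]int      // block ID -> direct pins + live parents
}

func NewStore() *Store {
	return &Store{links: map[string][]string{}, refs: map[string]int{}}
}

// Put stores a block together with the IDs of its children.
func (s *Store) Put(id string, children ...string) { s.links[id] = children }

// Has reports whether the block is still stored.
func (s *Store) Has(id string) bool { _, ok := s.links[id]; return ok }

// Pin adds one reference to root. The first reference to a block also
// adds a reference to each of its children, recursively.
func (s *Store) Pin(id string) {
	s.refs[id]++
	if s.refs[id] == 1 {
		for _, child := range s.links[id] {
			s.Pin(child)
		}
	}
}

// Unpin drops one reference. A block whose count reaches zero is deleted
// immediately and releases its children, so no full scan is needed.
func (s *Store) Unpin(id string) {
	if s.refs[id] == 0 {
		return // not referenced; nothing to release
	}
	s.refs[id]--
	if s.refs[id] == 0 {
		for _, child := range s.links[id] {
			s.Unpin(child)
		}
		delete(s.links, id)
		delete(s.refs, id)
	}
}

func main() {
	s := NewStore()
	s.Put("B")
	s.Put("C")
	s.Put("E")
	s.Put("A", "B", "C") // A and D share the child C
	s.Put("D", "C", "E")
	s.Pin("A")
	s.Pin("D")
	s.Unpin("A")
	// A and B are removed; C survives because D still references it.
	fmt.Println(s.Has("A"), s.Has("B"), s.Has("C"), s.Has("D"), s.Has("E"))
	// prints: false false true true true
}
```

The one-time migration mentioned above corresponds to computing the initial counts for data that is already stored; the sketch sidesteps that by starting from an empty store.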
