We run a go-ipfs node where new data (file objects as well as IPLD data) is pinned every few minutes. The data pinned to the node grows by about 15 GB per day. We would like to periodically archive data older than 30 days and upload it to Filecoin.
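One way to decide what counts as "older than 30 days" is shown in the minimal Python sketch below. It assumes the application keeps its own record of when each root CID was pinned. As far as we know, ipfs pin ls does not report when a pin was added. pin_log and select_archive_candidates are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone


def select_archive_candidates(pin_log, now, max_age_days=30):
    """Return CIDs pinned more than `max_age_days` before `now`, oldest first.

    `pin_log` is a hypothetical {cid: pinned_at} mapping maintained by the
    application itself (an assumption: the node is not asked for pin times).
    """
    cutoff = now - timedelta(days=max_age_days)
    # Keep only entries strictly older than the cutoff.
    old = [(pinned_at, cid) for cid, pinned_at in pin_log.items() if pinned_at < cutoff]
    # Sort by pin time so archives are filled in chronological order.
    return [cid for pinned_at, cid in sorted(old)]


now = datetime(2022, 7, 1, tzinfo=timezone.utc)
pin_log = {
    "bafy-old": now - timedelta(days=45),
    "bafy-edge": now - timedelta(days=30),  # exactly at the cutoff: not selected
    "bafy-new": now - timedelta(days=2),
}
print(select_archive_candidates(pin_log, now))  # ['bafy-old']
```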
The problem is GC: whenever it runs, it blocks all add/dag-put operations. I have seen there are open issues about this, e.g. [META] Garbage Collection Enhancement/Rework · Issue #7752 · ipfs/go-ipfs · GitHub. We have had to disable auto-GC for the same reason.
Is there any workaround or alternative way to achieve this (i.e. back up to a CAR file and, mainly, remove the archived entries from IPFS while new data keeps getting pinned)?
Could ipfs-cluster address this problem somehow, so that new pin operations are not blocked while old data is being archived and cleaned up?
Our IPFS node version is listed below; we allocate ~250 GB of disk space to it.
go-ipfs version: 0.14.0-dev-5615715
Repo version: 12
System version: arm64/linux
Golang version: go1.18.1
See ipfs dag export --help.
You might need to check out other tools to split them into 32 GiB archives.
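As a rough illustration of the splitting step, the Python sketch below greedily groups DAG roots into batches that each stay under a size limit, preserving their order. It assumes the size of each root's DAG is already known, for instance from ipfs dag stat. pack_into_archives is a hypothetical helper. The 31 GiB default is also an assumption: it leaves headroom below a 32 GiB sector for CAR framing and Filecoin piece padding. A single DAG larger than the limit would have to be split by other means, which this sketch does not attempt.

```python
GIB = 1024 ** 3


def pack_into_archives(roots, limit=31 * GIB):
    """Group (cid, size_bytes) pairs into batches whose total size stays
    within `limit`, keeping input order (a simple next-fit greedy)."""
    batches, current, current_size = [], [], 0
    for cid, size in roots:
        if size > limit:
            # One DAG alone is too big; it cannot be placed in any batch.
            raise ValueError(f"{cid} ({size} bytes) exceeds the archive limit")
        if current and current_size + size > limit:
            # Close the current batch and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(cid)
        current_size += size
    if current:
        batches.append(current)
    return batches


roots = [("a", 20 * GIB), ("b", 10 * GIB), ("c", 5 * GIB), ("d", 30 * GIB)]
print(pack_into_archives(roots))  # [['a', 'b'], ['c'], ['d']]
```

Each resulting batch would then be exported to its own CAR file. Next-fit keeps archives chronological at the cost of some wasted space; a first-fit-decreasing variant would pack tighter but would mix old and new data.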
Thanks @Jorropo, but this command only exports data from the node into a CAR file.
We would still have to unpin all the exported CIDs and run a manual GC on the IPFS node, right?
That would block all running operations, i.e. new adds/dag-puts.
I would like to know if there is any way to handle that.
Just unpin them and run ipfs repo gc once the data is safe on Filecoin; I don't really see any problem with that.
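To make that suggestion concrete, here is a minimal Python sketch that builds, but does not run, the CLI invocations. It unpins the archived CIDs in batches and then calls ipfs repo gc once. cleanup_commands and the batch size of 500 are hypothetical choices meant to keep argument lists short. The sketch assumes the CIDs are recursive pins that have already been confirmed as stored on Filecoin.

```python
def cleanup_commands(archived_cids, batch_size=500):
    """Return argv lists: batched `ipfs pin rm` calls, then one `ipfs repo gc`.

    Nothing is executed here; each list could be passed to
    subprocess.run(cmd, check=True) by the caller.
    """
    cmds = []
    for i in range(0, len(archived_cids), batch_size):
        # ipfs pin rm accepts several paths/CIDs per invocation.
        cmds.append(["ipfs", "pin", "rm", *archived_cids[i:i + batch_size]])
    # A single GC at the end reclaims everything that was unpinned.
    cmds.append(["ipfs", "repo", "gc"])
    return cmds


print(cleanup_commands(["bafy-a", "bafy-b", "bafy-c"], batch_size=2))
```

This only orders the steps. The final GC still takes the same lock as a manual ipfs repo gc and blocks adds while it runs.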
@chaitanyaprem are you using your IPFS node for transient storage?
Because if so, when we write those kinds of things we rarely use an IPFS node; instead we use the underlying libraries directly and write them in a streaming/pipelined way.
Yes, that is what we were planning to do (unpin on IPFS and run repo gc once the data is safe on Filecoin).
But the problem occurs when we run repo gc.
First, it takes a long time when there are many objects to be GC'd. For example, with close to 5 million objects, GC ran for 30 minutes, after which we had to kill it, and none of the unpinned objects had been cleaned up.
Second, while GC runs, any new pin operations (ipfs add/ipfs dag put) made via the RPC API get blocked and time out.
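A client-side mitigation for the timeouts (not a fix for GC blocking) is to retry add/dag-put calls with exponential backoff. The sketch below assumes a hypothetical zero-argument call that wraps one RPC request, such as /api/v0/add or /api/v0/dag/put, and raises TimeoutError when the node does not answer in time. with_backoff and its parameters are illustrative, and sleep is injectable so the logic can be tested without waiting.

```python
import time


def with_backoff(call, retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Invoke call() and retry on TimeoutError, doubling the wait each time
    up to max_delay; re-raise after `retries` failed retries."""
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the timeout to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)


attempts = []


def flaky_add():
    # Stand-in for an RPC add that times out twice while GC holds the lock.
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("node busy with repo gc")
    return "bafy-new-cid"


delays = []
print(with_backoff(flaky_add, sleep=delays.append))  # bafy-new-cid
print(delays)  # [1.0, 2.0]
```

With the defaults, the total wait is only 1 + 2 + 4 + 8 + 16 = 31 seconds. Covering a 30-minute GC run would need far more retries or a larger max_delay, or queueing writes in the application until GC finishes.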
We are using the IPFS node as a HOT_STORAGE layer to serve the most recent data, and we plan to migrate data older than 30 days to Filecoin as an archive.
Hence, I wanted to know if there is any way for RPC operations (add/dag-put) against IPFS to continue even during GC, for example by using ipfs-cluster in some form.