No space left on device writing vlog file

Hi,

We are noticing quite a few errors in our ipfs-cluster logs. They repeat over and over. We have a three-node IPFS cluster (Kubernetes), and two of the nodes/pods have these errors. The system seems to be running fine: we are able to save and retrieve data. The file in question is: /data/ipfs-cluster/badger/000002.vlog

ERROR bitswap go-bitswap@v0.7.0/bitswap.go:492 Error writing 1 blocks to datastore: Unable to write to value log file: … : no space left on device
2022-08-04 15:30:22.029 CDT 2022-08-04T20:30:22.029Z ERROR badger badger@v1.6.2/logger.go:38 writeRequests: Unable to write to value log file: … : no space left on device
2022-08-04 15:30:22.029 CDT 2022-08-04T20:30:22.029Z ERROR badger badger@v1.6.2/logger.go:38 WatchBatch.Cancel error while finishing: Unable to write to value log file:…: no space left on device

Your disk is full.
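A quick way to confirm this from inside the affected pods is to check usage on the volume that holds the badger directory. A minimal Python sketch using only the standard library; the path is the one from the original post and may differ in your deployment:

```python
import shutil


def disk_report(path: str) -> dict:
    """Summarize space on the filesystem that contains `path`."""
    # shutil.disk_usage returns (total, used, free) in bytes for that filesystem.
    usage = shutil.disk_usage(path)
    gib = 2 ** 30
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        # Guard against a zero-sized filesystem to avoid dividing by zero.
        "percent_used": 100.0 * usage.used / usage.total if usage.total else 0.0,
    }
```

For example, `disk_report("/data/ipfs-cluster/badger")` run in each pod should show `free_gib` at or near zero on the two nodes that are logging errors.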

The badger blockstore does very poorly when the disk is full; I would recommend using flatfs for serious workloads.
Your data is probably corrupted.

Thanks. So far the data seems fine. The devops guys seem to think GC will keep the filesystem clean, but it appears to me that GC is struggling with this issue.

Are you actually running GC? It's off by default. You need to run ipfs repo gc (a one-off run) or start the daemon with ipfs daemon --enable-gc.
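If GC is expected to keep the filesystem clean, something has to actually trigger it, and it has to fire before the volume is completely full. A minimal sketch of a cron-style watchdog, assuming the `ipfs` CLI is on PATH inside the pod and that the IPFS repo shares a volume with the badger directory (the bitswap error above suggests it does). The 15% threshold is a hypothetical value, not an IPFS default:

```python
import shutil
import subprocess
from typing import Callable, Sequence

# Hypothetical threshold: trigger GC when less than 15% of the volume is free.
MIN_FREE_FRACTION = 0.15


def _run_checked(cmd: Sequence[str]) -> None:
    # Raises CalledProcessError if the command exits non-zero.
    subprocess.run(list(cmd), check=True)


def maybe_run_gc(
    repo_path: str,
    min_free_fraction: float = MIN_FREE_FRACTION,
    run: Callable[[Sequence[str]], object] = _run_checked,
) -> bool:
    """Run a one-off `ipfs repo gc` if free space on repo_path's volume is low.

    Returns True if GC was triggered. `run` is injectable so the threshold
    logic can be tested without an IPFS daemon.
    """
    usage = shutil.disk_usage(repo_path)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    if free_fraction >= min_free_fraction:
        return False
    run(["ipfs", "repo", "gc"])
    return True
```

Keep in mind that ipfs repo gc only removes unpinned blocks; if most of the data is pinned, it will not free much space no matter how often it runs.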

GC currently has performance issues; that's why it's not on by default.
ipfs-cluster can do round-robin GC scheduling, where it runs GC on something like 33% of your cluster at a time. It's better, but it's more of a workaround than anything.
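The batching idea can be sketched like this. It illustrates round-robin scheduling in general and is not ipfs-cluster's implementation. `gc_peer` is a hypothetical hook you would wire to whatever triggers GC on a single peer, and the peer names in the usage example are made up:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def gc_batches(peers: Sequence[str], n_batches: int = 3) -> List[List[str]]:
    """Split peers into at most n_batches consecutive groups (3 groups ~ 33% each)."""
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    size = max(1, -(-len(peers) // n_batches))  # ceiling division
    return [list(peers[i:i + size]) for i in range(0, len(peers), size)]


def run_round_robin(
    peers: Sequence[str],
    gc_peer: Callable[[str], None],  # hypothetical per-peer GC hook
    n_batches: int = 3,
) -> None:
    """GC one batch concurrently, wait for it to finish, then move to the next."""
    for batch in gc_batches(peers, n_batches):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # list() forces completion and re-raises any exception from gc_peer.
            list(pool.map(gc_peer, batch))
```

With three peers, `gc_batches(["peer-0", "peer-1", "peer-2"])` gives one peer per batch, so two peers are always serving requests while the third is collecting garbage.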

Not sure how they are running it. I see these in the logs and assumed it was on:

2022-08-04T21:54:32.183Z INFO badger go-ds-badger@v0.3.0/datastore.go:441 Running GC round
2022-08-04T21:54:43.592Z INFO badger go-ds-badger@v0.3.0/datastore.go:443 Finished running GC round
2022-08-04T21:54:43.592Z ERROR badger go-ds-badger@v0.3.0/datastore.go:201 error during a GC cycle: Iteration function. Path=/data/ipfs-cluster/badger/000000.vlog. Error=Unable to write to value log file: “/data/ipfs-cluster/badger/000002.vlog”: write /data/ipfs-cluster/badger/000002.vlog: no space left on device
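Note that a "Finished running GC round" line does not mean the round succeeded: in the excerpt above it is followed by an error at the same timestamp. A small sketch that tallies these outcomes from a log stream. The marker strings are copied from the log lines in this thread; the counter keys are arbitrary names chosen for illustration:

```python
from collections import Counter
from typing import Iterable

# Message fragments taken from the go-ds-badger log lines quoted above.
MARKERS = {
    "gc_started": "Running GC round",
    "gc_finished": "Finished running GC round",
    "gc_failed": "error during a GC cycle",
    "no_space": "no space left on device",
}


def summarize_gc(lines: Iterable[str]) -> Counter:
    """Count how often each GC-related marker appears (case-sensitive match)."""
    counts: Counter = Counter()
    for line in lines:
        for key, marker in MARKERS.items():
            if marker in line:
                counts[key] += 1
    return counts
```

If `gc_failed` tracks `gc_finished` round for round, GC is running but never actually reclaiming anything. You can feed it a saved log file, for example `summarize_gc(open("cluster.log"))`, where the filename is a placeholder.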

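GC needs to shuffle some data around: badger's value-log GC copies the still-live entries out of an old .vlog file before deleting it. You literally have no space left, so that copy can't be written and GC fails.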
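To see why a full disk blocks reclamation, here is a toy model of that step. It assumes GC reclaims one value-log file by first copying its live entries to the head of the log and only then deleting the file. It ignores badger's discard-ratio threshold and other details, so treat it as an illustration, not badger's code. It matches the error above, where iterating 000000.vlog failed while writing to 000002.vlog:

```python
import errno
from dataclasses import dataclass


@dataclass
class VlogFile:
    name: str
    size: int        # bytes the file occupies on disk
    live_bytes: int  # bytes still referenced by the index (toy assumption)


def gc_one_file(f: VlogFile, free_bytes: int) -> int:
    """Toy value-log GC: rewrite live entries, then delete the old file.

    Returns free space after GC, or raises OSError(ENOSPC) when the live
    entries cannot be rewritten first.
    """
    if f.live_bytes > free_bytes:
        # The copy happens before the delete, so GC needs headroom up front.
        raise OSError(errno.ENOSPC, "no space left on device", f.name)
    free_after_copy = free_bytes - f.live_bytes
    # Deleting the old file gives back its whole size.
    return free_after_copy + f.size
```

With zero bytes free, even a file that is 80% garbage cannot be reclaimed, which is why freeing some space by other means (or growing the volume) has to come before GC can help.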