Node using 40 GB of RAM and 16 cores, still OOMing

My team and I are running the ipfs/go-ipfs:v0.25.0 image in Kubernetes in a single pod. Lately the node has been using huge amounts of resources and we’re not quite sure how to debug it. We currently have 40 GB of memory and 16 cores allocated to the pod, and it still OOMs every ~2 days.

Here are what I think are the most relevant metrics; happy to pull any others. The red at the top of each graph marks an OOM event, and darker red means more OOMs in that metric interval.

Oh, looks like I can’t put multiple images in this post. Will reply to this with images from our dashboard.

I’d recommend starting with kubo/docs/debug-guide.md at master · ipfs/kubo · GitHub, and in particular getting a profile dump while the memory usage is high. That will let you run Go’s go tool pprof -http=:1234 on the heap file and see where all the memory is going.
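Concretely, assuming the daemon’s API is listening on the default localhost:5001 inside the pod (adjust the address if yours differs), something like this should get you a heap dump and open it in pprof’s web UI:

# collect a full bundle of profiles (heap, goroutines, CPU, ...) into a zip
ipfs diag profile

# or just dump the heap via the API's pprof endpoint
curl -o ipfs.heap http://localhost:5001/debug/pprof/heap

# inspect the heap dump in the browser
go tool pprof -http=:1234 ipfs.heap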


The node is currently using 13 GiB, and the heap pprof shows the following as the worst offender:
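In case the screenshot is hard to read, a plain-text listing of the top allocators can be pulled from the same heap dump (ipfs.heap is just whatever the file was saved as):

go tool pprof -top ipfs.heap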

Seeing this in the CPU profile
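For reference, a CPU profile can be captured the same way, again assuming the API is on the default localhost:5001; 30 seconds of sampling is usually enough to show the hot paths:

# sample the CPU for 30 seconds via the pprof endpoint
curl -o ipfs.cpu "http://localhost:5001/debug/pprof/profile?seconds=30"
# print the hottest functions
go tool pprof -top ipfs.cpu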

I had a memory leak with 0.27 standalone and upgraded to 0.29. It may have solved the issue.

Hey Friends!

I’ll chime in with the same issue. We upgraded to 0.29 a few days ago and increased memory from 4 GB to 16 GB.

Please check the Grafana RAM chart: the thin white lines are where kubo gets OOM killed by the kernel and restarts.
It looks like a memory leak to me.
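If it helps to confirm it really is the kernel OOM killer rather than kubo exiting on its own, the kill should show up in the kernel log; this is roughly what I’d look for (run on the host/node itself, since dmesg may be restricted inside the container):

dmesg -T | grep -iE "out of memory|killed process"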

Sincerely,
Michael


Turns out doing the following fixed things

/ # ipfs config --json Gateway.PublicGateways '{ "gateway.pinata.cloud": { "PathPrefixes": ["/ipfs", "/ipns"], "UseSubdomains": false } }'
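If anyone else tries this: the value can be read back to confirm it landed, and kubo generally needs a daemon restart before config changes take effect.

ipfs config Gateway.PublicGateways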

I tried to apply this setting as well, but it had no effect whatsoever…

After upgrading to 0.30 it seems the problem is gone. Or almost gone.