Node using 40Gb ram and 16 cores, still OOMing

paymahn · July 2, 2024, 8:45pm

My team and I are running the ipfs/go-ipfs:v0.25.0 image in kubernetes in a single pod. We’ve found that lately the node is using huge amounts of resources and we’re not quite sure how to debug. We currently have 40Gb of memory and 16 cores allocated to it. The pod OOMs every ~2 days.

Here are what I think are the most relevant metrics, happy to pull any others. The red at the top of each graph is an OOM event, darker red means more OOMs in that particular metric interval.

Oh, looks like I can’t put multiple images in this post. Will reply to this with images from our dashboard.

paymahn · July 2, 2024, 8:46pm

paymahn · July 2, 2024, 8:47pm

adin · July 2, 2024, 8:55pm

Would recommend starting with kubo/docs/debug-guide.md at master · ipfs/kubo · GitHub and in particular getting a profile dump while the memory usage is high. This will let you use Go’s go tool proof -http:1234 on the heap file and see where all the memory is going.

paymahn · July 2, 2024, 9:07pm

Node is currently using 13Gi and the heap pprof is showing the following as the worst offender

paymahn · July 2, 2024, 9:09pm

Seeing this in the CPU profile

brianebert · July 4, 2024, 11:05pm

I had a memory leak with 0.27 standalone and upgraded to 0.29. It may have solved the issue.

michaelr · July 7, 2024, 5:48am

Hey Friends!

I’ll chip in with the same issues. Upgraded to 0.29 a few days ago and increased memory from 4G to 16G.

Please check the Grafana RAM chart, the thin white lines are where kubo getting OOM killed by the kernel and restarts.
It looks like a memory leak to me.

Sincerely,
Michael

paymahn · July 10, 2024, 3:42pm

Turns out doing the following fixed things

/ # ipfs config --json Gateway.PublicGateways '{ "gateway.pinata.cloud": { "PathPrefixes": ["/ipfs", "/ipns"], "UseSubdomains": false } }'

michaelr · July 12, 2024, 4:45am

I tried to apply this setting as well, but it had not effect whatsoever…

michaelr · September 24, 2024, 4:18pm

After upgrading to 0.30 it seems the problem is gone. Or almost gone.

Topic		Replies	Views
Excessive/Expected IPFS memory, threads, CPU? - private deployment Kubo	1	1274	January 21, 2021
Go-ipfs going out of memory Ecosystem and Usage go-ipfs	3	1242	October 16, 2017
Constant 100% CPU utilization on AWS EC2 IPFS node Help go-ipfs	14	955	November 22, 2023
Kubo v0.33.0-rc1 is out! News go-ipfs , kubo	7	85	January 23, 2025
Go-ipfs badgerds memory usage and migrating back to file-based store Kubo go-ipfs	3	856	March 27, 2019

Node using 40Gb ram and 16 cores, still OOMing

Related topics