go-ipfs running out of memory

I am probably not the first to run into this, but since we operate public gateways I feel I should report it and open a discussion. The main reason is to give feedback and to find out whether we are alone or whether other people feel the same urgency about this issue.

IPFS keeps running out of memory on our servers. We are going to increase the nodes' memory again to mitigate the issue, but nothing changes: it keeps eating memory like a crazy amoeba :laughing: This happens both on external nodes exposing gateways and on internal nodes providing services (inside a strictly limited vnet).

I know everybody is busy working on new features and on Filecoin, and AFAIK the plan is to fix this later. I think it is time to focus on refactoring and on optimising the code and its memory/CPU usage. IPFS is not stable yet, I am aware of that… but I still think it is dangerous that IPFS (and go-ipfs) keeps adding features that will require even more maintenance later on! My point is that if we keep adding more and more features instead of fixing the existing problems, the devs will face more and more headaches down the road… especially once Filecoin arrives.

Am I the only one thinking about this? I wish I could help somehow, beyond just providing logs :slight_smile:

From the latest logs from one of the private IPFS nodes:

Oct 15 13:23:36 ipfs: Killed
Oct 15 13:23:36 kernel: [114090.742155] ipfs invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=-998
Oct 15 13:23:36 kernel: [114090.742162] CPU: 0 PID: 18688 Comm: ipfs Not tainted 4.10.0-37-generic #41~16.04.1-Ubuntu
Oct 15 13:23:36 kernel: [114090.742163] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Oct 15 13:23:36 kernel: [114090.742164] Call Trace:
Oct 15 13:23:36 kernel: [114090.742174]  dump_stack+0x63/0x90
Oct 15 13:23:36 kernel: [114090.742178]  dump_header+0x7b/0x1fd
Oct 15 13:23:36 kernel: [114090.742183]  ? mem_cgroup_scan_tasks+0xc4/0xf0
Oct 15 13:23:36 kernel: [114090.742187]  oom_kill_process+0x219/0x3e0
Oct 15 13:23:36 kernel: [114090.742189]  out_of_memory+0x120/0x4b0
Oct 15 13:23:36 kernel: [114090.742190]  mem_cgroup_out_of_memory+0x4b/0x80
Oct 15 13:23:36 kernel: [114090.742192]  mem_cgroup_oom_synchronize+0x325/0x340
Oct 15 13:23:36 kernel: [114090.742193]  ? memory_high_write+0xe0/0xe0
Oct 15 13:23:36 kernel: [114090.742195]  pagefault_out_of_memory+0x36/0x80
Oct 15 13:23:36 kernel: [114090.742200]  mm_fault_error+0x8f/0x190
Oct 15 13:23:36 kernel: [114090.742201]  __do_page_fault+0x4b2/0x4e0
Oct 15 13:23:36 kernel: [114090.742206]  do_page_fault+0x22/0x30
Oct 15 13:23:36 kernel: [114090.742211]  page_fault+0x28/0x30
Oct 15 13:23:36 kernel: [114090.742212] RIP: 0033:0xdc0d63
Oct 15 13:23:36 kernel: [114090.742213] RSP: 002b:000000c4271cda48 EFLAGS: 00010246
Oct 15 13:23:36 kernel: [114090.742214] RAX: 000000c42ef36000 RBX: 000000c427a37420 RCX: 0000000000000200
Oct 15 13:23:36 kernel: [114090.742214] RDX: 0000000000000379 RSI: 0000000000000201 RDI: 000000c42ef37000
Oct 15 13:23:36 kernel: [114090.742215] RBP: 000000c4271cdb60 R08: 0000000000000000 R09: 000000c42a6baa40
Oct 15 13:23:36 kernel: [114090.742215] R10: 0000000000000007 R11: 000000c42a6baa88 R12: 0000000000000002
Oct 15 13:23:36 kernel: [114090.742216] R13: 000000c42a6baa58 R14: 00000000000000db R15: 000000c42a6baa40
Oct 15 13:23:36 kernel: [114090.742355] Memory cgroup out of memory: Kill process 13398 (ipfs) score 69 or sacrifice child
Oct 15 13:23:36 kernel: [114090.751266] Killed process 13398 (ipfs) total-vm:579084kB, anon-rss:257552kB, file-rss:19588kB, shmem-rss:0kB

Let me know if somebody wants more logs. This is happening every few hours now.

I believe this was already raised in:

The go-ipfs team is actively working on improving the situation and testing it internally:

Hopefully it will be addressed in the next release.

Thanks, I know. I would like to discuss how we can remove the “Hopefully” from that last sentence and get these problems fixed soon :slight_smile: I would love to help more to achieve that…

Is anybody on board with this?

The only reason this wouldn’t get fixed in the next release is if we decide to do an early release for some reason (in which case it would be fixed in the release after that), but that probably won’t happen. Regardless, you can expect this situation to get much better in about a month (probably less)*.

*there are no givens in love and code; I don’t decide when the release gets cut.