IPFS nodes become unresponsive from time to time

Hi everyone, recently our IPFS nodes (in both production and development environments) have at times become so unresponsive that we could not even SSH into the servers.

Each time a node becomes unresponsive, it exhibits the same symptoms:
First, disk read I/O skyrockets. At the same time, both disk write I/O and network I/O slump. Finally, the IPFS process is killed by the OOM killer after some time (anywhere from several minutes to 2 days).

Below are logs from the moment ipfs became unresponsive and got killed:

Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.350Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWPr5uTfnkWWKzQBUXniV7ow3cdMYMUxaZcgK4ZJVJQ89g”, “error”: “Application error 0x0: conn-314066: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.381Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWS8j1eJzqegfRGdtDMKKXHoNkhA7FfSj6wTKa69CjVQyw”, “error”: “Application error 0x0: conn-7284925: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.458Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWCfG7enT1cnuNPTNEd5tFJnzYk1M5bQpa4t1AvU4myYN3”, “error”: “Application error 0x0: conn-2175172: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.546Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWRKcULr3zRK64G9cSxzBihWPh3by54MYBhAkySddnfM9t”, “error”: “Application error 0x1: conn-53429118: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.604Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWNMCPjeDj6qT8Tc2YMX2sAsgceKmrCfnPkNfvpka9z41Q”, “error”: “Application error 0x0: conn-1175842: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.701Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWDbR8gvQBLgARDVav1iDoQRpu8CkV4a3GwnkWoLwxMQ2E”, “error”: “Application error 0x0: conn-3926499: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:25 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.709Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWAedrxMffxV6bajPHfqCtw6fnAFpGq6eWmXuCu3vGXJsz”, “error”: “Application error 0x0: conn-2545691: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:26 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.998Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWQUyCzybXKcy71ga1DsUwmeLnTiTpPM8g3qGtM3ygRhxx”, “error”: “timeout: no recent network activity”}
Mar 17 03:45:27 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.449Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWFHj4dDxAUFDLYekGmEL2vPiDCE1u7zBqCDEJ6uJJk82w”, “error”: “Application error 0x0: conn-2321599: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:27 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.469Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWSEjo1c7XZmcQpyL38yevkyUJcUbnCFZF5jUZR2Jb4fsS”, “error”: “Application error 0x0: conn-899087: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:27 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:25.664Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWPKZSk8x35aQUx1t2nMGeEs1myNVyKetBdkgmT915WPjh”, “error”: “Application error 0x0: conn-4420740: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:28 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:28.174Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWSvbKgBrtEL4Razbg3jtA5DhNZ1mPRaRZZ1K97LwezEzj”, “error”: “Application error 0x0: conn-554140416: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:28 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:28.606Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWDr6FYqdVBiqX2Ee3zAJyrdezpb8pjp98FWCKLegzBB9w”, “error”: “Application error 0x0: conn-6396669: system: cannot reserve connection: resource limit exceeded”}
Mar 17 03:45:29 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:27.824Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWRq5v31K86GD9bcS2tVZrDE2CcKbCYtTEisgcF1639soC”, “error”: “Application error 0x0: conn-1776380: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:29 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:29.644Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWNd8sNLQXrbDt1FpMAxC94732a11y92fDbvY8zYqV72pi”, “error”: “timeout: no recent network activity”}
Mar 17 03:45:29 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:29.694Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWGqyhCbdLyonVdLgXMpLp5QRT1pAgMfXmBUbnTQbzsggx”, “error”: “timeout: no recent network activity”}
Mar 17 03:45:30 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:27.806Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWGSmxipGeShuuqBLDgDYGNp2psLMtBqheRPVdvjG3GJ53”, “error”: “Application error 0x0: conn-6591542: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:45:31 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:31.138Z INFO bitswap go-bitswap@v0.7.0/bitswap.go:661 Bitswap ReceiveError: stream reset
Mar 17 03:45:33 ip-10-0-0-143 80fea97ebfa0[252598]: 2023-03-17T03:45:28.365Z INFO net/identify identify/id.go:364 failed negotiate identify protocol with peer {“peer”: “12D3KooWM95swYsEJCX8YwYr4M1VCBE7WxQcsiyKPfJaCEenZnCt”, “error”: “Application error 0x0: conn-290740: system: cannot reserve inbound connection: resource limit exceeded”}
Mar 17 03:57:26 ip-10-0-0-143 systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Mar 17 03:57:27 ip-10-0-0-143 systemd[1]: snapd.service: Killing process 249452 (snapd) with signal SIGABRT.
Mar 17 03:58:57 ip-10-0-0-143 systemd[1]: snapd.service: State ‘stop-watchdog’ timed out. Terminating.
Mar 17 03:59:21 ip-10-0-0-143 kernel: ipfs invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Mar 17 03:59:21 ip-10-0-0-143 kernel: CPU: 1 PID: 288278 Comm: ipfs Not tainted 5.15.0-1028-aws #32~20.04.1-Ubuntu
Mar 17 03:59:21 ip-10-0-0-143 kernel: Hardware name: Amazon EC2 t3a.medium/, BIOS 1.0 10/16/2017
Mar 17 03:59:21 ip-10-0-0-143 kernel: Call Trace:
Mar 17 03:59:21 ip-10-0-0-143 kernel:
Mar 17 03:59:21 ip-10-0-0-143 kernel: dump_stack_lvl+0x4a/0x63
Mar 17 03:59:21 ip-10-0-0-143 kernel: dump_stack+0x10/0x16
Mar 17 03:59:21 ip-10-0-0-143 kernel: dump_header+0x53/0x225
Mar 17 03:59:21 ip-10-0-0-143 kernel: oom_kill_process.cold+0xb/0x10
Mar 17 03:59:21 ip-10-0-0-143 kernel: out_of_memory+0x1dc/0x530
Mar 17 03:59:21 ip-10-0-0-143 kernel: __alloc_pages_slowpath.constprop.0+0xd32/0xe30
Mar 17 03:59:21 ip-10-0-0-143 kernel: ? __alloc_pages_slowpath.constprop.0+0xdb6/0xe30
Mar 17 03:59:21 ip-10-0-0-143 kernel: __alloc_pages+0x2cc/0x310
Mar 17 03:59:21 ip-10-0-0-143 kernel: alloc_pages+0x90/0x120
Mar 17 03:59:21 ip-10-0-0-143 kernel: __page_cache_alloc+0x87/0xc0
Mar 17 03:59:21 ip-10-0-0-143 kernel: pagecache_get_page+0x150/0x530
Mar 17 03:59:21 ip-10-0-0-143 kernel: ? page_cache_ra_unbounded+0x16a/0x220
Mar 17 03:59:21 ip-10-0-0-143 kernel: filemap_fault+0x527/0xb60
Mar 17 03:59:21 ip-10-0-0-143 kernel: ? filemap_map_pages+0x138/0x640
Mar 17 03:59:21 ip-10-0-0-143 kernel: __do_fault+0x40/0x120
Mar 17 03:59:21 ip-10-0-0-143 kernel: do_fault+0x1f9/0x420
Mar 17 03:59:21 ip-10-0-0-143 kernel: __handle_mm_fault+0x62c/0x840
Mar 17 03:59:21 ip-10-0-0-143 kernel: handle_mm_fault+0xd8/0x2c0
Mar 17 03:59:21 ip-10-0-0-143 kernel: do_user_addr_fault+0x1c2/0x660
Mar 17 03:59:21 ip-10-0-0-143 kernel: exc_page_fault+0x77/0x170
Mar 17 03:59:21 ip-10-0-0-143 kernel: asm_exc_page_fault+0x27/0x30
Mar 17 03:59:21 ip-10-0-0-143 kernel: RIP: 0033:0xdf0f00
Mar 17 03:59:21 ip-10-0-0-143 kernel: Code: Unable to access opcode bytes at RIP 0xdf0ed6.
Mar 17 03:59:21 ip-10-0-0-143 kernel: RSP: 002b:00007fc5b922add8 EFLAGS: 00010206
Mar 17 03:59:21 ip-10-0-0-143 kernel: RAX: 0000000000000000 RBX: 0000000000002710 RCX: 0000000000df0d7d
Mar 17 03:59:21 ip-10-0-0-143 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fc5b922adc0
Mar 17 03:59:21 ip-10-0-0-143 kernel: RBP: 00007fc5b922ae38 R08: 000000002305e738 R09: 000000000004ecea
Mar 17 03:59:21 ip-10-0-0-143 kernel: R10: 0000000000000000 R11: 0000000000000212 R12: 00007fc5b922a800
Mar 17 03:59:21 ip-10-0-0-143 kernel: R13: 0000000000000178 R14: 000000c0000029c0 R15: 00007fc5b4341a9c
Mar 17 03:59:21 ip-10-0-0-143 kernel:
Mar 17 03:59:21 ip-10-0-0-143 kernel: Mem-Info:
Mar 17 03:59:21 ip-10-0-0-143 kernel: active_anon:577 inactive_anon:906218 isolated_anon:0
active_file:40 inactive_file:501 isolated_file:57
unevictable:5882 dirty:0 writeback:0
slab_reclaimable:14996 slab_unreclaimable:24939
mapped:2137 shmem:252 pagetables:4085 bounce:0
kernel_misc_reclaimable:0
free:19545 free_pcp:212 free_cma:0
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 active_anon:2308kB inactive_anon:3624872kB active_file:260kB inactive_file:1936kB unevictable:23528kB isolated(anon):0kB isolated(file):228kB mapped:8648kB dirty:0kB writeback:0kB shmem:1008kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 6144kB writeback_tmp:0kB kernel_stack:6112kB pagetables:16340kB all_unreclaimable? no
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 DMA free:14848kB min:260kB low:324kB high:388kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: lowmem_reserve: 0 2987 3847 3847 3847
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 DMA32 free:55228kB min:52264kB low:65328kB high:78392kB reserved_highatomic:0KB active_anon:1332kB inactive_anon:2890684kB active_file:128kB inactive_file:1692kB unevictable:0kB writepending:0kB present:3129256kB managed:3063720kB mlocked:0kB bounce:0kB free_pcp:1164kB local_pcp:644kB free_cma:0kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: lowmem_reserve: 0 0 860 860 860
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 Normal free:7384kB min:15056kB low:18820kB high:22584kB reserved_highatomic:0KB active_anon:976kB inactive_anon:734188kB active_file:624kB inactive_file:188kB unevictable:23528kB writepending:0kB present:999424kB managed:889932kB mlocked:18592kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: lowmem_reserve: 0 0 0 0 0
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB (U) 0*1024kB 1*2048kB (M) 3*4096kB (M) = 14848kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 DMA32: 430*4kB (UE) 2309*8kB (UME) 1200*16kB (UME) 301*32kB (UME) 62*64kB (UME) 10*128kB (M) 5*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55552kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 Normal: 1676*4kB (UME) 59*8kB (UME) 9*16kB (M) 0*32kB 1*64kB (M) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7384kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: 4103 total pagecache pages
Mar 17 03:59:21 ip-10-0-0-143 kernel: 0 pages in swap cache
Mar 17 03:59:21 ip-10-0-0-143 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 17 03:59:21 ip-10-0-0-143 kernel: Free swap = 0kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: Total swap = 0kB
Mar 17 03:59:21 ip-10-0-0-143 kernel: 1036168 pages RAM
Mar 17 03:59:21 ip-10-0-0-143 kernel: 0 pages HighMem/MovableOnly
Mar 17 03:59:21 ip-10-0-0-143 kernel: 43915 pages reserved
Mar 17 03:59:21 ip-10-0-0-143 kernel: 0 pages hwpoisoned
Mar 17 03:59:21 ip-10-0-0-143 kernel: Tasks state (memory values in pages):
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 294] 0 294 70036 4488 86016 0 -1000 multipathd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 465] 0 465 59701 1139 90112 0 0 accounts-daemon
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 466] 0 466 637 182 40960 0 0 acpid
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 472] 0 472 2137 559 53248 0 0 cron
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 474] 103 474 1986 816 57344 0 -900 dbus-daemon
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 488] 0 488 20458 677 61440 0 0 irqbalance
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 489] 0 489 7469 2854 94208 0 0 networkd-dispat
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 490] 0 490 58181 429 81920 0 0 polkitd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 492] 0 492 344774 1300 188416 0 0 amazon-ssm-agen
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 497] 0 497 4607 1044 77824 0 0 systemd-logind
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 499] 0 499 98291 1295 131072 0 0 udisksd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 500] 0 500 951 503 49152 0 0 atd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 557] 0 557 1459 372 45056 0 0 agetty
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 561] 0 561 78765 837 106496 0 0 ModemManager
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 628] 0 628 27032 2698 106496 0 0 unattended-upgr
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 704] 0 704 3047 733 65536 0 -1000 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 713] 0 713 624 143 45056 0 0 bpfilter_umh
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 1243] 0 1243 347070 1900 200704 0 0 ssm-agent-worke
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 2035] 1000 2035 4754 1016 77824 0 0 systemd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 2036] 1000 2036 26264 1129 94208 0 0 (sd-pam)
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 4440] 0 4440 6034 4759 86016 0 0 tmux: server
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 4454] 0 4454 2539 947 53248 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 12705] 0 12705 2505 897 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 12730] 0 12730 2899 684 61440 0 0 sudo
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 12731] 0 12731 2531 568 61440 0 0 su
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 12732] 1000 12732 2934 1329 61440 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 51568] 0 51568 2505 888 53248 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 55310] 0 55310 2505 915 61440 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 56295] 0 56295 2505 659 53248 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 56307] 0 56307 122128 855 487424 0 0 journalctl
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 56308] 0 56308 2386 901 61440 0 0 pager
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 59004] 0 59004 2531 544 61440 0 0 su
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 59005] 1000 59005 2506 912 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 71889] 0 71889 2538 942 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 77273] 0 77273 2571 1000 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 78777] 0 78777 2636 1060 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 78790] 0 78790 2531 544 57344 0 0 su
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 78791] 1000 78791 2538 940 49152 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 81879] 0 81879 135874 938 577536 0 0 journalctl
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 81880] 0 81880 3096 1730 65536 0 0 pager
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 82061] 0 82061 120775 265 581632 0 0 journalctl
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 82062] 0 82062 2384 912 61440 0 0 pager
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 130586] 0 130586 2676 627 53248 0 0 login
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 130659] 0 130659 4754 774 77824 0 0 systemd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 130660] 0 130660 42711 1153 98304 0 0 (sd-pam)
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 130666] 0 130666 2506 867 65536 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 148475] 0 148475 2393 757 65536 0 -1000 systemd-udevd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 161263] 100 161263 6851 646 77824 0 0 systemd-network
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 161270] 101 161270 6222 1693 90112 0 0 systemd-resolve
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 161278] 0 161278 92558 1091 774144 0 -250 systemd-journal
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 161369] 102 161369 22722 906 77824 0 0 systemd-timesyn
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 242687] 0 242687 2531 569 53248 0 0 su
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 242688] 1000 242688 2906 1338 61440 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 249452] 0 249452 218827 4119 245760 0 -900 snapd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 249831] 0 249831 376070 3951 303104 0 -999 containerd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 252598] 0 252598 384660 9317 421888 0 -500 dockerd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 253328] 104 253328 56125 834 77824 0 0 rsyslogd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 264679] 0 264679 2630 1041 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 274794] 0 274794 2538 938 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288130] 0 288130 307652 461 147456 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288136] 0 288136 289155 319 139264 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288151] 0 288151 307652 350 151552 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288158] 0 288158 289155 428 135168 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288172] 0 288172 307748 485 172032 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288179] 0 288179 289219 241 139264 0 -500 docker-proxy
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288195] 0 288195 180125 752 114688 0 -998 containerd-shim
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288217] 0 288217 289 12 32768 0 0 tini
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288264] 1000 288264 1414979 844042 7340032 0 0 ipfs
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 288403] 997 288403 199806 3677 294912 0 0 amazon-cloudwat
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 289302] 0 289302 2505 902 57344 0 0 bash
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341415] 0 341415 328152 679 176128 0 -998 runc
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341426] 0 341426 272725 515 155648 0 0 runc:[2:INIT]
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341432] 0 341432 3047 825 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341433] 0 341433 3047 756 65536 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341434] 0 341434 3047 750 65536 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341436] 0 341436 3047 645 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341437] 0 341437 3047 645 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341438] 0 341438 3054 737 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341441] 0 341441 3047 738 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341442] 0 341442 2994 601 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341443] 0 341443 3027 714 65536 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341446] 0 341446 3054 692 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341447] 0 341447 3054 810 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341448] 0 341448 2994 635 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341450] 0 341450 2994 636 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341451] 0 341451 2968 284 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341452] 0 341452 2968 339 57344 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341453] 0 341453 2968 314 57344 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341454] 0 341454 2968 185 57344 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341455] 0 341455 2968 195 65536 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341456] 0 341456 2968 237 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341457] 0 341457 2968 172 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341458] 0 341458 2964 154 61440 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: [ 341459] 0 341459 2276 76 49152 0 0 sshd
Mar 17 03:59:21 ip-10-0-0-143 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=80fea97ebfa0338a6da09e2204270af2a8e202a9118e0b90e052b5d672df3779,mems_allowed=0,global_oom,task_memcg=/docker/80fea97ebfa0338a6da09e2204270af2a8e202a9118e0b90e052b5d672df3779,task=ipfs,pid=288264,uid=1000
Mar 17 03:59:21 ip-10-0-0-143 kernel: Out of memory: Killed process 288264 (ipfs) total-vm:5659916kB, anon-rss:3376168kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:7168kB oom_score_adj:0
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:48:38.729209326Z” level=warning msg=“Health check for container 80fea97ebfa0338a6da09e2204270af2a8e202a9118e0b90e052b5d672df3779 error: timed out starting health check for container 80fea97ebfa0338a6da09e2204270af2a8e202a9118e0b90e052b5d672df3779”
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.901809596Z” level=error msg=“stream copy error: reading from a closed fifo”
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.941961555Z” level=error msg=“stream copy error: reading from a closed fifo”
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.970799602Z” level=info msg=“[core] [Channel #4 SubChannel #5] Subchannel Connectivity change to IDLE” module=grpc
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.978676294Z” level=error msg=“Failed to get event” error=“rpc error: code = Unavailable desc = error reading from server: EOF” module=libcontainerd namespace=plugins.moby
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.981010401Z” level=info msg=“Waiting for containerd to be ready to restart event processing” module=libcontainerd namespace=plugins.moby
Mar 17 03:59:21 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.982935254Z” level=info msg=“[core] blockingPicker: the picked transport is not ready, loop back to repick” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:21.983732773Z” level=info msg=“[core] [Channel #4] Channel Connectivity change to IDLE” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:22.010116372Z” level=info msg=“[core] [Channel #4 SubChannel #5] Subchannel Connectivity change to CONNECTING” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:22.010862400Z” level=info msg=“[core] [Channel #4] Channel Connectivity change to CONNECTING” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:22.014297280Z” level=info msg=“[core] [Channel #4 SubChannel #5] Subchannel picks a new address "/run/containerd/containerd.sock" to connect” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:22.069169151Z” level=info msg=“[core] [Channel #4 SubChannel #5] Subchannel Connectivity change to READY” module=grpc
Mar 17 03:59:22 ip-10-0-0-143 dockerd[252598]: time=“2023-03-17T03:59:22.069233992Z” level=info msg=“[core] [Channel #4] Channel Connectivity change to READY” module=grpc
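
As a rough cross-check of the kernel report above, the page counts can be converted to kilobytes. This is only a sketch: the 4 kB page size is inferred from the "Node 0" line (906218 inactive_anon pages reported as 3624872kB), and the helper name pages_to_kb is made up for illustration.

```python
PAGE_KB = 4  # assumed page size in kB, inferred from the Mem-Info lines


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the OOM report to kB."""
    return pages * PAGE_KB


ipfs_rss_pages = 844042    # rss column of the ipfs row in the task table
total_ram_pages = 1036168  # "1036168 pages RAM"

ipfs_rss_kb = pages_to_kb(ipfs_rss_pages)
total_ram_kb = pages_to_kb(total_ram_pages)

# Matches anon-rss:3376168kB in the "Out of memory: Killed process" line.
print(ipfs_rss_kb)
# Share of physical RAM held by the ipfs process when it was killed.
print(f"{ipfs_rss_kb / total_ram_kb:.1%}")
```

In other words, the ipfs process alone held a little over 80% of the instance's RAM when it was killed, and the host has no swap ("Total swap = 0kB").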

Our IPFS nodes (using the Docker image "ipfs/go-ipfs:v0.14.0") run on AWS EC2 instances (t3a.medium with gp2 volumes).

Any tips?

I have fixed a few memory leaks since then, so you should consider upgrading.

Otherwise, you can try setting GOMEMLIMIT.
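
For reference, GOMEMLIMIT is an environment variable read by the Go runtime (Go 1.19 and later), so it only takes effect in a binary built with such a toolchain. It sets a soft memory limit and accepts a byte count with an optional suffix such as MiB or GiB. Here is a minimal sketch for picking a value, assuming you want some headroom below physical RAM. The 0.8 fraction and the function name are illustrative assumptions, not a recommendation from this thread.

```python
def suggest_gomemlimit(total_ram_kb: int, fraction: float = 0.8) -> str:
    """Return a GOMEMLIMIT string (in MiB) for a given fraction of RAM.

    Both the fraction and this helper are hypothetical; tune for your host.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    limit_mib = int(total_ram_kb * fraction) // 1024
    return f"{limit_mib}MiB"


# 1036168 pages * 4 kB, taken from "1036168 pages RAM" in the OOM report.
print(suggest_gomemlimit(1036168 * 4))
```

The resulting string would be passed to the container's environment (for example with docker run's -e flag). Because the limit is soft, the Go runtime collects garbage more aggressively as it approaches the value but does not guarantee staying under it.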
