Memory leaks in 0.24

noisekit · December 9, 2023, 10:10pm

Since upgrading from 0.20 to 0.24, memory leaking seems to have increased dramatically. We had leaks before, but the server held up for at least a few months. Since the upgrade about a week ago (no additional config changes), ipfs fills up all the memory in 3-4 days.

I haven’t opened an issue on GitHub yet, but I wanted to ask whether anyone else is seeing similar behaviour, and whether there are server options that might help eliminate the issue.

Here is the memory usage chart (we upgraded to 0.24 around the beginning of December)

Jorropo · December 10, 2023, 2:53am

I’m not aware of anything. Please go ahead and open an issue on GitHub, and post an ipfs diag profile captured when memory usage is at its highest.

noisekit · December 10, 2023, 4:14am

After a restart a few hours ago, I’ve added a chart for the ipfs daemon specifically. Memory usage keeps climbing at a steady pace (the number is a percentage; the machine has 8 GB of RAM).

~~Diag profile is 40.7 MB, is this normal?~~ OK, I see the whole ipfs binary is included in there, so unzipped it’s 90 MB.

I’ll try to catch it at maximum memory usage while I can still ssh into the box to get the diag info out.

Jorropo · December 10, 2023, 4:46am

Yeah, we need the binary because the stack traces in the profile are recorded as instruction pointers.
By following those instruction pointers into the ELF, we can get the line-number debug info and figure out the function names.

For official binaries from dist.ipfs.tech we could omit this, TBH; we would need a script that fetches the same version from dist.ipfs.tech based on the version file.

thx

noisekit · December 10, 2023, 4:50am

Yep, I updated from there. Perhaps when the diag is created, it could run a checksum over the binary and skip inclusion if it matches a known release.

I’ll leave it running for another day and get the diag snapshot

Jorropo · December 10, 2023, 4:54am

OK, but… how do you include the checksum of the binary inside the binary without also changing the checksum of the binary?

I guess we could skip the checksum bytes when checksumming the binary, but then it’s not as easy as just taking the input file and hashing it; we would need to do symbol resolution to know the couple of bytes we must not hash.

I was thinking of having a special -tags ipfs/distribution which blindly assumes this is OK to do.
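
The "skip the checksum bytes" idea above can be sketched roughly as follows. This is only an illustration, assuming the embedded digest lives in a fixed byte range whose offset is already known; the function name `digest_excluding` and the toy "binary" layout are hypothetical, and locating the real offset in an ELF would still need the symbol resolution mentioned above.

```python
import hashlib


def digest_excluding(data: bytes, start: int, length: int) -> str:
    """SHA-256 of data with bytes [start, start+length) left out of the hash."""
    if start < 0 or length < 0 or start + length > len(data):
        raise ValueError("skip range is outside the data")
    h = hashlib.sha256()
    h.update(data[:start])            # everything before the embedded digest
    h.update(data[start + length:])   # everything after it
    return h.hexdigest()


# Toy "binary" (hypothetical layout): 32 placeholder bytes reserved for the
# digest, followed by the payload.
SLOT = 32
blank = bytes(SLOT) + b"pretend this is machine code"

# Hash everything except the slot, then write the digest into the slot.
digest = digest_excluding(blank, 0, SLOT)
stamped = bytes.fromhex(digest) + blank[SLOT:]

# The stamped file still verifies, because the slot itself is never hashed.
print(digest_excluding(stamped, 0, SLOT) == digest)
```

A plain hash of the whole file would differ before and after stamping, which is exactly the circularity raised in the question.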

noisekit · December 11, 2023, 5:01am

The charts are looking creepy today

I cannot attach files here so I pinned the report

added QmWAfBzgpyUywTwP28MuRA78GCC12heQGLT58UAn9N9YBg ipfs-profile-2023-12-11T03_46_33Z_00.zip

So this should work

ipfs cat QmWAfBzgpyUywTwP28MuRA78GCC12heQGLT58UAn9N9YBg > ipfs-profile-2023-12-11T03_46_33Z_00.zip

Jorropo · December 11, 2023, 7:54am

Interesting, to me it looks like you have many thousands of connections open.
What does ipfs swarm peers | wc -l report?

noisekit · December 11, 2023, 7:59am

ipfs swarm peers | wc -l
459

at this moment (I had to restart the ipfs service because it was close to eating all the memory)

Jorropo · December 11, 2023, 8:19am

Can you capture another profile, along with the number of peers, when it’s using a lot of memory again, please?
The profile shows that quic is holding on to lots of connection objects. I would like to know whether that is because we have lots of connections open or because we don’t properly clean up dead connections.

noisekit · December 11, 2023, 8:25am

Sure, I will add a metric to the chart to track peers, and will see how those charts are correlated.

Considering our usage patterns have not changed at all over the past many months, and the only change a week ago was the upgrade to 0.24, that leaves me with two realistic options:

some code issue with 0.24 upgrade
some old config options that we haven’t changed that lead to this problem

noisekit · December 11, 2023, 11:37am

That’s interesting. I checked the peer count recently and it was around 4.5-4.7k.
I ran ipfs swarm peers a few times, and the number has now gone down to 1.5k (as has memory!) without any ipfs restarts.

Now we have charts to monitor memory usage and peer count, so I can gather info over time as more data accumulates.

(It would be interesting if the peers somehow get cleaned up by calling ipfs swarm peers…)
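
For anyone wanting to reproduce this kind of tracking without a monitoring system, here is a rough sketch that logs the peer count next to the daemon’s resident memory. It assumes a Linux host (it reads `/proc/<pid>/status`), that `ipfs` is on the PATH, and that you supply the daemon’s PID yourself; the helper names are made up for illustration.

```python
import subprocess
import time


def count_peers(swarm_peers_output: str) -> int:
    """Count non-empty lines, like `ipfs swarm peers | wc -l`."""
    return sum(1 for line in swarm_peers_output.splitlines() if line.strip())


def rss_kib(proc_status_text: str) -> int:
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    for line in proc_status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


def sample(pid: int) -> tuple[int, int]:
    """Take one (peer_count, rss_kib) sample for the daemon with this PID."""
    out = subprocess.run(
        ["ipfs", "swarm", "peers"], capture_output=True, text=True, check=True
    ).stdout
    with open(f"/proc/{pid}/status") as f:
        return count_peers(out), rss_kib(f.read())


def log_forever(pid: int, interval_s: int = 60) -> None:
    """Print one CSV line per interval: unix_time,peers,rss_kib."""
    while True:
        peers, rss = sample(pid)
        print(f"{int(time.time())},{peers},{rss}", flush=True)
        time.sleep(interval_s)
```

Calling `log_forever(<daemon pid>)` and redirecting the output to a file gives a CSV that can be plotted later. Keep in mind that, given the observation above, polling ipfs swarm peers may itself affect the numbers being measured.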

Jorropo · December 11, 2023, 11:39am

Can you post the output of ipfs config show too, please? (It excludes private keys.)

noisekit · December 11, 2023, 11:49am

With some of the values/ips/peer ids cleaned up:

ipfs config show

{

  "API": {
    "HTTPHeaders": {
      "Access-Control-Allow-Credentials": [
        "true"
      ],
      "Access-Control-Allow-Headers": [
        "Authorization,Accept,Origin,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range"
      ],
      "Access-Control-Allow-Methods": [
        "GET,POST,OPTIONS,PUT,DELETE,PATCH"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    }
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5002",
    "Announce": null,
    "AppendAnnounce": null,
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": [
      "CLEANED UP"
    ],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/Qm******",
    "/dnsaddr/bootstrap.libp2p.io/p2p/Qm******",
    "/dnsaddr/bootstrap.libp2p.io/p2p/Qm******",
    "/dnsaddr/bootstrap.libp2p.io/p2p/Qm******",
    "/ip4/x.x.x.x/tcp/4001/p2p/Qm******",
    "/ip4/x.x.x.x/udp/4001/quic-v1/p2p/Qm******"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": false
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Credentials": [
        "true"
      ],
      "Access-Control-Allow-Headers": [
        "Authorization,Accept,Origin,DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Content-Range,Range"
      ],
      "Access-Control-Allow-Methods": [
        "GET,POST,OPTIONS,PUT,DELETE,PATCH"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": true,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "CLEANED UP"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128,
    "UsePubsub": true
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {},
  "Routing": {
    "AcceleratedDHTClient": true,
    "Methods": null,
    "Routers": null
  },
  "Swarm": {
    "AddrFilters": [
      "CLEANED UP"
    ],
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": true,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

noisekit · December 12, 2023, 12:41am

Around 12h of charts now. It looks like just by constantly running ipfs swarm peers (as zabbix does to read the values), the number of peers no longer hangs at the high mark and gets cleaned up.

Jorropo · December 12, 2023, 11:01am

This is the accelerated DHT client’s hourly crawl.
Similar to this: "Accelerated DHT Client causes OOM kill upon start of IPFS, ResourceMgr.MaxMemory ignored" (Issue #9990 in ipfs/kubo on GitHub)

noisekit · December 12, 2023, 11:27am

Thanks.
Today at some point memory just jumped up and stayed there, with no visible difference in peer count.
The next jump like that will brick the container.

I’ve run diag again

added QmYpzYJ7PkS2aWre2qCnj6kUSfkE54vcDAqydBERY4ySEP ipfs-profile-2023-12-12T11_29_14Z_00.zip

download:

 ipfs cat QmYpzYJ7PkS2aWre2qCnj6kUSfkE54vcDAqydBERY4ySEP > ipfs-profile-2023-12-12T11_29_14Z_00.zip

noisekit · December 27, 2023, 1:23pm

I haven’t found a better solution than just restarting the ipfs daemon from time to time.
The longest it lasted was about 6-7 days, until the ec2 instance became unresponsive and needed a reboot.

noisekit · March 17, 2024, 3:04am

I recently updated to 0.27 in the hope that this issue would go away, but it became worse.
Memory now fills up to 80% in just 3 days.

noisekit · March 17, 2024, 3:07am

On a 30d scale. For comparison, all the previous blocks were 0.24; the last two blocks are 0.27
