Error hosting data on Raspberry Pi

I’m running an IPFS server on my Raspberry Pi (more background in Feasibility for Self-Hosting Scientific Datasets?). I noticed that the service went offline several days ago.

First I checked sudo systemctl status ipfs, which gives:

× ipfs.service - IPFS daemon
     Loaded: loaded (/etc/systemd/system/ipfs.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2024-02-16 02:14:56 UTC; 5 days ago
   Main PID: 967 (code=exited, status=2)
        CPU: 2d 11h 28min 38.869s

Feb 16 02:14:40 mojo ipfs[967]: goroutine 119468909 [semacquire]:
Feb 16 02:14:40 mojo ipfs[967]: runtime.gopark(0x1a29ecc, 0x2942450, 0x12, 0x19, 0x4)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/proc.go:398 +0x104 fp=0x73835500 sp=0x738354ec pc=0x57900
Feb 16 02:14:40 mojo ipfs[967]: runtime.goparkunlock(...)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/proc.go:404
Feb 16 02:14:40 mojo ipfs[967]: runtime.semacquire1(0x4031f9c, 0x0, 0x1, 0x0, 0x12)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/sema.go:160 +0x270 fp=0x73835528 sp=0x73835500 pc=0x6c240
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Failed with result 'exit-code'.
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Consumed 2d 11h 28min 38.869s CPU time.
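
(Restarting is just the usual systemd incantation, included below for completeness; the more interesting question is why it died.)

    # Bring the daemon back up and follow its log
    sudo systemctl restart ipfs
    sudo journalctl -u ipfs.service -f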

Next, I looked at sudo journalctl -u ipfs.service.

The head of the relevant log is:

-- Boot a862e97ecabd46cf9663871be3f18214 --
Feb 16 02:14:33 mojo ipfs[967]: runtime: out of memory: cannot allocate 4194304-byte block (3974135808 in use)
Feb 16 02:14:33 mojo ipfs[967]: fatal error: out of memory
Feb 16 02:14:33 mojo ipfs[967]: goroutine 119607122 [running]:
Feb 16 02:14:33 mojo ipfs[967]: runtime.throw({0x171682d, 0xd})
Feb 16 02:14:33 mojo ipfs[967]:         runtime/panic.go:1077 +0x4c fp=0xd2d3e640 sp=0xd2d3e62c pc=0x541b4
Feb 16 02:14:33 mojo ipfs[967]: runtime.(*mcache).refill(0xf7b53088, 0x78)
Feb 16 02:14:33 mojo ipfs[967]:         runtime/mcache.go:184 +0x244 fp=0xd2d3e670 sp=0xd2d3e640 pc=0x2944c
Feb 16 02:14:33 mojo ipfs[967]: runtime.(*mcache).nextFree(0xf7b53088, 0x78)
Feb 16 02:14:33 mojo ipfs[967]:         runtime/malloc.go:929 +0x84 fp=0xd2d3e694 sp=0xd2d3e670 pc=0x1f388
Feb 16 02:14:33 mojo ipfs[967]: runtime.mallocgc(0x4800, 0x16038b0, 0x1)
Feb 16 02:14:33 mojo ipfs[967]:         runtime/malloc.go:1116 +0x58c fp=0xd2d3e6cc sp=0xd2d3e694 pc=0x1fac8
Feb 16 02:14:33 mojo ipfs[967]: runtime.makechan(0x14738c0, 0x100)

and the tail is:

Feb 16 02:14:40 mojo ipfs[967]: goroutine 119468909 [semacquire]:
Feb 16 02:14:40 mojo ipfs[967]: runtime.gopark(0x1a29ecc, 0x2942450, 0x12, 0x19, 0x4)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/proc.go:398 +0x104 fp=0x73835500 sp=0x738354ec pc=0x57900
Feb 16 02:14:40 mojo ipfs[967]: runtime.goparkunlock(...)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/proc.go:404
Feb 16 02:14:40 mojo ipfs[967]: runtime.semacquire1(0x4031f9c, 0x0, 0x1, 0x0, 0x12)
Feb 16 02:14:40 mojo ipfs[967]:         runtime/sema.go:160 +0x270 fp=0x73835528 sp=0x73835500 pc=0x6c240
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Failed with result 'exit-code'.
Feb 16 02:14:56 mojo systemd[1]: ipfs.service: Consumed 2d 11h 28min 38.869s CPU time.
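
For anyone hitting something similar: the fatal line can be pulled straight out of the journal, and it’s worth confirming whether it was the Go runtime aborting (as it was here) or the kernel OOM killer. A sketch with standard tools:

    # Find the fatal allocation failure in the service journal
    sudo journalctl -u ipfs.service --grep "out of memory"

    # Check whether the kernel OOM killer stepped in instead
    sudo dmesg | grep -iE "out of memory|killed process"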

ipfs --version reports ipfs version 0.26.0

IPFS config is:

{
  "API": {
    "HTTPHeaders": {
      "Access-Control-Allow-Methods": [
        "POST"
      ],
      "Access-Control-Allow-Origin": [
        "http://localhost:3000",
        "https://webui.ipfs.io",
        "http://127.0.0.1:5001"
      ]
    }
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": null,
    "AppendAnnounce": [
      "/ip4/172.100.113.212/tcp/4001",
      "/ip4/172.100.113.212/udp/4001/quic",
      "/ip4/172.100.113.212/udp/4001/quic-v1",
      "/ip4/172.100.113.212/udp/4001/quic-v1/webtransport"
    ],
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "NoAnnounce": null,
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic-v1",
      "/ip4/0.0.0.0/udp/4001/quic-v1/webtransport",
      "/ip6/::/udp/4001/quic-v1",
      "/ip6/::/udp/4001/quic-v1/webtransport"
    ]
  },
  "AutoNAT": {
    "ServiceMode": "disabled"
  },
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic-v1/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "path": "blocks",
            "shardFunc": "/repo/flatfs/shard/v1/next-to-last/2",
            "sync": true,
            "type": "flatfs"
          },
          "mountpoint": "/blocks",
          "prefix": "flatfs.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {},
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "12D3KooWMJxwdSsxYwyb6KCqHNpBcE2oM9HWz6yNkRHiavgQLsbr"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": null
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "0",
    "Strategy": "all"
  },
  "Routing": {
    "AcceleratedDHTClient": true,
    "Type": "dhtclient"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "1m0s",
      "HighWater": 40,
      "LowWater": 20,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
}

OS Info:

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

Is there anything obvious that I need to fix?

I went with the nuclear option: I installed a fresh OS and reconfigured everything from scratch.

I think perhaps the AcceleratedDHT client was causing the issues, but I’m not sure. I’m basing that on this issue: Accelerated DHT Client causes OOM kill upon start of IPFS, ResourceMgr.MaxMemory ignored · Issue #9990 · ipfs/kubo · GitHub
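
If I had wanted to keep the old repo rather than rebuild, I believe the relevant knobs could have been turned with plain ipfs config commands. An untested sketch (the 1GB value is only an illustrative guess, and the linked issue suggests the memory cap is not always respected while the accelerated client is on):

    # Turn off the accelerated DHT client (the suspected OOM trigger)
    ipfs config --json Routing.AcceleratedDHTClient false

    # Cap the libp2p resource manager's memory budget (value is a guess, not a recommendation)
    ipfs config Swarm.ResourceMgr.MaxMemory "1GB"

    # Restart so the daemon picks up the changes
    sudo systemctl restart ipfs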

My new node was configured with the lowpower profile; all config options are left at their defaults except for AppendAnnounce, which seems to be needed for the checks on fleek to work.
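
For reference, applying the profile to an existing repo is a single standard Kubo command (ipfs init --profile=lowpower does the equivalent at init time):

    # Apply Kubo's lowpower profile to the repo's config
    ipfs config profile apply lowpower

The AppendAnnounce part I set by hand: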

    # https://docs.ipfs.tech/how-to/nat-configuration/#configuration-options
    #
    IPFS_PORT=4001
    # Look up this node's public (WAN) IP address
    WAN_IP_ADDRESS=$(curl -s ifconfig.me)
    echo "WAN_IP_ADDRESS = $WAN_IP_ADDRESS"

    # Print the JSON array to paste into Addresses.AppendAnnounce
    # (no trailing comma after the last entry, or the config becomes invalid JSON)
    echo "[
        \"/ip4/${WAN_IP_ADDRESS}/tcp/${IPFS_PORT}\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic-v1\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic-v1/webtransport\"
    ]"
    ipfs config edit                             # Manually paste the array above into Addresses.AppendAnnounce
    ipfs config --json Addresses.AppendAnnounce  # Verify the result

If there is a way to configure this automatically and robustly, I’d be interested in knowing. It seems odd that I have to specify my WAN IP manually; there must be a better way to do this.
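
The closest thing to automation I can think of is scripting the lookup end to end, something along these lines (untested sketch; it reuses ifconfig.me from above and writes the array with ipfs config --json instead of editing by hand, then restarts the daemon):

    # Rebuild Addresses.AppendAnnounce from the current WAN IP (untested sketch)
    IPFS_PORT=4001
    WAN_IP_ADDRESS=$(curl -s ifconfig.me)

    ipfs config --json Addresses.AppendAnnounce "[
        \"/ip4/${WAN_IP_ADDRESS}/tcp/${IPFS_PORT}\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic-v1\",
        \"/ip4/${WAN_IP_ADDRESS}/udp/${IPFS_PORT}/quic-v1/webtransport\"
    ]"

    # The daemon only announces the new addresses after a restart
    sudo systemctl restart ipfs

Run from cron or a systemd timer, something like this should also track a changing WAN IP, though I haven’t tried that.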

In any case, I tested a unique pin and ipfs-check seems to work now:

CID: QmXzHAfQvnbb26t4nGB9f7NEcuM79AEYRXcCZ9v38bb8Mz
Multiaddr: /p2p/12D3KooWCFcfiBevjQD42aRAELMUZXAGScRiN2NcAthokF4dMnVU

https://ipfs-check.on.fleek.co/?cid=QmXzHAfQvnbb26t4nGB9f7NEcuM79AEYRXcCZ9v38bb8Mz&multiaddr=%2Fp2p%2F12D3KooWCFcfiBevjQD42aRAELMUZXAGScRiN2NcAthokF4dMnVU

✔ Successfully connected to multiaddr
✔ Found multiaddrs advertised in the DHT:
	/ip4/127.0.0.1/tcp/4001
	/ip4/172.100.113.212/tcp/4001
	/ip4/172.100.113.212/udp/4001/quic
	/ip4/192.168.222.18/tcp/4001
	/ip4/192.168.222.19/tcp/4001
	/ip6/::1/tcp/4001
✔ Found multihash advertised in the dht
✔ The peer responded that it has the CID
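
For what it’s worth, roughly the same checks can be reproduced from any machine with the Kubo CLI (a sketch using standard commands; the CID and peer ID are the ones above):

    # Who provides the CID? (my peer ID should be in the list)
    ipfs routing findprovs QmXzHAfQvnbb26t4nGB9f7NEcuM79AEYRXcCZ9v38bb8Mz

    # What addresses does the DHT return for my peer?
    ipfs routing findpeer 12D3KooWCFcfiBevjQD42aRAELMUZXAGScRiN2NcAthokF4dMnVU

    # Can the peer actually be reached?
    ipfs ping 12D3KooWCFcfiBevjQD42aRAELMUZXAGScRiN2NcAthokF4dMnVU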

One thing I still don’t understand, though, is that when I test the same CID / multiaddr on the other checking tool, I get:

  1. Is my content on the DHT? ✔ Success
  2. Is my peer in the DHT? ✔ Success
  3. Is my node accessible by other peers? ✗ Failed (no addresses)
  4. Is my node serving the content? ✔ Success

For #2, it sometimes failed with "routing: not found", but my most recent test worked.

For #3, it always fails if I use my /p2p/ address, but if I paste in one of the explicit ip4 addresses from the step 2 output (e.g. /ip4/172.100.113.212/tcp/4001/p2p/12D3KooWCFcfiBevjQD42aRAELMUZXAGScRiN2NcAthokF4dMnVU), it does work.

I pinned this CID about 12+ hours ago, so I would expect it has had time to propagate.

I don’t understand why step 3 is being so finicky. Why won’t it work when I give it my /p2p/ address?
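
In case it helps anyone answer, one thing worth comparing is what the node says it announces versus what the checker resolves from the DHT, e.g. with the standard Kubo command run on the Pi:

    # List the addresses the node currently announces for itself
    ipfs id -f '<addrs>'

If the /p2p/ form fails while the explicit /ip4/…/p2p/… form works, my guess is the checker isn’t resolving the peer ID to addresses via the DHT, but I’d appreciate confirmation from someone who knows the tool.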