IPFS running on ECS Fargate: missing seed for some files

Hello everyone!

We are running IPFS on ECS Fargate.

Each file that we add is available through our own gateway. Shortly after being added to the network, a file is also available through the public gateway. After some time, roughly a week, the files can no longer be accessed through the public gateway, but they are still available through my gateway.

When I hit this endpoint, it returns a gateway timeout error:

curl "https://ipfs.io/ipfs/QmTvn4Dmy3kgBkKH6HfwDXRCH3g8exyGMsLAYvkGuw3agH"

but using my own gateway, it returns the JSON file:

curl -X POST "https://ipfs.devel.original.works/api/v0/cat?arg=QmTvn4Dmy3kgBkKH6HfwDXRCH3g8exyGMsLAYvkGuw3agH"

I also ran ipfs locally and used ipfs dht findprovs. For files that are no longer accessible through the public gateway it returns an empty list, while for recently uploaded files it returns a list of peers.
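
For the CID above, that check is just:

ipfs dht findprovs QmTvn4Dmy3kgBkKH6HfwDXRCH3g8exyGMsLAYvkGuw3agH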

I'm stuck debugging this and need some help or guidance.
Thank you!

Additional information:
List of local addresses:

curl -X POST  "https://ipfs.devel.original.works/api/v0/swarm/addrs/local" | jq
{
  "Strings": [
    "/ip4/127.0.0.1/tcp/4001",
    "/ip4/127.0.0.1/udp/4001/quic",
    "/ip4/172.31.48.13/tcp/4001",
    "/ip4/172.31.48.13/udp/4001/quic",
    "/ip4/52.91.154.217/tcp/4001",
    "/ip4/52.91.154.217/udp/4001/quic",
    "/ip6/::1/tcp/4001",
    "/ip6/::1/udp/4001/quic"
  ]
}
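
The addresses the node actually announces to peers can be compared against this list through the id endpoint (jq only for readability):

curl -X POST "https://ipfs.devel.original.works/api/v0/id" | jq '.Addresses'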

Port 4001 is open:

telnet 52.91.154.217 4001
Trying 52.91.154.217...
Connected to 52.91.154.217.

File is pinned:

curl -X POST "https://ipfs.devel.original.works/api/v0/pin/ls"
"QmTvn4Dmy3kgBkKH6HfwDXRCH3g8exyGMsLAYvkGuw3agH": {
  "Type": "recursive"
},

Everything looks good from here. The likely problem is that you are providing too many blocks for the default DHT client, so it cannot finish its reprovide runs within the 12-hour window (it probably takes days instead), and your records fall out of the DHT. It's easy to address: turn on the accelerated DHT client and restart your node. It will scan the network (about 10 minutes), then do a full reprovide (which will take vastly less than 12 hours), and everything will be fine.

ipfs config --json Experimental.AcceleratedDHTClient true
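
If you want to double-check that it took effect after the restart, something like this should do it:

ipfs config show | jq '.Experimental.AcceleratedDHTClient'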


It works!

@ylempereur I appreciate that you took the time to respond quickly. Thank you!

Hi,
Still getting 504 errors for some files. I'd say enabling the accelerated DHT client helped for about 90% of the files, but some of them won't fetch on the first try. The weird thing is that if I reload the page several times, I eventually get the file. I also tested the CID on the IPFS check page, which reported an error.

I have changed the reprovide interval from 12h to 8h on my own node for similar reasons; try that.
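
That's just a config value followed by a restart, roughly:

ipfs config Reprovider.Interval 8h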


While debugging my node, I realized that I had misidentified the source of the problem. Browsing the metrics, we found the memory usage graph: memory consumption keeps climbing, and when it reaches the limit, the node restarts.


When the memory limit was increased from 1GB to 2GB, only the interval between restarts got longer. The same happened when the limit was increased to 4GB.

My next step was to change LowWater to 200 and HighWater to 300. That didn't change anything. At some point, I turned off the accelerated DHT client; the change in the memory chart is interesting, but it didn't fix anything either.
Before:

After:

What could be causing this behavior? Is 4GB still not enough memory? What else can I change in the settings to reduce memory consumption?
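
For reference, the watermark change can be made with the usual config commands, followed by a restart:

ipfs config --json Swarm.ConnMgr.LowWater 200
ipfs config --json Swarm.ConnMgr.HighWater 300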

ipfs config profile apply lowpower

This will disable the DHT server (helping other nodes find content) and the reprovider (announcing to the DHT which files you host), as well as a few other things.
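
Since that also turns off the reprovider, the interval can be set back afterwards if you still want your pins announced, e.g.:

ipfs config Reprovider.Interval 8h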

After a few tries, I managed to get a stable instance.


My settings now look like this:

"Experimental": {
    "AcceleratedDHTClient": true,
    "FilestoreEnabled": true,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": true,
    "P2pHttpProxy": true,
    "StrategicProviding": false,
    "UrlstoreEnabled": true
  },
  "Reprovider": {
    "Interval": "8h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dhtclient"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "1m0s",
      "HighWater": 40,
      "LowWater": 20,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {},
    "RelayService": {},
    "ResourceMgr": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  }
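
The ResourceMgr section is still empty; assuming a kubo version where the libp2p resource manager can be capped explicitly, that would be one more knob to try, e.g.:

ipfs config Swarm.ResourceMgr.MaxMemory 2GB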

Unfortunately, the files are still not accessible through the public gateway. Shouldn't 4GB of RAM be more than enough for a node?