Why do so many peers need to be constantly connected?

Hi everyone,

I woke up this morning asking myself why a full node needs so many connected peers (constantly between 600 and 900) while doing nothing (no content hosted or downloaded).

I know that Go can't easily be restricted in its resource usage, which is normal, and I don't think the problem comes from that.

[trolling]
When I start a computer and browse the Internet, the network stack doesn't use anywhere near as many resources as running an IPFS full node (up to 90% of CPU, 70% of RAM, etc.).
[/trolling]

From a naive design point of view, I would expect connections to be limited to:

  1. The closest peers, those in the same bucket, as described by the Kademlia DHT routing table
  2. Peers querying the DHT for a resource my bucket/node should be aware of (some sort of content-hash-based splitting to restrict queries to a part of the network)
  3. Peers sending updates to the resource list my bucket/node is responsible for (this can be huge, but it's ephemeral… and in a pub/sub pattern it shouldn't need a direct connection)
  4. Peers I'm downloading/uploading data from/to (which is ephemeral too)

I may need a really strong coffee to reset my mind, or maybe I'm just having a dumb day, but can someone explain to me where I'm wrong?
Does the huge number of connected peers come only from DHT flooding due to too many nodes entering and leaving the network? In that case, isn't there a technical design problem that needs refactoring, and are there ideas being explored already?

Why are between 600 and 900 peers always connected?

The IPFS connection manager defaults to a low watermark of 600 and a high watermark of 900 peers, so it will always try to keep at least 600 peers connected and will start pruning connections once it reaches 900.
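For reference, these defaults live under Swarm.ConnMgr in the config file; on a recent go-ipfs the section looks roughly like this (GracePeriod is how long a new connection is protected from pruning):

```json
"Swarm": {
  "ConnMgr": {
    "Type": "basic",
    "LowWater": 600,
    "HighWater": 900,
    "GracePeriod": "20s"
  }
}
```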

You can either apply the “lowpower” profile or manually lower these settings in the IPFS config file to reduce compute and network requirements, at the expense of content discovery performance.
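Concretely, with the stock ipfs CLI (the watermark values below are only examples; restart the daemon for the changes to take effect):

```sh
# Apply the preset lowpower profile:
ipfs config profile apply lowpower

# Or lower the watermarks by hand:
ipfs config --json Swarm.ConnMgr.LowWater 100
ipfs config --json Swarm.ConnMgr.HighWater 200
```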

Why are so many connections necessary?

IPFS has content discovery methods other than the DHT, some of which are only usable on already-connected peers, most notably bitswap sessions. Additionally, connecting to a peer is an expensive operation compared to just maintaining a keepalive for hours.

I am not sure how many connections are required to maintain the DHT, but most of the connections are kept open for bitswap performance, not DHT maintenance; a rough estimate is sketched below.
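As a back-of-envelope (assuming textbook Kademlia with bucket size k = 20; the network sizes are just guesses), the routing table only needs on the order of k·log2(N) live contacts, which is well below the 600 low watermark:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Rough Kademlia estimate: up to k peers per bucket, with about
	// log2(N) non-empty buckets for a network of N nodes.
	// k = 20 is the usual Kademlia default; the N values are guesses.
	const k = 20.0
	for _, n := range []float64{1e3, 1e4, 1e5} {
		fmt.Printf("N = %.0f nodes: ~%.0f DHT contacts\n", n, k*math.Log2(n))
	}
}
```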

With that said, IMHO 600 peers at idle is too many for the content discovery systems currently implemented. The background resource usage is an issue most new users bring up.

As for solutions, I probed around for ideas a while back with minor success, though I never got around to writing any proof-of-concept code to test them.

Most of the ideas would probably end up either too computationally expensive relative to what they save, or too difficult to implement, but I am still interested in what rklaehn brought up about using bloom filters: that could potentially cull a huge number of wantlist requests to peers that don't have what you are looking for.
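To make that idea concrete, here is a minimal hand-rolled sketch in Go (purely illustrative; none of these types exist in go-ipfs): each peer would advertise a bloom filter built over the CIDs of the blocks it holds, and the sender would skip wantlist entries the filter rules out. A bloom filter can return false positives but never false negatives, so "definitely absent" is always safe to act on:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is a toy fixed-size filter; a peer would advertise one
// built over the CIDs of the blocks it holds.
type BloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    int    // number of probe positions per key
}

func NewBloomFilter(m uint64, k int) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// indexes derives k bit positions from two FNV hashes (double hashing).
func (b *BloomFilter) indexes(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write([]byte(key))
	c := h2.Sum64() | 1 // force an odd stride so positions cycle
	idx := make([]uint64, b.k)
	for i := 0; i < b.k; i++ {
		idx[i] = (a + uint64(i)*c) % b.m
	}
	return idx
}

func (b *BloomFilter) Add(key string) {
	for _, i := range b.indexes(key) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// MayContain is false only when the key is definitely absent, which is
// exactly the property that lets us cull wantlist sends.
func (b *BloomFilter) MayContain(key string) bool {
	for _, i := range b.indexes(key) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical filter a peer advertised for its blockstore.
	peerBlocks := NewBloomFilter(1<<16, 4)
	peerBlocks.Add("QmSomeBlockCID")

	for _, want := range []string{"QmSomeBlockCID", "QmMissingCID"} {
		if peerBlocks.MayContain(want) {
			fmt.Println("send want for", want)
		} else {
			fmt.Println("skip peer for", want, "(definitely not held)")
		}
	}
}
```

The trade-off would be filter size versus false-positive rate, plus the cost of keeping the advertised filters fresh as peers gain and drop blocks.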

Thanks for the reply.

I forgot about bitswap sessions… (dumb, as I said)… I'm too DHT-focused.

Do you know if there's an option to disable bitswap sessions?
Something like the “--routing=dhtclient” option would be great (so I wouldn't need to go back to a really old IPFS release), as I'm trying to work around the DHT problem based on research like the following:

Sub-Second Lookups on a Large-Scale Kademlia-Based Overlay
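(For reference, that mode is passed when starting the daemon; the node then queries the DHT without serving records itself:)

```sh
ipfs daemon --routing=dhtclient
```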

Thanks.