I don't fully understand how ipfs-cluster works in a private setup. Say I have two nodes, one reachable on the internet and one running on localhost. In that case only outbound connections from localhost to the public cluster server would work, not the other way round.
I assume this is the reason for lots of error messages like the following, or could it be something different?
2025-10-23T17:11:02.008Z INFO cluster ipfs-cluster/cluster.go:638 reconnected to 12D3KooW…6RCv
2025-10-23T17:11:07.996Z ERROR cluster ipfs-cluster/cluster.go:2124 12D3Ko..1HcHKi6sp8957a: error in broadcast response from 12D3K..9GY3Bixvh6RCv: failed to open stream: context deadline exceeded
The “client” peer (i.e. localhost) is producing these. The pins also sync, so the cluster seems to be working. I can’t imagine anything is dropping the connection; it’s plain internet with no firewalls. The node is started from within a Docker container though, so there is some extra networking on the way.
I’ll set up a second server on the internet and then play with the firewalls (inbound/outbound); maybe this will shed more light on the problem.
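To see which remote peer the timeouts point at, the error lines can be tallied with a small script. This is only a sketch: the regular expression is an assumption derived from the two log lines quoted above, and `tally_broadcast_errors` is a made-up helper name, not part of ipfs-cluster.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted log output:
#   <timestamp> ERROR cluster <file:line> <local peer>: error in broadcast
#   response from <remote peer>: <reason>
BROADCAST_ERR = re.compile(
    r"ERROR\s+cluster\s+\S+\s+(?P<local>\S+): error in broadcast response from "
    r"(?P<remote>\S+): (?P<reason>.*)$"
)


def tally_broadcast_errors(lines):
    """Count broadcast errors per (remote peer, reason) pair."""
    counts = Counter()
    for line in lines:
        match = BROADCAST_ERR.search(line)
        if match:
            counts[(match.group("remote"), match.group("reason").strip())] += 1
    return counts


# The two lines from the post, peer IDs left truncated exactly as quoted.
sample = [
    "2025-10-23T17:11:02.008Z INFO cluster ipfs-cluster/cluster.go:638 "
    "reconnected to 12D3KooW…6RCv",
    "2025-10-23T17:11:07.996Z ERROR cluster ipfs-cluster/cluster.go:2124 "
    "12D3Ko..1HcHKi6sp8957a: error in broadcast response from "
    "12D3K..9GY3Bixvh6RCv: failed to open stream: context deadline exceeded",
]

for (remote, reason), n in tally_broadcast_errors(sample).items():
    print(f"{n}x from {remote}: {reason}")
```

If all errors name the same remote peer, that peer is the one the local node cannot open a stream to within the deadline.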
Two are on my home network, one with a local hostname (a Raspberry Pi) and the other on localhost, running on a MacBook. This should exclude network effects. I also played with the firewalls of the internet servers and it always remained stable.
It appears the ipfs-cluster errors only accumulate if the localhost node is present in the network. The raspberry pi with a hostname doesn’t create issues.
My hypothesis is that two nodes that can’t talk to each other (the ones in the home network) create the timeouts. Will do some more testing.
So, some more results. Yes, the error mentioned above appears if the two local nodes cannot see each other in the network. I assume their addresses are advertised, but they then cannot connect to each other within the `ipfs-cluster`, which produces the error message.
I’m not even sure if it’s an error in the end. Either I haven’t fully understood how the cluster works (could be), or it would be nice to have a feature that tells each `ipfs-cluster` node which role it has (similar to `dhtserver` or `dhtclient` in IPFS).
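The observation can be pictured with a toy model. This is not how ipfs-cluster decides anything internally; it is a sketch under the assumption that a pair of peers is fine as long as at least one side can dial the other. The peer names and the `can_dial` table are hypothetical.

```python
from itertools import combinations


def unconnectable_pairs(can_dial):
    """Return peer pairs where neither side can open a connection to the other.

    can_dial maps each peer name to the set of peers it is able to dial.
    """
    peers = sorted(can_dial)
    return [
        (a, b)
        for a, b in combinations(peers, 2)
        if b not in can_dial[a] and a not in can_dial[b]
    ]


# Hypothetical topology: two public servers, two home nodes that can dial
# out to the servers but cannot see each other.
can_dial = {
    "server-1": {"server-2"},
    "server-2": {"server-1"},
    "raspberrypi": {"server-1", "server-2"},
    "macbook": {"server-1", "server-2"},
}

print(unconnectable_pairs(can_dial))  # [('macbook', 'raspberrypi')]
```

In this model only the pair of home nodes is left without any possible connection, which matches where the timeouts showed up in my tests.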
There are some operations, like `status`, which require connecting to every node to fetch status information. So if a connection is not possible, you will probably see those errors.
However, given that those nodes have outbound connectivity and that the cluster is small, they should be connected and stay connected to everyone (particularly on a 2-peer cluster). The error suggests a connection could not be opened, but there should have been an existing one established by the other peer.
So something (your firewall, the libp2p resource manager, the libp2p connection manager, or a bug) is preventing long-lived connections.
If you can start your local node with `--loglevel=swarm2:debug,connmgr:debug` and verify that things work from the server node very soon after the local node has started (meaning it has opened a connection), maybe that will shed some light.
There is also `ipfs-cluster-ctl health graph`, which should tell you which connections are open, iirc.
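Independently of the cluster tooling, a plain TCP dial can rule out the firewall part. The sketch below only checks whether a TCP connection can be opened at all; it says nothing about libp2p streams or QUIC/UDP. `tcp_reachable` is a made-up helper, and the host and port you pass in are assumptions about your setup (9096 is the default cluster swarm port unless it was changed in the configuration).

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, running `tcp_reachable("cluster.example.org", 9096)` from the local node, and the reverse direction from the server, shows which side can actually dial the other (the hostname here is a placeholder).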