From my home internet I should get a ping of around 20 ms (the network is busy at the moment):
$ ping 139.18.16.88
PING 139.18.16.88 (139.18.16.88) 56(84) bytes of data.
64 bytes from 139.18.16.88: icmp_seq=1 ttl=60 time=53.1 ms
64 bytes from 139.18.16.88: icmp_seq=2 ttl=60 time=26.6 ms
64 bytes from 139.18.16.88: icmp_seq=3 ttl=60 time=31.8 ms
64 bytes from 139.18.16.88: icmp_seq=4 ttl=60 time=49.7 ms
64 bytes from 139.18.16.88: icmp_seq=5 ttl=60 time=33.0 ms
Pinging through IPFS gives a round-trip time of ~90 ms:
$ ipfs ping 12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv
PING 12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv.
Pong received: time=132.50 ms
Pong received: time=92.26 ms
Pong received: time=90.62 ms
Pong received: time=91.53 ms
Pong received: time=87.80 ms
Pong received: time=91.57 ms
Pong received: time=91.29 ms
For whatever reason, it seems that port 4001 is not actually dialable on that IP address (139.18.16.88).
For example, a libp2p ping with Vole against the TCP multiaddr fails:
vole libp2p ping "/ip4/139.18.16.88/tcp/4001/p2p/12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv"
panic: failed to dial: failed to dial 12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv: all dials failed
* [/ip4/139.18.16.88/tcp/4001] dial backoff
goroutine 1 [running]:
main.main()
/Users/myuser/dev/go/pkg/mod/github.com/ipfs-shipyard/vole@v0.0.0-20240806142935-abb8bd4529f0/main.go:343 +0xd94
Are you sure you don’t have a firewall blocking incoming connections on port 4001 on the machine running Kubo?
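One quick way to check on the node itself (a sketch, assuming a Linux host; adapt the firewall command to whatever your distro uses) would be:
$ sudo ss -tlnp | grep 4001          # confirm the ipfs daemon is actually listening on port 4001
$ sudo nft list ruleset | grep 4001  # or: sudo iptables -L -n | grep 4001, to spot any filtering rules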
Port 4001 isn’t blocked, at least not at the load balancer level (it might still be blocked on the node itself?):
$ nmap -p4000-4002 139.18.16.88
Starting Nmap 7.95 ( https://nmap.org ) at 2025-02-10 14:07 CET
Nmap scan report for 139.18.16.88
Host is up (0.040s latency).
PORT     STATE  SERVICE
4000/tcp closed remoteanything
4001/tcp open   newoak
4002/tcp closed mlchat-proxy
Nmap done: 1 IP address (1 host up) scanned in 0.15 seconds
I had time to look a bit more into this. I tried running
nc -l -p 4001
on the IPFS host and then sent a message via
nc 139.18.16.88 4001
and this worked just fine.
Playing around with Vole, I got the following:
vole libp2p connect "/ip4/139.18.16.88/tcp/4001/p2p/12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv"
panic: failed to dial 12D3KooWQZP1ftDawbTRxNmDmcKGmqii7gmLV8oxXgtTycYCcCFv: all dials failed
* [/ip4/139.18.16.88/tcp/4001] failed to negotiate security protocol: read tcp4 172.22.243.169:40721->139.18.16.88:4001: read: connection reset by peer
goroutine 1 [running]:
main.main()
/home/gkraemer/go/pkg/mod/github.com/ipfs-shipyard/vole@v0.0.0-20250216200225-3e3fc561d9f6/main.go:394 +0x1085
Searching for the error, I came across this, and I can confirm that connections arrive from the load balancer IP rather than the original source IP. I hope our IT can fix it, and I’ll report back.
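One way to see this on the Kubo host is to look at the established connections on the swarm port (a sketch, assuming a Linux host with ss available):
$ sudo ss -tnp | grep ':4001'   # if every remote address is the load balancer's IP, the client source IP is not being preserved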
If so, I suspect that it’s indeed due to how external traffic is routed to your IPFS node. Specifically, whether the original source IP for external traffic to the load balancer is preserved when it reaches your IPFS node.
“Cluster obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading. Local preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.” (Create an External Load Balancer | Kubernetes)
Otherwise, I would look into how the load balancer is configured and whether incoming traffic preserves the source IP.
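If it is a Kubernetes LoadBalancer Service sitting in front of Kubo (an assumption on my part, I don’t know your exact setup), the relevant knob is the Service’s externalTrafficPolicy: setting it to Local preserves the client source IP, as per the quote above. A minimal sketch with a hypothetical Service name ipfs-swarm:
$ kubectl patch svc ipfs-swarm -p '{"spec":{"externalTrafficPolicy":"Local"}}'
$ kubectl get svc ipfs-swarm -o jsonpath='{.spec.externalTrafficPolicy}'   # should print: Local
The trade-off from the docs applies: with Local, the load balancer only forwards to nodes that actually run the pod, so traffic spreading can become imbalanced.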
I think I solved it, I will let it run for a couple of days and then report back if there are more issues.
IT said it is not possible for the load balancer to preserve the source IP and compared the situation to a home router, where IPFS also sits behind NAT and only sees the router’s local address. So this should be very similar to a situation that is actually quite common for IPFS.
So I did a test: I set up port forwarding on my home router (I am lucky to have a real IPv4 address at home) and configured my public IP in Addresses.Announce. ipfs id then showed my local IP, but after ~2 minutes the node switched to a relay connection.
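For reference, configuring the announce address in Kubo looks roughly like this (a sketch; <public-ip> is a placeholder for the public IPv4 address and 4001 is the default swarm port):
$ ipfs config --json Addresses.Announce '["/ip4/<public-ip>/tcp/4001"]'
$ ipfs shutdown   # then restart the daemon so the new addresses are picked up
$ ipfs id         # the public IP should now appear among the announced addresses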
Because the IPFS server both has a public IP via the load balancer and sits behind the university NAT, the same thing always happened there.
And voilà: no more random relays, and things seem to work fine. From prior experience I know that IPFS doesn’t work well through our university NAT, so I strongly suspect the problem was caused by IPFS’s eagerness to use relays when it is behind NAT, combined with those NAT issues.
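As a quick sanity check that the node is not falling back onto relays anymore (a quick sketch; relay addresses contain /p2p-circuit):
$ ipfs id -f '<addrs>' | grep p2p-circuit   # no output means no relay addresses are being advertised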