sl117
April 15, 2021, 9:05am
1
DHT is unable to fetch multi-address for certain peers when private network is scaled up (both in terms of number of nodes, and number of transactions wrt DHT). Due to this we face 2 issues:
it is unable to connect to certain nodes giving the error “dial backoff” and/or “i/o timeout”.
it is unable to fetch the providers for a given key/contentID for certain nodes, i.e. it gives empty response on dht.FindProvidersAsync(key) call even when dht.Provide(key) has been called for these nodes.
For (1) we tried to fetch the multiaddress from the DHT using dht.FindPeer API, but it gave the error “Routing: Not Found”
In both these cases all peers are online but still this issue persists when scaling the network up . On a smaller network it works fine for any number of calls to the DHT.
hector
April 15, 2021, 10:06am
2
Can you share the total number of nodes and how many connections they usually maintain? Are they dial-able from the others as needed?
sl117
April 15, 2021, 10:59am
3
We are running a network of 2000 nodes, with around 350-450 connections (as per host.network().Peers() api call). We did some local testing and found out that the nodes were dial-able from others as well.
hector
April 15, 2021, 12:21pm
4
This sounds like those nodes are not reachable or not running.
Any chance the records have expired (they need to be republished every 12h).
Anything relevant in logs?
After checking the most obvious explanations, you might need to look closely at debug logs from the node that cannot find routing and the node that it is trying to find routing to, make sure that it is still listening etc.
sl117
April 15, 2021, 12:42pm
5
We tried to print the o/p of “GetClosestPeers” which returned us the NodeIds & then we tried to use"FindPeer" to get the connection details of that node but Address Info was missing. How come NodeId is being retrieved but Address Info is not being retrieved?
It happens for records provided for around 30m-1h as well (not all but some)
adin
April 15, 2021, 1:35pm
6
The main thing I would check is what the routing tables of your peers look like and checking that peers have routing tables that are about the right amount of filled (i.e. you should see around Min(1/2^bucketNum, bucketSize)
peers per bucket.
Some code in go-ipfs that prints out stats like this is go-ipfs/stat_dht.go at master · ipfs/go-ipfs · GitHub (I highlighted the areas that deal with getting the peers out of the routing table and figuring out which buckets they are from).
1 Like
sl117
April 16, 2021, 3:42pm
7
Found following from the debug logs (Please copy it on nodepad++ for best view):
DEBUG swarm2 [limiter] clearing all peer dials: QmcE8JqxGpq95ev4kENpEchMYeCBgPTVBRXzAc2qRj2fG2
2021-04-16T15:20:04.156Z DEBUG swarm2 network for QmP3g6JP2NrERuDWkwMuMQgcwahSeQo6iw5VR68WPvLtJU finished dialing QmcE8JqxGpq95ev4kENpEchMYeCBgPTVBRXzAc2qRj2fG2
2021-04-16T15:20:04.156Z DEBUG dht request failed to open message sender {"error": "failed to dial QmcE8JqxGpq95ev4kENpEchMYeCBgPTVBRXzAc2qRj2fG2: all dials failed\n * [/ip4/192.168.183.109/tcp/5000] dial tcp4 0.0.0.0:5000->192.168.183.109:5000: i/o timeout", "to": "QmcE8JqxGpq95ev4kENpEchMYeCBgPTVBRXzAc2qRj2fG2"}
2021-04-16T15:20:04.157Z DEBUG swarm2 network for QmP3g6JP2NrERuDWkwMuMQgcwahSeQo6iw5VR68WPvLtJU finished dialing QmXJdbkHf7zyNPHV7Vw1JgvZ6YcwSrzuTyDcPhprM6hcdS
2021-04-16T15:20:04.157Z DEBUG dht request failed to open message sender {"error": "failed to dial QmXJdbkHf7zyNPHV7Vw1JgvZ6YcwSrzuTyDcPhprM6hcdS: all dials failed\n * [/ip4/192.168.117.240/tcp/5000] dial tcp4 0.0.0.0:5000->192.168.117.240:5000: i/o timeout", "to": "QmXJdbkHf7zyNPHV7Vw1JgvZ6YcwSrzuTyDcPhprM6hcdS"}
2021-04-16T15:20:04.157Z INFO swarm2 got error on dial: dial tcp4 0.0.0.0:5000->192.168.104.151:5000: i/o timeout
2021-04-16T15:20:04.157Z DEBUG swarm2 [limiter] clearing all peer dials: QmXKVcRi4fE6LvXGQsJK1xqyE5NbeAUhgJpJucxeoWQtz3
2021-04-16T15:20:04.157Z DEBUG swarm2 network for QmP3g6JP2NrERuDWkwMuMQgcwahSeQo6iw5VR68WPvLtJU finished dialing QmXKVcRi4fE6LvXGQsJK1xqyE5NbeAUhgJpJucxeoWQtz3
2021-04-16T15:20:04.157Z DEBUG dht request failed to open message sender {"error": "failed to dial QmXKVcRi4fE6LvXGQsJK1xqyE5NbeAUhgJpJucxeoWQtz3: all dials failed\n * [/ip4/192.168.104.151/tcp/5000] dial tcp4 0.0.0.0:5000->192.168.104.151:5000: i/o timeout", "to": "QmXKVcRi4fE6LvXGQsJK1xqyE5NbeAUhgJpJucxeoWQtz3"}
2021-04-16T15:20:04.156Z INFO swarm2 got error on dial: dial tcp4 0.0.0.0:5000->192.168.111.30:5000: i/o timeout
2021-04-16T15:20:04.158Z DEBUG swarm2 [limiter] clearing all peer dials: QmPdPg2LHadd3DpE1QVJ4xm2t9Svf261HDtw4dXK5utK1h
2021-04-16T15:20:04.158Z DEBUG swarm2 network for QmP3g6JP2NrERuDWkwMuMQgcwahSeQo6iw5VR68WPvLtJU finished dialing QmPdPg2LHadd3DpE1QVJ4xm2t9Svf261HDtw4dXK5utK1h
......
......
......
......
2021-04-16T15:20:04.432Z DEBUG pubsub PEERDOWN: Remove disconnected peer QmcK8ScGfi1bptmRyiyjhSgNsG1bvQEYV4q5XRXoLicHbM
2021-04-16T15:20:04.432Z DEBUG pubsub error reading rpc from QmcK8ScGfi1bptmRyiyjhSgNsG1bvQEYV4q5XRXoLicHbM: stream reset
2021-04-16T15:20:04.751Z DEBUG basichost protocol negotiation took 109.162µs
2021-04-16T15:20:04.751Z DEBUG dht peer found {"peer": "QmVuTG8C51UkfjUwV6jiviH9kDfHBzNPnjaJEbL3DbmZ2M"}
2021-04-16T15:20:04.751Z DEBUG dht handling message {"from": "QmVuTG8C51UkfjUwV6jiviH9kDfHBzNPnjaJEbL3DbmZ2M", "type": 4, "key": "EiAAAQMYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=="}
2021-04-16T15:20:04.752Z DEBUG dht handled message {"from": "QmVuTG8C51UkfjUwV6jiviH9kDfHBzNPnjaJEbL3DbmZ2M", "type": 4, "key": "EiAAAQMYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==", "time": 0.000772582}
2021-04-16T15:20:04.752Z DEBUG dht responded to message {"from": "QmVuTG8C51UkfjUwV6jiviH9kDfHBzNPnjaJEbL3DbmZ2M", "type": 4, "key": "EiAAAQMYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==", "time": 0.000883386}
2021-04-16T15:20:04.826Z DEBUG pubsub PEERDOWN: Remove disconnected peer QmPyEojyrWbbpLmZpztvBwdn46hGFM4Sv2LTTmaU4KjY6e
2021-04-16T15:20:04.826Z DEBUG pubsub error reading rpc from QmPyEojyrWbbpLmZpztvBwdn46hGFM4Sv2LTTmaU4KjY6e: stream reset
2021-04-16T15:20:05.238Z DEBUG dht peer found {"peer": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4"}
2021-04-16T15:20:05.238Z DEBUG dht handling message {"from": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4", "type": 2, "key": "EiB9IQgQkNC8Yi1Lisk3EmuMfGqgN+nL+Cjo9lAyVXOtBw=="}
2021-04-16T15:20:05.238Z DEBUG dht adding provider%!(EXTRA string=from, peer.ID=QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4, string=key, dht.loggableProviderRecordBytes=bciqh2iiiccinbpdcfvfyvsjxcjvyy7dkua36ts7yfdupmubskvz22by)
2021-04-16T15:20:05.238Z DEBUG dht handled message {"from": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4", "type": 2, "key": "EiB9IQgQkNC8Yi1Lisk3EmuMfGqgN+nL+Cjo9lAyVXOtBw==", "time": 0.000176133}
2021-04-16T15:20:05.411Z DEBUG basichost protocol negotiation took 106.054µs
2021-04-16T15:20:05.411Z DEBUG dht peer found {"peer": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4"}
2021-04-16T15:20:05.411Z DEBUG dht handling message {"from": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4", "type": 4, "key": "EiB/Hk5B6g0idkDqg9pxQ/erIripA31ec5S1y8saqq6Rdw=="}
2021-04-16T15:20:05.412Z DEBUG dht handled message {"from": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4", "type": 4, "key": "EiB/Hk5B6g0idkDqg9pxQ/erIripA31ec5S1y8saqq6Rdw==", "time": 0.000248879}
2021-04-16T15:20:05.412Z DEBUG dht responded to message {"from": "QmQFF15eRAGYpaXfabTRJHG4PxrJSU6cECr3XFPab6HVN4", "type": 4, "key": "EiB/Hk5B6g0idkDqg9pxQ/erIripA31ec5S1y8saqq6Rdw==", "time": 0.00029735}
2021-04-16T15:20:05.670Z DEBUG pubsub error reading rpc from QmZYe3jmMDkGm7JbSpW2ushEBbcZ8NGPCo86r9kAew6mt9: stream reset
2021-04-16T15:20:05.670Z DEBUG pubsub PEERDOWN: Remove disconnected peer QmZYe3jmMDkGm7JbSpW2ushEBbcZ8NGPCo86r9kAew6mt9
2021-04-16T15:20:06.664Z DEBUG pubsub PEERDOWN: Remove disconnected peer QmeGrr36KmgxoDVgUjkq9YJgHGDjgeCYAuqFKcvN6dPrhm
2021-04-16T15:20:06.664Z DEBUG pubsub error reading rpc from QmeGrr36KmgxoDVgUjkq9YJgHGDjgeCYAuqFKcvN6dPrhm: stream reset
2021-04-16T15:20:07.039Z DEBUG pubsub error reading rpc from QmYEoMVWU9JHuxLD71RdvWn6TAFVampSuHpUC1mSHcjocN: stream reset
2021-04-16T15:20:07.039Z DEBUG pubsub PEERDOWN: Remove disconnected peer QmYEoMVWU9JHuxLD71RdvWn6TAFVampSuHpUC1mSHcjocN
2021-04-16T15:20:07.215Z DEBUG basichost protocol negotiation took 418.879µs
2021-04-16T15:20:07.215Z DEBUG basichost protocol negotiation took 105.297µs
2021-04-16T15:20:07.215Z DEBUG dht peer found {"peer": "QmbE6PuYGdgUXR1ZRTjwrPqxS9PNV9TQbSnUmJG1zSQn5F"}
2021-04-16T15:20:07.215Z DEBUG dht handling message {"from": "QmbE6PuYGdgUXR1ZRTjwrPqxS9PNV9TQbSnUmJG1zSQn5F", "type": 4, "key": "EiC/eehp6/X1TEI+187gMLcyvZakKyMob88CJrFSrf8rkg=="}
Brij
January 25, 2024, 8:05pm
8
any update on this one. Seeing somewhat similar when running a private network on version v0.24.1. Help would be appreciated