I’m trying to trace down which peers are being contacted when routing over the DHT by analyzing the output of ipfs dht query -v Qm[...]. I’m converting multihashes into their hex representation (I’ve double checked that my script performs the exact multihash → hex transformation as https://cid.ipfs.io/). I expected to see IDs (in hex) converging towards the target ID. However the hex IDs of contacted peers are all over the ID space. Apparently these IDs are further hashed before being fed to the Kademlia XOR metric. The question is …how?!
As an indication of the issue, I’ve run ipfs stats dht and I’m dumping my <Bucket 9> peer IDs, which should have a common prefix of 9 bits, however they are all over the space:
For sanity check, the hex IDs listed above were derived from the following multihashes produced by ipfs stats dht for my Bucket 9:
QmWG4PUzaBF8cfJ4SYgfyDJyeF4pny6BHKAHbBx737aTKQ
QmUr5LLBJkSXXFGjT4YjWbGYvSKraoo4XJb63A6RmqmpbL
QmVTecRbxZhsYv8chZpPdBWdPDW6upZq1TH41hQYopEusz
QmYRub9Pktw3UNvsJ11tcZqM55YLq6CPLNPwzosTssWQUX
QmZ3SFouJMc5Rcxg9CgPZSDjNuHuuEDJD2GJXnAKytkU6h
QmX5DQa98umB9rdMnfaHKKycPhd8yxLgZRN8mhWjVA5m1k
QmZJgvG2UeAh6FSZJBYaG8ydXGGYF8BD1y7m1xF6CC6JnL
QmSnLyi5ZCtVvJrGRTSTv3jeDwB11onm9nX2crY2pZ9PKx
Qmci5wrnsWccGNLRx8bV67WvgBjkDGFvkH97zTqPViiBpR
QmWG25DqXUM6SgRWcjiR4BoF22vCgFvHiWHFtrFr5NHtXz
QmREJo4AnuodDeusPAgFvgzT6XMu4zKASiuyNRRbRdh8uf
The short answer to your question is that the key space being used is SHA256(data) (e.g. SHA256(PeerID), SHA256(dataMultihash), etc.) so you’re not going to be seeing convergence towards the data itself, but rather towards SHA256(data)
Thanks for your answer! However we have tried that as well… Namely, the SHA256 hashes of each of the above hex-form hashes are the following – still all over the place rather than sharing the common first 9 bits:
2051d904e5b78878ca7d156b8c14df24d31fd973b1ce922b6d190138ed0cd7b7 (this is for 75b05d7...)
def2b884bc614c248c9f9e027458ed7491749ee0dbf4899dc49ab24464b92636 (this is for 60afcee...)
4fb60259957eced8057aab910f99b3921f98205171c4bd87b541a4ac08fa0b27 (etc.)
30965340f6d68c905d628b687ce9cf45ecd9128a97156711f73b75f4dcb57ccf
63cbda6c2d5f2b7ca0edef34d9473cb476919ddeb58c058e1bc0d5f693179606
e094aff41165fcc2206e64fee4f61b78ff8c56ccf56753863d1c815672a46e37
aa1b6eb15f0a05300337d1c367c41eb20dd3b031420c84ab1e3c863eb11beb2d
1ad2000c2501e9cf03eeb5d254341d67c866a040368892591573d1bf5f74a866
d04b000c7794ed61d164018fe9f7f8a0088ad55d4b5952af75fac0409702b6a7
a08300c5e10ee07dc6774a9537663c3c92f398130084fa6f022a7d22950d7edd
0ab6be7cc68642c47591d218b872a442636baef8e491fc89aca11ffae8489de2
Please note we are hashing the binary version of the original hashes, i.e., the byte arrays (in Python), not the string hex representation (which we tried too, just in case, but thankfully it was not the case )
Our Python script to get the direct hex representations of the multihashes is the following:
import multihash
import hashlib
str = 'QmWG4PUzaBF8cfJ4SYgfyDJyeF4pny6BHKAHbBx737aTKQ' # first multihash in original post
bytes = str.encode('utf-8')
mh = multihash.decode(bytes, 'base58')
print(mh.digest.hex())
# --> This prints '75b05d7d6944a7ecd646774643f7b7241597de2680d94a0dd21e80040f26df9b'
Our script to get the SHA256 versions is the following:
import multihash
import hashlib
str = 'QmWG4PUzaBF8cfJ4SYgfyDJyeF4pny6BHKAHbBx737aTKQ'
bytes = str.encode('utf-8')
mh = multihash.decode(bytes, 'base58')
sh = hashlib.sha256(mh.digest)
print(sh.digest().hex())
# --> This prints '2051d904e5b78878ca7d156b8c14df24d31fd973b1ce922b6d190138ed0cd7b7'