OK, so first: the DHT has nothing to do with exchanging files.
The DHT is a distributed directory where you can find two things: peers, and peers related to a topic.
I’m going to compare IPFS to torrents, but be aware that this comparison is limited, because some torrent clients implement a DHT too (e.g. Transmission).
If you want a file in a p2p network, you need to somehow find the people who have this file. You can’t just ask everyone around you at random, “do you have this?” (you could, but it would be very inefficient).
How do torrents solve this issue?
Simple: a torrent has a so-called tracker. A tracker is a server running some kind of software. It is very simple: a client who owns a file contacts the tracker and tells it something like:
Hey, I own this file (Qmfoo), here are my IP and port too if someone needs them.
Then, when someone wants this file, they contact the server and ask:
Hey, do you know about Qmfoo?
And the server answers:
Yes, here is the list of nodes (and their IPs) that I know have this file.
At this point the second node contacts the first one and they start exchanging the file.
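To make the tracker idea concrete, here is a minimal sketch of that conversation in Python. The `Tracker` class and its `announce` / `get_peers` methods are names I made up for illustration; this is not the real BitTorrent tracker protocol, just the same idea as a dictionary in memory.

```python
from collections import defaultdict


class Tracker:
    """Toy central tracker: maps a file ID to the peers that announced it."""

    def __init__(self):
        # file ID -> set of (ip, port) pairs
        self.peers_by_file = defaultdict(set)

    def announce(self, file_id, ip, port):
        # "Hey, I own this file, here are my IP and port."
        self.peers_by_file[file_id].add((ip, port))

    def get_peers(self, file_id):
        # "Do you know about this file?" -> sorted list of known owners
        return sorted(self.peers_by_file.get(file_id, set()))


tracker = Tracker()
tracker.announce("Qmfoo", "203.0.113.5", 4001)   # first owner
tracker.announce("Qmfoo", "198.51.100.7", 4001)  # second owner
print(tracker.get_peers("Qmfoo"))
```

Everything depends on that one `tracker` object being reachable, which is exactly the central server a DHT gets rid of.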
And how does IPFS solve this issue?
IPFS has a DHT. In the end the DHT does the same thing trackers do, but without any central server, only lots of random nodes.
Basically, in a DHT everyone is a tracker: everyone may store the list of people owning the file and the list of people wanting it.
But now we have a new issue. Let’s say we have 50k nodes (IPFS probably has more): how do we know which tracker to use? An IPFS CID (Qm…) doesn’t incorporate any metadata, just the ID of the hashing algorithm and the hash.
So DHTs have something called distance. The idea is to rank nodes by how far they are from your hash, and use some of the closest nodes as trackers.
In IPFS the solution is to take the peer ID of the node (Qm..., the hash of its public key) and the hash you are searching for and XOR them together: the smaller the resulting number, the smaller the distance between your hash and this node.
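Here is a small sketch of that XOR ranking in Python. It is only an illustration: I assume every name can be turned into a 256-bit number with SHA-256 (the helper `to_int` is my own invention), whereas real peer IDs and CIDs are multihashes with their own encoding.

```python
import hashlib


def to_int(name: str) -> int:
    # Assumption for this sketch: map any name to a 256-bit number via SHA-256.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: str, b: str) -> int:
    # Smaller result means "closer" in the DHT's sense.
    return to_int(a) ^ to_int(b)


def closest_nodes(target: str, node_ids, k: int = 2):
    # Rank the known nodes by XOR distance to the target and keep the k closest.
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


nodes = ["Qmbor", "Qmboo", "Qmfoa", "Qmfej", "Qmfoi"]
print(closest_nodes("Qmfoo", nodes))
```

Note that this distance has nothing to do with network latency or geography: two nodes that are "close" here can be on opposite sides of the planet.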
So when a node owns a file, it first looks in its peerstore for the node which is closest to the file. Then it asks that node:
Hi, could you store that I have Qmfoo for me pls?
That node answers:
No sorry, I’m only Qmbor, and I know Qmboo which is even closer.
Now your node restarts the process with Qmboo, which may redirect it to Qmfoa.
Qmfoa is not exactly the hash of your file, but since hashes are distributed randomly this is expected.
Qmfoa doesn’t know a better node, so it now stores that your node has the file Qmfoo.
Now let’s say a node wants to find Qmfoo (e.g. you request it on the gateway). It will repeat the search process but might take another route (ask Qmfej, which redirects it to Qmfoi, and Qmfoi to Qmfoa).
Now Qmfoa tells the node searching for Qmfoo that it knows your node has it.
And now the gateway will contact your node and they will start exchanging the file.
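The store and lookup walks described above can be sketched like this. To keep the XOR visible I use tiny 4-bit integer IDs instead of real hashes, and the `network` dictionary (who knows whom) is invented for the example. It is a simplification: as far as I know, a real Kademlia-style DHT queries several nodes in parallel and stores the record on several of the closest nodes, not just one.

```python
def find_closest(network, start, key):
    """Walk from `start` toward the node whose ID is XOR-closest to `key`."""
    # Begin with the closest node in our own peerstore.
    current = min(network[start], key=lambda n: n ^ key)
    path = [current]
    while True:
        # Ask the current node for the closest peer it knows.
        best = min(network[current], key=lambda n: n ^ key, default=current)
        if best ^ key >= current ^ key:
            return current, path  # nobody closer is known: stop here
        current = best
        path.append(current)


def provide(network, records, me, key):
    # Store "me has key" on the closest node we can reach.
    closest, path = find_closest(network, me, key)
    records.setdefault(closest, {}).setdefault(key, set()).add(me)
    return path


def find_providers(network, records, me, key):
    # Repeat the walk (possibly by another route) and read the stored record.
    closest, path = find_closest(network, me, key)
    return records.get(closest, {}).get(key, set()), path


KEY = 0b1010                      # plays the role of Qmfoo
YOU, GATEWAY = 0b0001, 0b0010
network = {                       # node ID -> IDs in its peerstore (made up)
    YOU: {0b1100},                # you know "Qmbor"
    0b1100: {0b1000},             # "Qmbor" knows "Qmboo"
    0b1000: {0b1011},             # "Qmboo" knows "Qmfoa"
    0b1011: {0b1000},             # "Qmfoa" knows nobody closer
    GATEWAY: {0b0110},            # the gateway knows "Qmfej"
    0b0110: {0b1001},             # "Qmfej" knows "Qmfoi"
    0b1001: {0b1011},             # "Qmfoi" knows "Qmfoa"
}
records = {}

store_path = provide(network, records, YOU, KEY)
providers, lookup_path = find_providers(network, records, GATEWAY, KEY)
print(store_path, lookup_path, providers)
```

The two walks take different routes but end on the same node, because both are chasing the same key.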
Basically, in this whole process that I just described, there were 2 types of nodes:
- the clients (your node and the gateway), which only ask for or store information in other nodes
- and the servers (all the other nodes), which guided you through the DHT and stored the information for you (in torrents it’s the trackers that have this role)
And the dht-client option just disables the server part (by default both are active), so you still ask other nodes and they will store which files you have, but you will not do this for them.
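As a rough picture of the client/server split, here is a toy model in Python. `DHTNode`, `server_mode`, `handle_store` and `announce` are hypothetical names for this sketch only; it is not how IPFS actually implements the option.

```python
class DHTNode:
    """Toy node: always acts as a client, optionally also as a server."""

    def __init__(self, node_id, server_mode=True):
        self.node_id = node_id
        self.server_mode = server_mode  # False models the client-only option
        self.records = {}               # key -> provider IDs stored for others

    def handle_store(self, key, provider_id):
        # Server part: remember a provider on behalf of another node.
        if not self.server_mode:
            return False                # client-only nodes refuse to serve
        self.records.setdefault(key, set()).add(provider_id)
        return True

    def announce(self, key, other):
        # Client part: ask another node to remember that we have `key`.
        return other.handle_store(key, self.node_id)


client = DHTNode("client", server_mode=False)
server = DHTNode("server")

print(client.announce("Qmfoo", server))  # the server stores it for the client
print(server.announce("Qmbar", client))  # the client-only node does not
```

The client-only node still gets everything it needs from the network; it just does not give storage or routing help back.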