How does "ipfs get" work under the hood? Interplay of bitswap and DHT

Hey everybody,

I’ve been looking into IPFS for quite some time now, but recently I’ve been wondering what exactly happens when I call "ipfs get <hash>".
The code is quite helpful for answering this, but I’m somewhat confused by the interplay of bitswap and the DHT.

What I thought was happening:

  1. “ipfs get <h1>”, where h1 is a multihash
  2. Propagate h1 to the network subsystem, resolving it in the process
  3. Find providers for h1 (or its children)
  4. Query these providers for the corresponding block(s)
  5. Done.

In my understanding, bitswap only comes into play in step 4). But the specs say that the current wantlist is propagated to the peers we are currently connected to. If this is true, wouldn’t this completely bypass the DHT? If I’m asking all my neighbors for blocks anyway, do I even need the DHT in the first place?

I’m sure that I misunderstood something here, so any help or clarification would be greatly appreciated!

You are both right and wrong. You do send your wantlist to all your connected peers, and they do have the blocks you want maybe half of the time (that’s a blurry memory from IPFS Camp videos, so take the exact ratio with a grain of salt; it depends on many things).

BUT, in case they don’t have it, we send a request to the DHT at the same time, which provides us with a peer ID. From there, we connect to that peer, send them our wantlist, and (miracle!) they have what we’re looking for 99% of the time. :sunglasses:

TL;DR: we use the DHT for when our already connected peers don’t have what we want.


Thank you so much for the clarification! It makes a lot more sense now :slight_smile:
This leads to a follow-up question:

  1. Do we really send it to all our peers? The default config says we should have between 600 and 900 connections, plus all those that arise from DHT queries.
    It seems so in the code (wantmanager.go, depending on how the PeerHandler interface is used), but sending the wantlist to all of them seems like a huge communication overhead, even if messages are coalesced…