How the lookup algorithm works and the role of buckets

I was reading the DHT deep dive blog post from the 0.5 IPFS release and had a couple of questions:

In the blog post, Adin writes:

We also search for ourselves in the network (i.e., bucket 255), just in case the network size and distribution are such that the first 15 buckets do not suffice for us to learn about the K peers closest to us.

  1. What does bucket 255 mean? Does the number 255 have any relevance?
  2. Does go-ipfs essentially implement 16 buckets (with K peers each) where the first bucket is named the 0 bucket?

As I understand it, the lookup process for provider records for X looks as follows:

  1. Create a query queue and add the K closest peers from our routing table to it
  2. Send Alpha concurrent queries asking those peers for their K closest peers to X

Given Alpha = 10 and K = 20, at this point we’ve sent 10 queries, each of which will return their K (20) closest peers to X.

If each of the 10 initially contacted peers returns 20 unique results, the query queue grows from 20 peers up to 200.

Is that correct thus far?

Is the query queue implemented using a priority queue where priority is determined by the distance (XOR)?

Assuming that we have 80 peers in the query queue after getting the response from the 10 peers we initially contacted, is this the point where the Beta parameter (3 for IPFS) is used and basically Beta peers are queried for the provider value of X?

(My understanding is loosely based on the lookup algorithm section of the blog post, which I’m not sure I fully follow, and on the docs. It’s also covered nicely by Dennis in the following talk: https://youtu.be/wbY-MueAfXg?t=286)

Bucket X means the bucket of (DHT server) peers whose identities share a common prefix of X bits with my network identity. Network identities are 256 bits, so bucket 255 is the last one that will have anything in it. I guess technically we were searching for bucket 256 rather than 255, but :man_shrugging: they’re effectively the same thing.
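
To make the bucket numbering concrete, here’s a minimal sketch (not the go-libp2p-kbucket implementation) of computing which bucket a peer falls into, i.e. the common prefix length between its 256-bit Kademlia ID and ours:

```go
package main

import (
	"fmt"
	"math/bits"
)

// commonPrefixLen returns the number of leading bits two 256-bit
// Kademlia IDs share, which is the index of the bucket the remote
// peer falls into from our point of view.
func commonPrefixLen(a, b [32]byte) int {
	for i := 0; i < 32; i++ {
		if x := a[i] ^ b[i]; x != 0 {
			return i*8 + bits.LeadingZeros8(x)
		}
	}
	return 256 // identical IDs: "bucket 256", effectively the last bucket
}

func main() {
	var us, them [32]byte
	us[0], them[0] = 0b1010_0000, 0b1011_0000 // first difference at bit 3
	fmt.Println(commonPrefixLen(us, them))    // prints 3, i.e. bucket 3
}
```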

  1. Does go-ipfs essentially implement 16 buckets (with K peers each) where the first bucket is named the 0 bucket?

Sort of, it’s more like it has 256 buckets (0…255) where only the first 16 (0…15) are properly refreshed. This is basically because the FIND_NODE RPC is not great since it doubles as FIND_PEER. If we had a FIND_NODE_BY_KAD_ID (Fix Routing Table Refresh after we have a SearchByKadId RPC · Issue #556 · libp2p/go-libp2p-kad-dht · GitHub) we wouldn’t have to do the awful hacks in https://github.com/libp2p/go-libp2p-kbucket/blob/master/generate/main.go :sob:.

Yes, the first of these buckets is the 0 bucket (i.e. peers with a common-prefix-length of zero).

Mostly correct, although a few nits:

  • 20 * 10 = 200, not 100
  • Some of those peers may overlap so <=200 new peers will be discovered in that first wave
  • This whole process can happen asynchronously so you may be sending out more queries before the first ones have returned

Yep, you query peers based on closeness (in XOR space) to the target.
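
As a sketch of what that ordering looks like (the real go-libp2p-kad-dht query state is more involved, and the peer names here are made up), the “priority” is just a comparison of XOR distances in the sha256 keyspace:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// xorDistance returns the XOR of two 256-bit Kademlia IDs; comparing
// these byte slices lexicographically compares the distances as big
// integers.
func xorDistance(a, b [32]byte) []byte {
	d := make([]byte, 32)
	for i := range d {
		d[i] = a[i] ^ b[i]
	}
	return d
}

func main() {
	target := sha256.Sum256([]byte("some-content-key"))

	// Hypothetical candidate peers, projected into the Kademlia
	// keyspace the same way (sha256 of the peer ID bytes).
	candidates := [][32]byte{
		sha256.Sum256([]byte("peerA")),
		sha256.Sum256([]byte("peerB")),
		sha256.Sum256([]byte("peerC")),
	}

	// Closest-first ordering: this is the priority used when picking
	// which peer to query next.
	sort.Slice(candidates, func(i, j int) bool {
		return bytes.Compare(
			xorDistance(candidates[i], target),
			xorDistance(candidates[j], target),
		) < 0
	})
	fmt.Printf("closest candidate: %x\n", candidates[0][:4])
}
```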

Beta is a resiliency parameter used in the (at the moment not very good) mechanism for deciding on the termination condition. Basically, you need to decide when it’s worth not doing any more queries from the priority queue.

At the moment the standard go-libp2p-kad-dht client (and the servers) use hearing back from the Beta closest peers as the termination condition. Maybe this is reasonable for routing table maintenance, but it seems like overkill for most queries. In particular, not every peer in every routing table is guaranteed to be alive. So what if the peer closest to the target is not around anymore? Am I stuck waiting on a timeout? That seems bad. There are plenty of other approaches here that seem worth evaluating, such as some weighting of the top peers based on their distance to the target.
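
For illustration, a minimal sketch of that termination check, assuming the candidate list is already sorted closest-first by XOR distance to the target (the function and peer names are hypothetical, not the actual go-libp2p-kad-dht code):

```go
package main

import "fmt"

// Sketch of the termination check only: the query is considered done
// once we have heard from the beta closest peers currently known.
func lookupDone(sortedCandidates []string, heardFrom map[string]bool, beta int) bool {
	if beta > len(sortedCandidates) {
		beta = len(sortedCandidates)
	}
	for _, p := range sortedCandidates[:beta] {
		if !heardFrom[p] {
			return false // still waiting on one of the beta closest peers
		}
	}
	return true
}

func main() {
	candidates := []string{"closest", "second", "third", "fourth"}
	heard := map[string]bool{"closest": true, "second": true}
	fmt.Println(lookupDone(candidates, heard, 3)) // false: "third" hasn't answered yet
}
```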


If you’re interested in this you may also want to look at a Notion doc which describes some general DHT intuition as well as some of the simple optimizations made in the FullRT client in go-libp2p-kad-dht (https://github.com/libp2p/go-libp2p-kad-dht/blob/0b7ac010657443bc0675b3bd61133fe04d61d25b/fullrt/dht.go)

Sort of, it’s more like it has 256 buckets (0…255) where only the first 16 (0…15) are properly refreshed.

So there are 256 buckets but go-ipfs keeps K (20) open connections to the first 16 (0…15)?

If the buckets are numbered based on the number of common bits with the node’s network identity, why would we only refresh the first 16 which have the least common bits, thereby making them the most “distant” nodes?

Beta is a resiliency parameter used in the (at the moment not very good) mechanism for deciding on the termination condition. Basically, you need to decide when it’s worth not doing any more queries from the priority queue.

As I understand it, the lookup process is basically split into two parts:

  • Querying Alpha peers concurrently for any peers they know that are closer to X.
  • Asking the beta (3) closest nodes to X for the provider record for X.

Two questions:

  1. Is there a distinction between querying for closer peers (which it seems might be done at least 10 times in a lookup) and querying for the provider record (up to beta times)?
  2. When do we switch from the first part to the second part? Is it when we get a response from a certain number of peers?

The only nodes we really need to keep open connections to are the K nodes closest to you in the key space. This is basically because we want our K closest peers to have our latest addresses (and there’s no Put for Peer Records, see Remove special cases for Peer and Provider records · Issue #584 · libp2p/go-libp2p-kad-dht · GitHub).

go-libp2p-kad-dht keeps connections to the first 2 buckets as well since 75% of queries will start in the first 2 buckets which means you can get a performance speed up.
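
For intuition on that 75% figure: a uniformly random target key differs from our ID in its first bit with probability 1/2 (so the lookup starts in bucket 0) and matches the first bit but differs in the second with probability 1/4 (bucket 1), so

$$P(\text{bucket }0) + P(\text{bucket }1) = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}.$$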

We refresh the first 16 and the last. This is because these are essentially the only ones we can reasonably implement at the protocol layer.

This is because (as described in the issue around FIND_NODE_BY_KAD_ID) there’s no way to ask for a node by its Kad ID, only its Peer ID, which means looking for peers in a given bucket is hard. This is an ancient protocol problem. The way this has been worked around for the time being is that we brute-force the first 16 bits of SHA-2 (linked above), use that for the first 16 buckets, and then search for the closest peers to ourselves.

Note that 2^16 = 65536, so if we don’t have more server nodes than that then this isn’t much of a problem. IIRC at the moment we’re in the 20k range (I think this crawler is up to date: Periodic Measurements of IPFS).

As mentioned above, this is not what alpha and beta are for. Alpha is a concurrency parameter your client uses to decide how many outbound RPCs to send in parallel; beta helps you figure out when your query is done.

For some queries (e.g. finding provider records) clients are happy to emit partial results along the way to completing the query (e.g. since we can get started downloading data with even a single provider). However, other operations require completing the query to be useful, such as figuring out which peers to send an ADD_PROVIDER RPC to.
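
As a toy illustration of that difference (the names here are made up, not the go-libp2p-kad-dht API): provider results can be streamed to the caller as they arrive, whereas something like choosing ADD_PROVIDER targets is only meaningful once the query has terminated:

```go
package main

import "fmt"

func main() {
	results := make(chan string)

	// Pretend these provider records arrive one at a time from
	// in-flight RPCs while the query is still running.
	go func() {
		for _, p := range []string{"provider-1", "provider-2", "provider-3"} {
			results <- p
		}
		close(results)
	}()

	// Case 1 (e.g. finding providers): act on partial results immediately.
	var all []string
	for p := range results {
		fmt.Println("can start downloading from", p) // no need to wait for termination
		all = append(all, p)
	}

	// Case 2 (e.g. picking ADD_PROVIDER targets): only useful once the
	// query has finished and the full closest-peer set is known.
	fmt.Println("query done, final set:", all)
}
```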

See the libp2p kad-dht spec for more info https://github.com/libp2p/specs/tree/65f9923a405a8b33c4208a3d8b9e6340b9474efc/kad-dht

The bucket X means, the bucket of (DHT server) peers that have X bits in common with my network identity.

Based on that, is it correct that the higher X the closer the nodes in that bucket are to my node? (more common bits == smaller XOR distance)

it’s more like it has 256 buckets (0…255) where only the first 16 (0…15) are properly refreshed.

Based on the statement above, it seems that the first 16 buckets would be with the nodes furthest away.

This contradicts the following statement you made

The only nodes we really need to keep open connections to are the K nodes closest to you in the key space.

Am I misunderstanding something here?

(as described in the issue around FIND_NODE_BY_KAD_ID) there’s no way to ask for a node by its Kad ID, only its Peer ID

I wasn’t aware of any distinction between Kad ID and Peer ID. I tried to look at the issue you linked to but I am missing context which makes it hard for me to understand what you mean by:

We refresh the first 16 and the last. This is because these are essentially the only ones we can reasonably implement at the protocol layer.

Ok, I think the penny finally dropped on the point about buckets and why only the first 16 are properly refreshed.

While watching your talk Whiteboard Series with NEAR | Ep: 42 Adin Schmahmann from Protocol Labs' IPFS - YouTube, I caught the point you make that the higher the bucket number (i.e. the closer the peers), the lower the odds that you will find a node that close (“it’s like breaking a hash function”).

I still don’t understand what’s the difference between the KadID and PeerID. I looked in specs/kad-dht at master · libp2p/specs · GitHub but couldn’t find mention of it.

Yes

Yes, as mentioned we refresh the first 16 and then find the K closest to us (i.e. querying the last bucket).

The peers responsible for keeping our peer records (i.e. mappings of peerID → multiaddresses) are the K peers closest to us in the Kademlia space. This is true for every other record type as well (i.e. provider records are stored with the K peers closest to the provider record’s key after it’s projected into the Kademlia space by taking SHA256(key), as described in the spec). By staying connected to those K closest peers, we end up pushing our addresses to them (via Identify push) as they change.

Separately, as an optimization we may keep connections alive to the peers furthest away from us since they are the most likely to be used for queries (i.e. 50% of queries will start in bucket 0, 25% in bucket 1, …), which means not having to form those connections is a potentially useful speedup for lookups.

PeerIDs are the identities of peers as described in https://github.com/libp2p/specs/blob/f433ad595224cf33d916c166d1738f11aadfa9f7/peer-ids/peer-ids.md. However, when looking for the closest nodes to a given key we need to put everything into the same keyspace, so we take sha256(key) and find the nodes whose sha256(peerID) has the smallest XOR distance to sha256(key).
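
A tiny sketch of that projection (the key and peer ID bytes below are made up): both record keys and peer IDs end up as 256-bit KadIDs via sha256, and closeness is XOR within that shared space:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// kadID projects arbitrary bytes (a record key or a peer ID's bytes)
// into the 256-bit Kademlia keyspace.
func kadID(b []byte) [32]byte { return sha256.Sum256(b) }

func main() {
	recordKey := kadID([]byte("some-provider-record-key"))
	peerKey := kadID([]byte("QmExamplePeerID")) // hypothetical peer ID bytes

	// XOR distance between the record's KadID and the peer's KadID.
	var dist [32]byte
	for i := range dist {
		dist[i] = recordKey[i] ^ peerKey[i]
	}
	fmt.Printf("distance (first bytes): %x\n", dist[:4])
}
```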

Yes, it’s about breaking a hash function but not quite the way that line makes it sound.

This is true: as you work your way through the buckets, at some point there just won’t be enough peers to totally fill a given bucket because the odds are so low and “it’s like breaking a hash function”.

However, the reason we can’t even try and refresh bucket 20 (no matter how many nodes are actually in the network) is because the FIND_NODE RPC is basically incorrect.

It’d be nice if we could send a KadID (i.e. 256 bits) kid and ask for a list of the K PeerIDs such that they have the smallest XORDistance(kid, sha256(peerID)). Instead we send some peerID pid and ask for the K PeerIDs such that they have the smallest XORDistance(sha256(pid), sha256(peerID)).

This means we’re incapable of just asking for the peers that would be in our bucket 20 since we can’t fabricate some KadID to ask for like [first 20 bits of sha256 of our peerID] || [236 zeros] and instead have to send some other key such that after taking SHA256 of it the first 20 bits would match what we want. This is actually breaking bits of a hash function.

We can leverage some workarounds by embedding a brute-forcing of the first 16 bits of sha256 (https://github.com/libp2p/go-libp2p-kbucket/blob/d5e5a48422401a18ebeb848d3b3314370570146a/generate/main.go), but doing this gets pretty bad as the number of bits you need to break gets larger. At the moment 16 is fine based on network size (as mentioned above), so this isn’t practically a problem today.
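
For a feel of what that brute force involves (a sketch, not what generate/main.go literally does): finding an input whose sha256 starts with a chosen 16-bit prefix takes on the order of 2^16 attempts, which is cheap today but scales exponentially with the prefix length:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// bruteForcePrefix searches for some input whose sha256 starts with
// the desired 16-bit prefix. With only 16 bits this takes ~2^16
// attempts on average; each additional bit doubles the work.
func bruteForcePrefix(prefix uint16) []byte {
	for i := uint64(0); ; i++ {
		candidate := make([]byte, 8)
		binary.BigEndian.PutUint64(candidate, i)
		h := sha256.Sum256(candidate)
		if binary.BigEndian.Uint16(h[:2]) == prefix {
			return candidate
		}
	}
}

func main() {
	key := bruteForcePrefix(0xABCD)
	h := sha256.Sum256(key)
	fmt.Printf("sha256(%x) = %x...\n", key, h[:2]) // first two bytes are abcd
}
```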
