Regarding efficiency and completeness of findProviders API

How does the findProviders API in libp2p resolve the list of nodes (providers) that provide a given piece of content? Does the underlying search involve querying all the nodes in the network, or is there a node that maintains a dictionary of all the providers for that content? Or is the returned data based on some locally cached copy, so that not ALL the providers in the network for that content are guaranteed to be returned?

Thanks!

We use a DHT (like BitTorrent). When a peer provides a block, it sticks a provider record on the DHT server responsible for that block.
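
To make the lookup cost concrete, here is a toy, in-memory sketch of the idea in Python. It is not libp2p code: `ToyDHT`, `provide`, `find_providers`, and the replication factor `K = 3` are names and values invented for illustration, and a real lookup walks the network iteratively instead of sorting a global peer list. The point is only that a provider record lives on the few servers whose IDs are XOR-closest to the key, so a query touches those servers and not every node.

```python
import hashlib

K = 3  # illustrative replication factor (assumption, not libp2p's default)


def key_id(data: bytes) -> int:
    """Map arbitrary bytes into a 256-bit keyspace with SHA-256."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class ToyDHT:
    """In-memory stand-in for a network of DHT servers (hypothetical)."""

    def __init__(self, peer_names):
        # Each server keeps its own map: content key -> set of provider names.
        self.servers = {name: {} for name in peer_names}
        self.queried = []  # servers contacted by the last find_providers call

    def closest_servers(self, key: bytes):
        """Return the K servers whose IDs are XOR-closest to the key."""
        target = key_id(key)
        ranked = sorted(self.servers, key=lambda n: key_id(n.encode()) ^ target)
        return ranked[:K]

    def provide(self, key: bytes, provider: str):
        """Store a provider record on each of the K closest servers."""
        for name in self.closest_servers(key):
            self.servers[name].setdefault(key, set()).add(provider)

    def find_providers(self, key: bytes):
        """Ask only the K closest servers and merge what they know."""
        self.queried = self.closest_servers(key)
        found = set()
        for name in self.queried:
            found |= self.servers[name].get(key, set())
        return found


dht = ToyDHT([f"server-{i}" for i in range(50)])
dht.provide(b"some-block", "alice")
dht.provide(b"some-block", "bob")
providers = dht.find_providers(b"some-block")
print(sorted(providers), "found by asking", len(dht.queried), "of", len(dht.servers), "servers")
```

In this model the result is whatever the closest servers currently hold, so it should be read as "the providers those servers know about", not as a guaranteed complete list of every provider in the network.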

Note: we are working on sending out fewer provider records (e.g., only announcing roots of files, roots of directory trees, etc.), as announcing provider records for every chunk is a massive bottleneck for us.

Thank you @stebalien. A few other questions:

  1. Is there an API that a node can call to dump a list of all the provider blocks it is storing?
  2. Is there an issue if the node (DHT server) storing the provider blocks is behind a NAT?
  3. Is there a way to associate a TTL with a provider block stored on the DHT server?
  4. Does a node that stores a provider block on the DHT server have to keep checking whether the DHT server is alive, and republish if needed?

Thanks
Rohit

Is there an API that a node can call to dump a list of all the provider blocks it is storing?

Not that I know of. Note: DHT nodes don’t store the blocks, just records of where they’re stored.

Is there an issue if the node (DHT server) storing the provider blocks is behind a NAT?

Yes. However, we generally write provider records to multiple nodes to reduce these issues. I’d like to have nodes only promote themselves to full DHT nodes after some period of uptime/reachability, but we don’t currently do that.

Is there a way to associate a TTL with a provider block stored on the DHT server?

Provider records last 24 hours by default.

Does a node that stores a provider block on the DHT server have to keep checking whether the DHT server is alive, and republish if needed?

No, we just reprovide every few hours anyways.
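
Here is a rough Python sketch of how expiry and reproviding interact. `ProviderStore` and `REPROVIDE_INTERVAL` are hypothetical names; the 24-hour TTL comes from the answer above, and 12 hours is only a stand-in for "every few hours". Time is passed in explicitly so the example is deterministic.

```python
TTL = 24 * 3600  # provider records last 24 hours by default (from the answer above)
REPROVIDE_INTERVAL = 12 * 3600  # stand-in for "every few hours" (assumption)


class ProviderStore:
    """Hypothetical record store on one DHT server; times are in seconds."""

    def __init__(self):
        self.expiry = {}  # (key, provider) -> absolute expiry time

    def add(self, key, provider, now):
        # Re-adding an existing record simply pushes its expiry forward.
        self.expiry[(key, provider)] = now + TTL

    def providers(self, key, now):
        # Drop expired records, then return whoever is still listed.
        self.expiry = {kp: t for kp, t in self.expiry.items() if t > now}
        return sorted(p for (k, p) in self.expiry if k == key)


store = ProviderStore()
store.add("block", "alice", now=0)  # alice keeps reproviding on schedule
store.add("block", "bob", now=0)    # bob goes offline after this

for t in range(REPROVIDE_INTERVAL, 2 * TTL, REPROVIDE_INTERVAL):
    store.add("block", "alice", now=t)

after_two_days = store.providers("block", now=2 * TTL)
print(after_two_days)
```

Under these assumptions a provider that stops reproviding simply ages out of the store, which is why nobody needs to monitor the DHT server holding the record.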

Thanks. One other question: given a Kademlia ID, which API can I call to determine the node that is responsible for that ID? Basically, I want to know the exact node that stores the key-value mapping when the put(key, value) API is called. Thanks

Will the findPeer() API of libp2p return the successor node for any provided Kademlia ID?

That’ll find the exact peer (used when connecting to the peer in question). The method you’re looking for is probably getClosestPeers. That’ll return an array of peer IDs close to the key in question.
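
The difference can be illustrated without a network. The Python sketch below is not the libp2p implementation: `find_peer` and `get_closest_peers` here are local stand-ins that operate on a known peer list, and hashing names with SHA-256 is an assumption about how IDs land in the keyspace. It shows that an exact lookup only succeeds for an existing peer ID, while a closest-peers query returns the nearest peers by XOR distance for any key.

```python
import hashlib


def to_keyspace(data: bytes) -> int:
    """Hash bytes into a 256-bit integer so XOR distance is well defined."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def find_peer(peers, peer_id: str):
    """Exact match: returns the peer only if that ID exists (stand-in)."""
    return peer_id if peer_id in peers else None


def get_closest_peers(peers, key: bytes, count: int = 3):
    """Rank known peers by XOR distance to the key and keep the nearest."""
    target = to_keyspace(key)
    return sorted(peers, key=lambda p: to_keyspace(p.encode()) ^ target)[:count]


peers = [f"peer-{i}" for i in range(20)]
key = b"my-key"

exact = find_peer(peers, "my-key")       # not a peer ID, so nothing is found
closest = get_closest_peers(peers, key)  # returns the nearest peers for any key
print(exact, closest)
```

Note that Kademlia has no single "successor" node for a key; in this sketch the peers at the front of the returned list are the ones a put(key, value) would be expected to target.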