File Discovery in IPFS

There are actually two parts to this question.

Content Routing

When you know the file you’re looking for, how do you actually find and download it? That is, how does IPFS resolve IPFS paths?

The answer is a combination of the DHT (distributed hash table) and Bitswap. You ask the DHT which nodes have the content you’re looking for (the providers of its CID), establish connections to those nodes, and then request the content from them over Bitswap.
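To make those three steps concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: `ToyDHT`, `Peer`, `fetch`, and the use of a bare SHA-256 hex digest as a stand-in for a CID are assumptions for illustration, not the Kubo/go-ipfs or libp2p APIs. In the real system the DHT is a Kademlia-based distributed table spread across many nodes, and Bitswap exchanges blocks via want-lists over libp2p connections.

```python
import hashlib
from dataclasses import dataclass, field


def toy_cid(data: bytes) -> str:
    """Hypothetical stand-in for a CID: the hex SHA-256 of the block.
    Real CIDs wrap a multihash together with codec information."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Peer:
    peer_id: str
    blocks: dict = field(default_factory=dict)  # cid -> block bytes

    def add(self, data: bytes) -> str:
        cid = toy_cid(data)
        self.blocks[cid] = data
        return cid

    def handle_want(self, cid: str):
        """Bitswap-like request: return the block if we hold it, else None."""
        return self.blocks.get(cid)


class ToyDHT:
    """In-memory stand-in for the DHT's provider records (cid -> peer ids)."""

    def __init__(self):
        self.providers = {}

    def provide(self, cid: str, peer_id: str) -> None:
        self.providers.setdefault(cid, set()).add(peer_id)

    def find_providers(self, cid: str) -> set:
        return set(self.providers.get(cid, set()))


def fetch(cid: str, dht: ToyDHT, network: dict) -> bytes:
    # Step 1: ask the DHT which peers provide this CID.
    for peer_id in sorted(dht.find_providers(cid)):
        # Step 2: "connect" to the provider (here, just a dict lookup).
        peer = network.get(peer_id)
        if peer is None:
            continue  # provider is unreachable; try the next one
        # Step 3: request the block over the Bitswap-like exchange.
        data = peer.handle_want(cid)
        # Content addressing: the bytes must hash to the CID we asked for.
        if data is not None and toy_cid(data) == cid:
            return data
    raise LookupError(f"no provider could serve {cid}")


# Usage: one peer adds a block and announces it; anyone can then fetch it.
dht = ToyDHT()
alice = Peer("alice")
cid = alice.add(b"hello ipfs")
dht.provide(cid, "alice")
network = {"alice": alice}
print(fetch(cid, dht, network))  # b'hello ipfs'
```

The hash check in `fetch` is what makes it safe to accept data from whichever provider the DHT names: because the identifier is derived from the content, a peer cannot hand back different bytes under the same CID without the mismatch being detected.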

Search

This is the case where you don’t know exactly what you’re looking for but want content that matches a set of properties: files that contain X, files related to subject Y, and so on.

Currently, IPFS doesn’t provide a mechanism to do this. Yes, you can download everything you can find, e.g., by sniffing DHT queries and fetching anything you see mentioned, but this isn’t really a supported “feature”.

In the future (and this is probably how ipfs-search worked), the general approach will look very much like search on the web today. You’ll have search engines that index content through several mechanisms:

  1. Users submitting links to search engines for indexing.
  2. Content found by traversing links from known/submitted content.
  3. Content found by sniffing DHT queries.
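The three indexing mechanisms above can be sketched as a toy crawler feeding an inverted index. This is a hypothetical illustration, not how ipfs-search or any real engine is built: `ToyIndexer`, `submit`, `observe_dht_query`, and the `fetch` callable (assumed to map a CID to a `(text, linked_cids)` pair, or `None` if the content can’t be retrieved) are all invented for the example.

```python
import re
from collections import deque


def tokenize(text: str) -> set:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyIndexer:
    """Hypothetical search-engine indexer over content-addressed documents."""

    def __init__(self, fetch):
        self.fetch = fetch  # assumed: cid -> (text, linked_cids) or None
        self.index = {}     # inverted index: word -> set of CIDs
        self.seen = set()   # CIDs already visited

    def crawl(self, start_cids) -> None:
        queue = deque(start_cids)
        while queue:
            cid = queue.popleft()
            if cid in self.seen:
                continue
            self.seen.add(cid)
            item = self.fetch(cid)
            if item is None:
                continue  # content not retrievable right now
            text, links = item
            for word in tokenize(text):
                self.index.setdefault(word, set()).add(cid)
            queue.extend(links)  # mechanism 2: traverse links

    def submit(self, cid: str) -> None:
        """Mechanism 1: a user submits a link to be indexed."""
        self.crawl([cid])

    def observe_dht_query(self, cid: str) -> None:
        """Mechanism 3: a CID noticed while sniffing DHT queries."""
        self.crawl([cid])

    def search(self, query: str) -> set:
        """Return the CIDs whose text contains every word of the query."""
        words = tokenize(query)
        if not words:
            return set()
        return set.intersection(*(self.index.get(w, set()) for w in words))


# Usage with a hypothetical in-memory "network" of three documents.
store = {
    "cidA": ("IPFS content routing with the DHT", ["cidB"]),
    "cidB": ("Bitswap exchanges blocks between peers", []),
    "cidC": ("A file about gardening", []),
}
indexer = ToyIndexer(store.get)
indexer.submit("cidA")             # indexes cidA and, via its link, cidB
indexer.observe_dht_query("cidC")  # indexes cidC
print(indexer.search("bitswap peers"))  # {'cidB'}
```

Note that the whole index lives in one object on one machine; that is exactly the centralization a decentralized search engine would have to avoid.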

Now, ideally, we’d have a decentralized search engine for indexing this content. Unfortunately (or fortunately, if you want to try tackling it :wink:), that’s an open problem. There have been several attempts, but none of them has come close to the power of a centralized search engine. It’s a genuinely hard problem to solve in a decentralized manner because search tends to be a highly centralized problem.
