File Discovery in IPFS

I’m new to IPFS and was curious how people actually find files. While trying to do so, I’ve looked at two approaches.

  1. Can you view files that connected peers currently have pinned? Or at least which files they have some blocks of?

  2. I’ve come across some discussions about sniffing the DHT. How does this work? I’ve viewed the logs of the running daemon (go-ipfs) but am unsure how to actually retrieve a list of the files on the network whose requests happened to pass by my node.

Thank you for the help

I’m not sure how to do it or how it works, but this is what the (currently defunct) ipfs-search site reportedly did. If that’s really what they did, there might be some details in their GitHub repo.

There are actually two parts to this question.

Content Routing

When you know the file you’re looking for, how do you actually find and download it? That is, how does IPFS resolve IPFS paths?

The answer to that is a mixture of the DHT and Bitswap. You ask the DHT which nodes have the content you’re looking for, establish connections to those nodes, and then ask those nodes for content (over Bitswap).
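To make the two steps concrete, here is a minimal Python sketch against a local go-ipfs/Kubo daemon’s HTTP RPC API. It assumes the daemon listens on the default 127.0.0.1:5001, that your version exposes `/api/v0/routing/findprovs` (older go-ipfs releases called it `/api/v0/dht/findprovs`), and that provider results arrive as newline-delimited JSON events with `Type` 4. Check these assumptions against your version’s API docs. Note that `ipfs cat` already performs the provider lookup internally, so `find_providers` only makes the routing step visible.

```python
import json
import urllib.request
from typing import Iterable, List

# Assumption: in go-libp2p's routing.QueryEventType enum, "Provider" is value 4.
PROVIDER_EVENT = 4
DEFAULT_API = "http://127.0.0.1:5001"  # assumed default RPC address


def parse_providers(ndjson_lines: Iterable[str]) -> List[str]:
    """Extract unique provider peer IDs (in arrival order) from findprovs output."""
    providers: List[str] = []
    for line in ndjson_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Type") != PROVIDER_EVENT:
            continue  # skip query progress events (dialing, peer responses, ...)
        for resp in event.get("Responses") or []:
            peer_id = resp.get("ID")
            if peer_id and peer_id not in providers:
                providers.append(peer_id)
    return providers


def find_providers(cid: str, api: str = DEFAULT_API) -> List[str]:
    """Step 1 (content routing): ask the DHT which peers provide `cid`."""
    # Assumption: newer Kubo path; older go-ipfs used /api/v0/dht/findprovs.
    url = f"{api}/api/v0/routing/findprovs?arg={cid}&num-providers=5"
    req = urllib.request.Request(url, method="POST")  # the RPC API expects POST
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_providers(raw.decode("utf-8") for raw in resp)


def fetch(cid: str, api: str = DEFAULT_API) -> bytes:
    """Step 2: retrieve the content; the daemon pulls the blocks over Bitswap."""
    req = urllib.request.Request(f"{api}/api/v0/cat?arg={cid}", method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

With a daemon running, `find_providers(cid)` would list the peers the DHT returned, and `fetch(cid)` would download the file from them.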

Search

This is the case when you don’t know exactly what you’re looking for but want content that matches a set of properties: files that contain X, files that are related to subject Y, etc.

Currently, IPFS doesn’t provide a mechanism to do this. Yes, you can just download everything you can find, e.g., by sniffing DHT queries and fetching anything you see mentioned, but this isn’t really a supported “feature”.
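As a rough illustration of “fetching anything you see mentioned”, the sketch below dedupes content keys seen in DHT traffic into a fetch queue. The `ObservedDhtMessage` record is hypothetical: go-ipfs does not hand you such a stream, so you would have to capture it yourself (for example, from a modified node). `ADD_PROVIDER` and `GET_PROVIDERS` are real libp2p Kademlia message types; here keys are treated as opaque strings rather than decoded multihashes. Keep in mind that a node only sees the DHT messages routed to it, so a single sniffer observes just a slice of the network.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, Optional, Set


@dataclass(frozen=True)
class ObservedDhtMessage:
    """Hypothetical record of one DHT message our node received."""
    sender: str    # peer ID of the remote node
    msg_type: str  # e.g. "ADD_PROVIDER", "GET_PROVIDERS", "FIND_NODE"
    key: str       # key the message refers to (opaque string in this sketch)


# Only these message types refer to content; FIND_NODE etc. refer to peers.
CONTENT_TYPES = {"ADD_PROVIDER", "GET_PROVIDERS"}


class SniffQueue:
    """Collect each content key once and hand them out in first-seen order."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.pending: Deque[str] = deque()

    def observe(self, messages: Iterable[ObservedDhtMessage]) -> int:
        """Record new content keys; return how many were newly queued."""
        added = 0
        for msg in messages:
            if msg.msg_type in CONTENT_TYPES and msg.key not in self.seen:
                self.seen.add(msg.key)
                self.pending.append(msg.key)
                added += 1
        return added

    def next_to_fetch(self) -> Optional[str]:
        """Pop the next key to fetch (e.g. with `ipfs cat`), or None if empty."""
        return self.pending.popleft() if self.pending else None
```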

In the future (and this is probably how ipfs-search worked), the general approach to this will look very much like search does on the web today. You’ll have search engines that index content by several mechanisms:

  1. Users submitting links to various search engines for indexing.
  2. Content found by traversing links from known/submitted content.
  3. Content found by sniffing DHT queries.
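A toy version of that indexing pipeline might look like the sketch below: seed a breadth-first crawl with submitted and sniffed CIDs (mechanisms 1 and 3), follow links to discover more content (mechanism 2), then build a simple inverted index. `links_of` and `text_of` are hypothetical callbacks; against a real daemon you might back them with `ipfs ls` and `ipfs cat`. A real search engine would also need ranking, deduplication, and file-type handling.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Set


def crawl(
    submitted: Iterable[str],
    sniffed: Iterable[str],
    links_of: Callable[[str], List[str]],  # hypothetical: CID -> linked CIDs
    limit: int = 1000,
) -> List[str]:
    """Breadth-first crawl from seed CIDs, visiting each CID at most once."""
    queue: Deque[str] = deque()
    seen: Set[str] = set()
    for cid in list(submitted) + list(sniffed):
        if cid not in seen:
            seen.add(cid)
            queue.append(cid)
    order: List[str] = []
    while queue and len(order) < limit:
        cid = queue.popleft()
        order.append(cid)
        for child in links_of(cid):  # traverse links to find more content
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def build_index(
    cids: Iterable[str],
    text_of: Callable[[str], str],  # hypothetical: CID -> extracted text
) -> Dict[str, Set[str]]:
    """Toy inverted index: lowercase word -> CIDs whose text contains it."""
    index: Dict[str, Set[str]] = {}
    for cid in cids:
        for word in text_of(cid).lower().split():
            index.setdefault(word, set()).add(cid)
    return index
```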

Now, ideally, we’d have a decentralized search engine for indexing this content. Unfortunately (or fortunately, if you want to try tackling it 😉), that’s an open problem. There have been several attempts, but none of them has come close to the power of a centralized search engine. This is a really hard problem to tackle in a decentralized manner because search tends to be a highly centralized problem: it works best with a global view of all the data.

Thanks for the insight, stebalien.

After doing some research, it seems people involved in IPFS and blockchain technologies are very anti-centralized anything.

I think it’d be super cool to have decentralized everything, but to drive usage it seems like it’d make sense to combine the best of both worlds: the power and familiarity of centralized systems while still leveraging the power of IPFS. I guess the downside would be that the centralized entity ends up with too much power (Google, Facebook, etc.).

I’ve been looking for interesting side projects outside of work, and IPFS seemed very promising, or at least interesting 🙂

Thank you for the link leerspace. I’ll check it out!

A decentralized search would work if the network was monetized; that is, it would cost money to search. Then it would be profitable for search processors to process requests on the blockchain.

The money to pay for a search could be mined by the device performing the search; it would mine while interacting with the network.

After doing some research, it seems people involved in IPFS and blockchain technologies are very anti-centralized anything.

In this case, I was just fantasizing. I’d love to have a decentralized, incentivised, global data curation/search system but that’s a hard problem and, honestly, it may not really be solvable (it may not be possible to beat a centralized system that has a global view into all the data).

For now, centralized search is probably going to dominate.

Note: One thing I believe will take off is decentralized content curation (e.g., a decentralized Reddit, YouTube, SoundCloud, etc.). That’s an easier problem as it simply involves categorizing/tagging content (often manually).
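A minimal sketch of tag-based curation, assuming a hypothetical format in which each curator publishes a map from tag to CIDs (for instance, as a JSON file on IPFS). Merging simply counts how many curators vouch for each CID under a tag; it does nothing to defend against spam or Sybil curators.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical format: one curator's published tags, tag -> set of CIDs.
CuratorTags = Dict[str, Set[str]]


def merge_tags(curators: List[CuratorTags]) -> Dict[str, Counter]:
    """Combine curators' tags; each CID's count is how many curators tagged it."""
    merged: Dict[str, Counter] = {}
    for tags in curators:
        for tag, cids in tags.items():
            merged.setdefault(tag, Counter()).update(cids)
    return merged


def browse(merged: Dict[str, Counter], tag: str, min_votes: int = 1) -> List[Tuple[str, int]]:
    """List (CID, votes) for a tag, most-vouched first, ties broken by CID."""
    counts = merged.get(tag, Counter())
    picks = [(cid, n) for cid, n in counts.items() if n >= min_votes]
    return sorted(picks, key=lambda item: (-item[1], item[0]))
```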

A decentralized search would work if the network was monetized; that is, it would cost money to search. Then it would be profitable for search processors to process requests on the blockchain.

That’s the general idea but, in practice, it would be really complicated (I’m not saying it shouldn’t be done; in fact, someone will probably do it eventually). Unfortunately, you can’t just throw a blockchain at it and walk away.

For a non-exhaustive list, you’d have to deal with:

  1. Content discovery (relatively easy).
  2. Content curation (indexing, tagging, associating). This is hard without a global view of all data in the system.
  3. Search result delivery. If the index is decentralized, I now have to collate results from multiple peers and merge them. Also, do I trust these peers? How do I know if they’re giving me good content?
  4. Search result feedback (feeding back into step 2). The “search engine” no longer automatically learns which links were clicked; it would have to pay users for this information (somehow).
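For point 3, one simple way to collate ranked result lists from several peers is reciprocal rank fusion (RRF), weighted by how much you trust each peer. RRF is a swapped-in, off-the-shelf technique rather than anything proposed in this thread, and the `trust` scores are hypothetical: deciding how to compute them is exactly the open question raised in point 3. `k = 60` is the constant conventionally used with RRF.

```python
from typing import Dict, List


def fuse_results(
    peer_results: Dict[str, List[str]],  # peer ID -> that peer's ranked CIDs
    trust: Dict[str, float],             # hypothetical per-peer trust weights
    k: int = 60,
    top: int = 10,
) -> List[str]:
    """Weighted reciprocal rank fusion: score(cid) = sum of w_peer / (k + rank)."""
    scores: Dict[str, float] = {}
    for peer, ranked in peer_results.items():
        weight = trust.get(peer, 0.0)
        if weight <= 0:
            continue  # ignore results from untrusted or unknown peers
        for rank, cid in enumerate(ranked, start=1):
            scores[cid] = scores.get(cid, 0.0) + weight / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [cid for cid, _ in ordered[:top]]
```

A CID ranked highly by several trusted peers rises to the top, while results from peers absent from `trust` are dropped entirely.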

And probably a lot more.

Yeah, I think all the points you listed are completely valid. Content creation seems to be a place for IPFS to really shine.

The problem is that, for an application to really drive usage, it has to be just as good as or better than existing solutions. I’m worried that common users, rather than setting up a wallet and dealing with additional latency, would just continue to use Google. However, I can see a content curation/creation site doing just as well as, or even better than, many existing sites.

IPFS will only reach true scale if we developers make it easy, performant, and accessible to the common user.

Hi MrRoboto, here’s one just for you (and anyone else): click the link to my MP3. The copyright is mine, but it’s a free-to-download rock ’n’ roll track that I wrote, and this is how I store my free tracks now. Here is the link: http://localhost:8080/ipfs/QmdJqGKy8kXJDAk5WAay2zrrsAjY63K4ntuducxsNuT9bx