Will they publish their verifications to the public? Send them to the peers they are connected to? Give them out on request? Or keep them locally?
I think they should send them on a request basis.
I think there won't be a final INDEX file. I doubt we can have an eventually consistent INDEX file. One reason is that different crawlers will disagree on what is a valid update, and so will different searchers.
Every searcher will do its best to know a set of trusted crawlers, and the searcher's INDEX file will basically be the union of the INDEX files of the crawlers it knows and trusts.
Depending on the implementation, searchers could either query the crawlers for results, or ask them for their index files to build their own index locally (faster queries, but more maintenance cost).
(I can also imagine a third type of node in the model: Brokers/Coordinators. They would fetch indexes from crawlers to build one big index that Searchers can query. To Crawlers, they look like Searchers building their index. To Searchers not building a local index, they look like Crawlers with big search capacity.)
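The union-of-trusted-indexes idea above can be sketched roughly like this. Everything here is an illustrative assumption, not an existing API: I'm pretending an index file maps a search term to a set of content hashes (CIDs).

```python
def merge_indexes(crawler_indexes):
    """Union the entries of several trusted crawlers' index files.

    Each index is assumed to map a search term to a set of CIDs.
    Crawlers may disagree on what is indexed; the searcher simply
    takes the union of everything its trusted crawlers claim.
    """
    merged = {}
    for index in crawler_indexes:
        for term, cids in index.items():
            merged.setdefault(term, set()).update(cids)
    return merged

# Two trusted crawlers that partially disagree on what they indexed:
crawler_a = {"cats": {"QmCat1", "QmCat2"}, "news": {"QmNews1"}}
crawler_b = {"cats": {"QmCat2", "QmCat3"}}

local_index = merge_indexes([crawler_a, crawler_b])
# local_index["cats"] now holds QmCat1, QmCat2 and QmCat3
```

This is the "build your own index locally" path; the alternative is to skip the merge entirely and forward each query to the crawlers.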
Do we? I think the model is BitTorrent here: find nodes to cooperate with, define "cooperative" for your own use-case/implementation (any node, a node seeding enough, a fast node, a node without data but useful for DHT lookups, etc.) and work with them.
I can see different implementations for Crawlers and Searchers, with different criteria, and that's fine.
Plenty.
Let me introduce several implementations serving several use-cases:
- Crawlers
  - The general-purpose Crawler. It listens to DHT traffic for new files and tries to grow its index every day. Most searchers use it.
  - The project indexer. It participates in a collaborative cluster and indexes that.
  - The skeptical lazy verifier. It listens to the queries it receives and double-checks the results of other crawlers. Searchers like to query it because this fact-checker is always up to date on hot topics and reliable (unless you are the first to query).
  - The Grammar Nazi Crawler. This guy crawls the same sources as everyone else, but it is looking for something different. For example, it indexes well-written texts and checks for syntax errors, so you can be sure to read quality posts.
  - The video indexer Crawler. It fetches some frames and looks for cat videos. Most Searchers don't bother contacting it, but cat lovers do.
  - The Greedy Bastard. It built an annotated patent database and found similarities between some entries. Pretty neat, huh? But you'll need to pay a few coins to make a query :/.
  - The General News crawler. It crawls hot topics and makes a selection of articles. They say it works closely with the Grammar Nazi Crawler and the Is This Well-Written Crawler…
  - The Times crawler. It doesn't crawl much. But if you want a Time Magazine article, ask for it here.
  - The personal social media Crawler. It subscribes to your friends' nodes and indexes their posts. You can catch up with your friend David's latest avocado toast pictures in one query.
  - The Hosted-With-Foo crawler. Foo.com lets you host your website on IPFS easily. They run a crawler only for the websites they helped set up, so that on their front page they can say: "A guy made this incredible website with our tech!"
  - The Crawlers' Crawler, aka the Broker (see above). It sucks in all the other indexes, and you can query just it to go super fast if you want. A bit centralized, but hey, you decide.
- Searchers
- General-purpose. It can query all the general-purpose indexers and most of the others.
  - Specialized. It searches scientific papers. Its queries use a super complicated and confusing format. Luckily, it is compatible with the scientific paper indexer.
  - The Cat Video Searcher looks for cat videos. It lives in a dedicated desktop app and queries only the Cat Video Indexer.
- More generally, there is a specialized searcher for every specialized crawler.
  - The Overly Patriotic Searcher. It queries general-purpose indexers but filters out results from IPs too far from itself. Its nemesis, the Globe-Trotter Searcher, does the opposite.
  - The Fake News Filter. It lets you search for any hot topic and serves you a random cat video instead. It's for your own sake.
And there are many more…
Anyway.
Where were we?
If there is no central authority AND no consensus (blockchain-enforced or protocol-enforced), you won't have it, I think. But it's fine. We're going down the Solution 5 road. There should be a standard way to communicate what type of crawler you are and what type of cooperation you are looking for, if any, but there won't be a standard way to build trust. Effective ways of communicating what you do should help the network avoid contacting you if peers don't like how you work, though.
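One hedged sketch of what that self-description could look like, assuming a simple JSON announcement. Every field name here is made up for illustration; no such message format exists yet:

```python
import json

# Hypothetical announcement a crawler could send to peers so they can
# decide whether to contact it. All field names are illustrative only.
announcement = {
    "node_type": "crawler",
    "specialty": "cat-videos",          # what it indexes
    "cooperation": ["index-exchange"],  # what cooperation it offers, if any
    "fee_per_query": 0,                 # the Greedy Bastard would set this > 0
}

wire = json.dumps(announcement)
decoded = json.loads(wire)

# A peer that dislikes paid crawlers can simply skip any node
# whose announcement has fee_per_query > 0.
contact_it = decoded["fee_per_query"] == 0
```

The point is only to standardize "what I do", not "whether to trust me"; each peer applies its own policy to the announcement.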
I suspect most searchers won't want a huge index locally. They will keep no index locally except as a cache, and verify the results rather than the whole merged index. Some will want to build a local index without being full-on crawlers, but they will either ask for small specialized indexes, check the provided indexes probabilistically, or just check the trusted crawlers' signatures.
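The probabilistic check could look something like this sketch. The `fetch` callable and the term-to-CIDs index layout are my assumptions; a real searcher would fetch the CID's content from IPFS instead of a toy dict:

```python
import random

def spot_check(index, fetch, sample_size=3, rng=None):
    """Sample a few (term, CID) claims from a crawler-provided index
    and confirm the fetched content actually contains the term.
    One bad claim is enough to distrust the whole index."""
    rng = rng or random.Random(0)
    claims = [(term, cid) for term, cids in index.items() for cid in cids]
    sampled = rng.sample(claims, min(sample_size, len(claims)))
    return all(term in fetch(cid) for term, cid in sampled)

# Toy content store standing in for real IPFS fetches:
store = {"QmCat1": "a cat video", "QmDog1": "a dog video"}

honest = {"cat": {"QmCat1"}}
lying = {"cat": {"QmDog1"}}  # claims a dog video is about cats

ok = spot_check(honest, store.get)   # passes the spot check
bad = spot_check(lying, store.get)   # fails it
```

Checking only a random sample keeps the verification cost far below re-crawling, which is the whole appeal for a lazy searcher.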
Just the way I see it!
