Would there be an interest, in an IPFS Search Engine?

Vgk88 · May 27, 2020, 4:41pm

This is more like a cyberlink knowledge graph. You must upload your files or IPFS hashes, together with keywords, to the knowledge graph. After that you will be able to find them in the search engine. The knowledge graph is currently being filled. It already has over 100k cyberlinks.

https://cyber.page/

CSDUMMI · May 27, 2020, 5:15pm

My Problem is:
Is it really more than just a search query on DuckDuckGo with the site: attribute set to an IPFS Gateway?
Is it decentralized, interplanetary, independent, censorship resistant or modular?
Can you download the knowledge graph yourself and work on it independently? Can you create your own search algorithms?

fuckGoogle · May 27, 2020, 6:03pm

Yes, yes, yes and yes. I sent you links to some docs above. Check it out =)

It is decentralized. It is interplanetary. It is independent. There is no censorship. It is modular. You can fork the client or the chain. You can build your own graph, etc

You can create your own search algorithms. You can govern the whole system, via onchain governance, etc

fuckGoogle · May 27, 2020, 6:04pm

What do you mean by that? Cyber uses IPFS as a DB for storing content, and it uses IPFS CIDs to create cyberlinks. However, cyber can work with any stateful or stateless protocol, as long as you have a pair of CIDs and can prove their source.

CSDUMMI · May 27, 2020, 6:44pm

I mean, suppose that cyber.page was blocked in my country. Could I still access it, for example through IPFS,
by downloading some software or accessing some File on IPFS?

fuckGoogle · May 27, 2020, 7:01pm

cyber.page is just a POC reference gateway. No more. You can access the protocol via any possible client. For example, there is a TG bot, @cyberdBot, and there are alpha extensions for Firefox and Chrome. We have started to work on a browser called cyb, which is actually a personal blockchain application on top of the protocol. So there is no way to block it, unless you shut down the network.

Anyone is free to fork the client and build whatever gateway they want on top of it. They could even make it private or semi-private by filtering the front end.

It's still early days and it is not on mainnet yet. If you fancy, I'd be happy to chat and tell you how it works in more detail. Or you can check out the code on GH: https://github.com/cybercongress/go-cyber (that's the protocol repo)

The short answer is yes, you can access it

sinkuu · May 28, 2020, 1:28am

I wonder if any blockchain-based system can be “interplanetary”. Up to 24 minutes of latency between Earth and Mars easily kills it (at the very least, miners/validators cannot be distributed across planets). Also, one of the super fancy goals of IPFS is to be partitionable, that is:

Info and apps function equally well in local area networks and offline. The Web is a partitionable fabric, like the internet.

This is also a reasonable requirement for an interplanetary system, and it seems to exclude current blockchain technology, which assumes the existence of a global, unpartitioned internet.

CSDUMMI · May 28, 2020, 6:28am

I have to agree with you, @sinkuu. That’s why it is not such a bad idea to have no global index, but rather small crawler communities
that eventually exchange their findings without staying in constant contact with everything else.
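
As a rough sketch of what "exchanging findings" could look like, assume each crawler community keeps its own index as a mapping from keyword to a list of CIDs. Merging two such indices is then a per-keyword union. The function name and the placeholder CIDs below are made up for illustration; nothing here comes from an existing IPFS tool.

```python
def merge_indexes(*indexes):
    """Union several keyword -> CIDs indices into one.

    Each input is assumed to be a dict mapping a keyword (str)
    to an iterable of CID strings. This layout is an assumption.
    """
    merged = {}
    for index in indexes:
        for keyword, cids in index.items():
            # Collect CIDs in a set so duplicates from different peers collapse.
            merged.setdefault(keyword, set()).update(cids)
    # Sort so the result is deterministic and JSON-friendly.
    return {keyword: sorted(cids) for keyword, cids in merged.items()}


# Placeholder CIDs, not real content identifiers.
community_a = {"ipfs": ["cidA"], "search": ["cidA"]}
community_b = {"ipfs": ["cidB"], "crawler": ["cidC"]}
combined = merge_indexes(community_a, community_b)
```

Because the merge is a plain set union, it does not matter in which order or how often two communities exchange their indices; they end up with the same result.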

CSDUMMI · May 28, 2020, 12:00pm

I think that my MVP Crawler kind of works.
It creates a reverse index in JSON format, mapping keywords to CIDs.
I haven’t gotten it to publish to IPNS automatically yet, because that seems to take a very long time.
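
For readers who want to picture such a reverse index, here is a minimal sketch. It is not the MVP Crawler itself: the tokenizer, the function names and the placeholder CIDs are assumptions, and fetching the documents from IPFS is left out entirely.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_reverse_index(documents):
    """Map each keyword to the sorted list of CIDs whose text contains it.

    `documents` is assumed to be a dict of CID -> already-fetched text.
    """
    index = defaultdict(set)
    for cid, text in documents.items():
        for word in tokenize(text):
            index[word].add(cid)
    return {word: sorted(cids) for word, cids in sorted(index.items())}


# Placeholder CIDs, not real content identifiers.
documents = {
    "QmExampleCidA": "IPFS search engine",
    "QmExampleCidB": "Decentralized search",
}
index = build_reverse_index(documents)
index_json = json.dumps(index, indent=2)  # the JSON form of the index
```

A keyword lookup is then just `index.get("search", [])`, and `index_json` is what would be added to IPFS and, eventually, published under an IPNS name.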

xhipster · May 28, 2020, 3:14pm

Hi! I created Cyber and would love to address your concerns.

I wonder if any blockchain-based system can be “interplanetary”. Up to 24 minutes of latency between Earth and Mars easily kills it (at the very least, miners/validators cannot be distributed across planets)

You are correct that it’s likely we will not be able to sync one chain between Mars and Earth. But we should not. I am pretty sure semantics on Mars will be very different from semantics on Earth.

That is why we defined a mechanism that allows the rank of any given CID on the Earth chain to be proven to any knowledge graph run by Martians. So you will only need to sync the ranks of anchor CIDs back and forth, using some relay.

xhipster · May 28, 2020, 3:27pm

I am pretty sure that solution 5 will not be able to work without solution 4.

You can learn from YaCy that you can’t build a useful search engine by following a completely bottom-up utopia. The reason is quite straightforward: relevance has to be protected somehow from trivial Sybil attacks. You cannot achieve this without some economics. And yep, you can’t add an economic layer without DLT, because of double spends.
Another problem with the bottom-up utopia is that, without a full index, such a search will never be able to answer questions better than a top-down solution.
A top-down approach need not be complex or centralized around one blockchain to rule them all. Check the whitepaper.
I am pretty sure it is a good idea to develop the bottom-up utopia on top of a top-down layer, so you get the best of both worlds.

xhipster · May 28, 2020, 3:29pm

You can run your own node and get the whole raw index from the peer-to-peer network yourself.

CSDUMMI · May 28, 2020, 5:31pm

I don’t think we should use either:

a central data structure that has to be synced with all nodes, or
a distinction between peers.

This means that I don’t think we should talk about crawlers and searchers any longer, because a search engine on IPFS should embrace the design philosophy of IPFS, so that wherever IPFS can be used, the IPFS Search Engine can be used as well.

And IPFS has no distinction between peers, so Search Engine peers shouldn’t have one either.
They should be an add-on on top of an IPFS peer, any IPFS peer.

As for relevance, this should be a decision made by every peer.
So, please, what is the concrete definition of a Sybil attack? And is it a real threat to a decentralized, unsynced data structure?

CSDUMMI · May 28, 2020, 8:51pm

Another part of the IPFS design philosophy that I would implement in an IPFS Search Engine is the common format around multiple formats.
That is, there should be a format that leaves almost nothing about a search index implied: not the format of the data, the data structure, the encoding, or even the content and focus of the index.
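
One way to read this is as a self-describing envelope around the index, in the spirit of how IPFS makes hashes and encodings self-describing. The sketch below is only a guess at what that could look like; every field name and value (`structure`, `encoding`, `focus`, `keyword-to-cids`) is invented here and is not part of any existing specification.

```python
import json

# Structures this hypothetical reader knows how to interpret.
KNOWN_STRUCTURES = {"keyword-to-cids"}


def wrap_index(entries, *, structure, encoding, focus):
    """Wrap raw index entries in an envelope that states what they are."""
    return {
        "version": 1,
        "structure": structure,  # how the entries are organized
        "encoding": encoding,    # how the envelope is serialized
        "focus": focus,          # what kind of content the index covers
        "entries": entries,
    }


def read_index(envelope):
    """Return the entries, refusing structures the reader does not understand."""
    structure = envelope.get("structure")
    if structure not in KNOWN_STRUCTURES:
        raise ValueError(f"unsupported index structure: {structure!r}")
    return envelope["entries"]


envelope = wrap_index(
    {"ipfs": ["QmExampleCidA"]},  # placeholder CID
    structure="keyword-to-cids",
    encoding="json",
    focus="plain-text documents",
)
restored = read_index(json.loads(json.dumps(envelope)))
```

The point of the design is that a peer receiving an index never has to guess: it either recognizes the declared structure or rejects the index outright.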

Akita · May 29, 2020, 8:04am

@xhipster

I disagree with that. Relevance can be protected from the bottom up by keeping track of local reputation. To mitigate Sybil attacks, you give very little trust to an unknown node, as IPFS, BitTorrent, or Bitcoin miners do (any trustless P2P network, really), and remember the ones you liked. One could argue that in these examples a “good” peer is easy to identify because the expected result is deterministic. My answer is that the Searcher-side logic could decide over time which Crawlers were good providers, based on user interaction. Users can give feedback about the results to the Searcher, which updates the reputations and little by little builds its phone book of good providers to call. As explained above, the drawback is shaky quality at first and “sub-optimal” results, but the benefit is far greater flexibility and personalization (the user decides who is good for them; the blockchain/consensus/DLT has no say in the matter).
So I really think this is about what utopia you want, not a technical impossibility.
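
To make the local-reputation idea a bit more concrete, here is a small sketch of a Searcher-side "phone book". The class name, the starting trust of 0.1 and the update rule (a simple moving average toward 1 or 0) are all assumptions picked for illustration, not a tested defence against Sybil attacks.

```python
class ReputationBook:
    """Per-user record of how much each crawler is trusted."""

    def __init__(self, initial_trust=0.1, learning_rate=0.2):
        self.initial_trust = initial_trust  # unknown nodes get very little trust
        self.learning_rate = learning_rate  # how fast feedback moves a score
        self.scores = {}

    def trust(self, crawler_id):
        """Current trust in a crawler; unknown crawlers get the low default."""
        return self.scores.get(crawler_id, self.initial_trust)

    def record_feedback(self, crawler_id, satisfied):
        """Move the crawler's score toward 1.0 (good result) or 0.0 (bad result)."""
        target = 1.0 if satisfied else 0.0
        current = self.trust(crawler_id)
        self.scores[crawler_id] = current + self.learning_rate * (target - current)

    def best_providers(self, limit=3):
        """Known crawlers, most trusted first."""
        return sorted(self.scores, key=self.scores.get, reverse=True)[:limit]


book = ReputationBook()
for _ in range(5):
    book.record_feedback("crawler-a", satisfied=True)
book.record_feedback("crawler-b", satisfied=False)
ranking = book.best_providers()
```

Since every new identity starts at the same low score, spinning up many fake crawlers gains an attacker little until real users have rated their results well, and the scores never leave the user's own node.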

Yes and no. The top-down solution means homogeneity in the way things are indexed. It will scan way more things, but maybe not the way I want. Some specialized crawlers (or ones better aligned with the way I would index) could be a better fit for me, even if their results would be rejected by the top-down solution as “not standard”. Top-down is almost guaranteed to be more complete and more standardized (which can be a strength or a weakness depending on my use case). Also, maybe I just want a “good enough” result that is not worth the complexity.

@CSDUMMI

It can, but does it have to? IPFS is just a (great) tool to move data around.
What is the IPFS philosophy anyway?
Decentralized torrenting of (often pirated) content?
Or helping a legal streaming platform optimize its business?

I think you see IPFS as the future of browsing, and fetching content for the human end-user. This is a very valid use case, but some IPFS nodes will run on private networks, some run on mobile, some will fetch cat videos and some will synchronise a blockchain and will need the service of a Smart Contract Crawler.

I encourage both of you to develop a Search Engine, but keep in mind that this serves the human browsing use case, and that not every human will want the same thing. Also, not every IPFS node will want to participate in the indexing.

fuckGoogle · May 29, 2020, 11:29am

Apart from IPFS, there are plenty more great protocols that a web3 search protocol should be able to utilize. DAT, GIT, Swarm, any blockchain protocol, BTC, ETH, DAGs, etc. Not just IPFS

CSDUMMI · May 30, 2020, 3:45pm

Well, yes, but those are entirely different protocols.
So we would need different indices, which would be combined
to make a search engine that can search all those protocols.
But there would have to be a format for each of those.

And I would like there to be a good definition of the
index for IPFS. I don’t know the other protocols; that should be a matter for their developers.

fuckGoogle · May 30, 2020, 4:53pm

That’s what we are doing at cyber. Our goal is to teach it to work with IPFS. When it’s done, we, or the community, will begin to implement the other protocols it is able to work with. As I mentioned, the design allows both stateful and stateless protocols to be supported, as long as we can get a pair of provable CIDs.

fawazahmed0 · August 14, 2021, 11:23pm

There is a paper on a distributed IPFS search engine, not sure if that helps:

Here is the Link
