Hello people of IPFS!
I was introduced to this library when I raised a topic in a computer science forum about libraries suitable for developing a peer-to-peer search engine.
I am a senior in college, and we have to build a project for evaluation. I decided to work on a peer-to-peer search engine in which multiple peers send their responses to a search query back to the peer that issued it.
I plan to have multiple peers spread across a group of 10-15 nodes, each sending back responses to queries related to a specific domain.
I would really appreciate it if you could spare some of your time to help me with this: can I use this library to implement the search engine or not?
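The query flow described in the question could be sketched in Java roughly as below. One peer sends out a query, several peers answer, and the asking peer combines the answers. Everything here is hypothetical. `FanOutSearch`, `Peer` and `Hit` are illustrative names, not an IPFS or Lucene API. A real peer would make a network request instead of a local method call. Keeping only each document's highest score is just one simple way to merge the answers.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.stream.Collectors;

// Hypothetical types for illustration; not an IPFS or Lucene API.
class FanOutSearch {

    /** One result as reported by a peer: document id, relevance score, and who sent it. */
    record Hit(String docId, double score, String peerId) {}

    /** A peer that can answer a query. A real peer would make a network request here. */
    interface Peer {
        List<Hit> search(String query) throws Exception;
    }

    /**
     * Sends the query to all peers in parallel, waits at most timeoutMillis,
     * and merges the answers into a single list of at most k hits, best first.
     * If several peers return the same docId, only the highest-scoring hit is kept.
     */
    static List<Hit> search(List<Peer> peers, String query, int k, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, peers.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (Peer p : peers) {
                tasks.add(() -> p.search(query));
            }

            List<Future<List<Hit>>> futures;
            try {
                // Tasks still running when the timeout expires are cancelled.
                futures = pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return List.of();
            }

            Map<String, Hit> best = new HashMap<>();
            for (Future<List<Hit>> f : futures) {
                if (f.isCancelled()) continue; // peer was too slow
                try {
                    for (Hit h : f.get()) {
                        best.merge(h.docId(), h, (a, b) -> a.score() >= b.score() ? a : b);
                    }
                } catch (ExecutionException e) {
                    // peer threw an exception; ignore its results
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }

            return best.values().stream()
                    .sorted(Comparator.comparingDouble(Hit::score).reversed())
                    .limit(k)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The timeout keeps one slow or offline node from stalling the whole query. Results from peers that fail or time out are simply dropped, which seems reasonable for a group of 10-15 nodes where some may be unavailable.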
I would definitely use Lucene (w/ Tika) and then perhaps make your project itself consist of writing a crawler that crawls IPFS data and loads it into a Lucene index. There is already a Solr (built on Lucene) app that can provide a GUI.
There’s a sample crawler app for Lucene (part of the Lucene distro) that can scan an entire folder structure of files, and you could use something like that example as your starting point for how to crawl IPFS data. There may already be some Lucene-based tooling for IPFS? I haven’t checked.
Random links I found: (I’m interested in this topic too, btw)
You can also have a look at YaCy for crawling the web 2.0. You may be able to adapt it to crawl IPFS too.
Yes, I had YaCy in my set of “todo” links to learn about! Thanks for bringing that up!
I searched their issues for IPFS and found this:
I commented on that thread myself also, assuring them it’s now time to do IPFS!
Thank you for introducing me to Lucene. I will definitely look into it.
@suvidsahay I just looked back at my Lucene-search code, and I had forgotten how far along I had gotten with my own indexer:
This file:
actually has the ability to scan folders, and it even drills down automatically into archive files (ZIP, TAR, etc.) to recursively index all the content it finds. It also uses Tika to extract data from files such as PDFs, image metadata, etc. So feel free to contact me through my GitHub/email if you want to collaborate.
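Here is a minimal sketch of the folder-plus-archive drill-down idea, using only the JDK. It is not the FileIndexer.java mentioned above. `ArchiveWalker`, `collect` and the `!/` path notation are hypothetical names chosen for illustration. It handles ZIP only, because the JDK has no TAR reader, and it keeps only `.txt`/`.md` text. In a real indexer, other file types would presumably be handed to Tika, and the extracted text would be added to a Lucene index rather than a map.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.*;

// Hypothetical illustration only; this is NOT Quanta's FileIndexer.java.
class ArchiveWalker {

    /**
     * Walks every regular file under root and returns logical path -> text.
     * ZIP files are opened and their entries visited recursively, so a ZIP inside
     * a ZIP is also searched. Paths inside archives use "!/" as a separator,
     * e.g. "docs.zip!/sub/readme.txt".
     */
    static Map<String, String> collect(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, String> docs = new TreeMap<>();
        for (Path p : files) {
            String name = root.relativize(p).toString().replace(File.separatorChar, '/');
            try (InputStream in = Files.newInputStream(p)) {
                visit(name, in, docs);
            }
        }
        return docs;
    }

    private static void visit(String name, InputStream in, Map<String, String> docs)
            throws IOException {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".zip")) {
            // Not closed here: closing it would also close 'in', which the caller owns.
            ZipInputStream zip = new ZipInputStream(in);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // Reads from 'zip' stop at the end of the current entry.
                    visit(name + "!/" + entry.getName(), zip, docs);
                }
            }
        } else if (lower.endsWith(".txt") || lower.endsWith(".md")) {
            docs.put(name, new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
        // Anything else (PDFs, images, TAR, ...) is skipped in this sketch;
        // a real indexer might pass it to Tika instead.
    }
}
```

Recursing on the entry stream, rather than extracting archives to disk, is what lets nested archives be handled with the same `visit` method.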
I was planning to make Quanta.wiki usable as a corporate search tool someday, and coming up with a way to define a set of IPFS CIDs to [recursively] index for search was already very high on my todo list. The way I'd do that is by extending that FileIndexer.java. So I can help you test whatever you create by putting it into an actual product (Quanta).
I would love to, but we are already working in a group of 3 members with our professor mentoring us. Also, this is a new technology for me, so I am just trying to learn as much as I can from this.
There is also
https://github.com/ipfs-search/ipfs-search