Minerva: Build the Hadoop-Hive on IPFS

maris205 · July 6, 2019, 12:29pm

Hi

There is still no big data system on IPFS, so we built Minerva, which can be regarded as Hive on IPFS. With Minerva, you can use standard SQL to query the contents of files stored on IPFS (JSON and CSV formats).

Minerva is based on Drill and IPFS. Technically, it's a Drill storage plugin that connects IPFS's decentralized storage with Drill's flexible query engine. Any data file stored on IPFS can be easily accessed from Drill's query interface, just like a file stored on a local disk.
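To make the "query it like a local file" idea concrete, here is a minimal sketch of sending a SQL query to a local Drill instance through Drill's REST endpoint (`POST /query.json`). The `ipfs.` storage plugin name, the `/ipfs/<CID>#<format>` path syntax, and the placeholder CID are assumptions for illustration; please check the Minerva README for the exact form your version expects.

```python
import json
import urllib.request

# Drill's web/REST interface listens on port 8047 by default; adjust if needed.
DRILL_URL = "http://localhost:8047/query.json"


def build_query(cid: str, fmt: str = "json", limit: int = 10) -> str:
    """Build a SQL statement that reads one IPFS object.

    The ipfs.`/ipfs/<CID>#<format>` table path is an assumed syntax,
    not confirmed by this post.
    """
    return f"SELECT * FROM ipfs.`/ipfs/{cid}#{fmt}` LIMIT {limit}"


def build_payload(sql: str) -> bytes:
    """Encode the JSON body that Drill's REST query endpoint accepts."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")


def run_query(sql: str, url: str = DRILL_URL) -> dict:
    """POST the query to Drill and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(sql),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "QmExampleCid" is a hypothetical placeholder, not a real object.
    sql = build_query("QmExampleCid", fmt="csv", limit=5)
    print(sql)
    # run_query(sql) needs a running Drill instance with the plugin enabled.
```

Only `run_query` touches the network, so the query and payload builders can be tried without a running Drill or IPFS daemon.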

The basic idea is very simple: run a Drill instance alongside the IPFS daemon, and you can connect to other users on IPFS who are also running Minerva. If one of those users happens to have stored the file you are trying to query, Drill can send the execution plan to that node, which executes the operations locally and returns the results. Of course, other users can benefit from your node as well if you are sharing the data they want. If enough people run Minerva, data sharing and querying become distributed and more efficient!
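As a rough illustration of that routing idea, the toy model below picks a node to receive the execution plan: a peer that both stores the file and runs Minerva. This is not Minerva's actual code; the function name, the peer IDs, and the in-memory provider table are all hypothetical stand-ins for what IPFS and Drill would do.

```python
from typing import Dict, Optional, Set


def pick_executor(
    cid: str,
    providers: Dict[str, Set[str]],
    minerva_peers: Set[str],
) -> Optional[str]:
    """Return a peer that stores `cid` and also runs Minerva, or None.

    `providers` maps a content ID to the peers known to hold it
    (a stand-in for an IPFS provider lookup).
    """
    candidates = providers.get(cid, set()) & minerva_peers
    # min() just makes the choice deterministic in this toy model.
    return min(candidates) if candidates else None


# Hypothetical data: peerB holds the file and runs Minerva.
providers = {"QmData": {"peerA", "peerB"}}
minerva_peers = {"peerB", "peerC"}

chosen = pick_executor("QmData", providers, minerva_peers)
missing = pick_executor("QmOther", providers, minerva_peers)
```

Here `chosen` is `"peerB"`, and `missing` is `None` because no known peer holds that file. What a real deployment does in that case is not covered in this post.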

If you are interested, we have made a few slides that explain the ideas in detail:

Any suggestions are welcome.

Find the code on GitHub: https://github.com/bdchain/Minerva

A live demo: http://www.datahub.pub (it may be unstable, so please bear with it)

josselinchevalay · July 23, 2019, 1:24pm

Good project, this is very interesting.
