Sets or Maps (lookup tables) on IPFS

linas · October 29, 2019, 7:07pm

I’m wondering how to best implement the idea of a (mutable) set or map on IPFS. Posting to “ecosystems” because this seems like a generic feature that many apps/app-developers would want to have.

In my case, I am trying to port an existing graph database to IPFS, and deep down in the guts, it conceptually boils down to a big (mutable) store of (guid, cid) pairs. That is, a lookup table of guid to cid.

Here, “guid” is a “globally unique ID”. It is the same idea as a CID: since all users know the content, every user can always compute the guid on their own, without any help. What I need to do is tell the user the latest flavor-of-the-day CID that goes with it. So: a basic, simple lookup.
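
To make the lookup concrete, here is a minimal in-memory sketch. It assumes the guid is a plain SHA-256 hex digest of the content (a stand-in, not a real CID encoding) and uses placeholder strings for CIDs; the names `guid_for`, `publish` and `lookup` are hypothetical.

```python
import hashlib
from typing import Dict, Optional

# The mutable lookup table: guid -> latest CID.
# CIDs are placeholder strings here, not real IPFS CIDs.
table: Dict[str, str] = {}

def guid_for(content: bytes) -> str:
    """Derive a guid from content alone, so any user can compute it unaided.

    Assumption: a SHA-256 hex digest stands in for the real guid scheme.
    """
    return hashlib.sha256(content).hexdigest()

def publish(content: bytes, cid: str) -> str:
    """Record the current CID for this content's guid and return the guid."""
    guid = guid_for(content)
    table[guid] = cid
    return guid

def lookup(content: bytes) -> Optional[str]:
    """Return the latest CID for this content's guid, or None if unknown."""
    return table.get(guid_for(content))
```

This is the single-writer, single-machine version; the hard part is keeping the same interface with many writers and 100 million entries.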

The gotchas:

There may be dozens or hundreds of writers simultaneously updating the lookup table.
The typical lookup table has from 1 million to 100 million entries in it (based on “typical” current uses).

The design alternatives:

Put the lookup table in one huge file. Well, clearly, that’s not “decentralized”, and has all sorts of issues with multiple writers. (For example, making that file a CRDT append-only log is not practical if it starts with 100 million entries and only gets larger. And if it’s not a CRDT, then it requires a lock to update.)
Create an IPNS key for each guid. Well, creating 1 million or 100 million different IPNS keys does not seem very practical or scalable, either.

Is there a third way? All I can think of is an IPNS extension, so that, given a (PKI-key, GUID) pair, I can look up the CID that goes with it.
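
Roughly, the interface I have in mind could be sketched like this. It is purely hypothetical, not an existing IPNS API: `KeyedLookup`, `put` and `get` are made-up names, the sequence-number rule is borrowed from how IPNS records are versioned, and signature checking against the PKI key is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Record:
    cid: str  # placeholder string standing in for a real CID
    seq: int  # version counter; a higher seq supersedes a lower one

class KeyedLookup:
    """Hypothetical store mapping (owner key, guid) -> latest CID.

    Assumption: a real version would verify that each update is signed
    by the owner key. That check is omitted in this sketch.
    """

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], Record] = {}

    def put(self, owner_key: str, guid: str, cid: str, seq: int) -> bool:
        """Accept the update only if it is newer than what is stored."""
        current = self._records.get((owner_key, guid))
        if current is not None and seq <= current.seq:
            return False  # stale or replayed update, ignore it
        self._records[(owner_key, guid)] = Record(cid, seq)
        return True

    def get(self, owner_key: str, guid: str) -> Optional[str]:
        """Return the latest CID for this (owner key, guid), if any."""
        record = self._records.get((owner_key, guid))
        return record.cid if record is not None else None
```

One key pair would then cover every guid in the table, instead of one IPNS key per guid.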

I’m imagining that most database apps, or anyone maintaining a mutable index of data, is going to face this issue. It’s not hard to solve with a single, centralized master file holding the lookups, but a single centralized massive file sure doesn’t feel very “decent” to me … ideas? How to move forward?

carsonfarmer · October 29, 2019, 7:22pm

There are several decentralized databases (key/value tables) built on IPFS. See https://github.com/textileio/go-textile-threads for a new work in progress, and https://github.com/orbitdb for another one built on IPFS and pubsub.

carsonfarmer · October 29, 2019, 7:24pm

Actually, if you are ok with Go, there’s also this, which is more of a Map on IPFS: https://github.com/ipfs/go-ds-crdt

linas · October 29, 2019, 9:52pm

Hmm, OK, thanks, I was kind of hoping for something more low-level, so as to avoid dependencies and overhead. In my case, each vertex/edge in my graph DB contains a key-value DB inside of it (so e.g. the natural language apps have 1 to 100 million key-value DBs in them, one per word/phrase/sentence/paragraph, each holding stuff like probabilities, counts and other non-traversable/non-queryable content). Layering all that on top of a flat key-value singleton kind of defeats the purpose of having it be graph structured.

But I’ll take a look, see if I can steal some ideas.

hector · October 31, 2019, 10:45am

The thing that comes to mind is that you could split into multiple flat-key/value stores (one per vertex?) and then keep a flat one for the graph?
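
One possible reading of that split, as a rough in-memory sketch: each vertex gets its own small key/value store, and a single flat store maps each vertex guid to an identifier of that vertex's current store. All names here (`SplitStore`, `fake_cid`) are hypothetical, and the SHA-256 of the JSON-serialized store is only a stand-in for a real CID.

```python
import hashlib
import json
from typing import Any, Dict, Optional

def fake_cid(store: Dict[str, Any]) -> str:
    """Stand-in for a CID: hash of the store's canonical JSON form."""
    encoded = json.dumps(store, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

class SplitStore:
    def __init__(self) -> None:
        # One flat key/value store per vertex, keyed by vertex guid.
        self.vertex_stores: Dict[str, Dict[str, Any]] = {}
        # One flat store for the graph: vertex guid -> current "CID".
        self.graph: Dict[str, str] = {}

    def set_value(self, guid: str, key: str, value: Any) -> None:
        """Update one vertex's store, then repoint the graph entry at it."""
        store = self.vertex_stores.setdefault(guid, {})
        store[key] = value
        self.graph[guid] = fake_cid(store)

    def get_value(self, guid: str, key: str) -> Optional[Any]:
        """Read a value from one vertex's store, or None if absent."""
        return self.vertex_stores.get(guid, {}).get(key)
```

A write to one vertex only touches that vertex's small store plus one entry in the flat graph store, so writers working on different vertices mostly stay out of each other's way.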
