Sets or Maps (lookup tables) on IPFS

I’m wondering how to best implement the idea of a (mutable) set or map on IPFS. Posting to “ecosystems” because this seems like a generic feature that many apps/app-developers would want to have.

In my case, I am trying to port an existing graph database to IPFS, and deep down in the guts, it conceptually boils down to a big (mutable) store of (guid, cid) pairs. That is, a lookup table of guid to cid.

Here, “guid” means “globally unique ID”. It is much the same thing as a CID: all users know the content, so every user can always compute the guid on their own, without any help. What I need to do is tell the user which CID currently goes with that guid (the latest flavor-of-the-day). So: a basic, simple lookup.
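To make the shape of the problem concrete, here is a minimal in-memory sketch of that lookup. It assumes, purely for illustration, that the guid is a SHA-256 hex digest of the content; the names `make_guid` and `LookupTable` are hypothetical, and the CID strings are placeholders rather than real CIDs.

```python
import hashlib
from typing import Dict, Optional


def make_guid(content: bytes) -> str:
    # Assumption: the guid is derived from the content alone, so every
    # user can compute it locally. SHA-256 is only a stand-in here.
    return hashlib.sha256(content).hexdigest()


class LookupTable:
    """Mutable guid -> cid table (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def put(self, guid: str, cid: str) -> None:
        # A later put for the same guid replaces the earlier cid.
        self._table[guid] = cid

    def get(self, guid: str) -> Optional[str]:
        # Returns None when no cid has been published for this guid.
        return self._table.get(guid)


table = LookupTable()
guid = make_guid(b"some vertex content")
table.put(guid, "placeholder-cid-v1")
table.put(guid, "placeholder-cid-v2")  # the "flavor of the day" changes
latest = table.get(guid)
```

The hard part is not this interface but keeping such a table consistent across many writers and 100 million entries.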

The gotchas:

  • There may be dozens or hundreds of writers simultaneously updating the lookup table.
  • The typical lookup table has from 1 million to 100 million entries in it (based on “typical” current uses).

The design alternatives:

  • Put the lookup table in one huge file. Well, clearly, that’s not “decentralized”, and it has all sorts of issues with multiple writers. For example, making that file a CRDT append-only log is not practical if it starts with 100 million entries and only gets larger. And if it’s not a CRDT, then it requires a lock to update.
  • Create an IPNS key for each guid. Well, creating 1 million or 100 million different IPNS keys does not seem very practical or scalable, either.

Is there a third way? All I can think of is an IPNS extension, so that, given a (PKI-key, GUID) pair, I can look up the CID that goes with it.

I’m imagining that most database apps, or anyone maintaining a mutable index of data, is going to face this issue. It’s not hard to solve with a single, centralized master file holding the lookups, but a single centralized massive file sure doesn’t feel very “decent” to me … ideas? How to move forward?

There are several decentralized databases (key/value tables) built on IPFS. See https://github.com/textileio/go-textile-threads for a new work in progress, and https://github.com/orbitdb for another one built on IPFS and pubsub.

Actually, if you are ok with Go, there’s also this, which is more of a Map on IPFS: https://github.com/ipfs/go-ds-crdt
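To illustrate why a CRDT-style map copes with many simultaneous writers without a lock, here is a small last-writer-wins sketch in Python. This is not the go-ds-crdt API or its algorithm; the `Entry` type, the `merge` function, and the integer timestamps are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    cid: str        # placeholder string, not a real CID
    timestamp: int  # assumption: some logical or wall-clock counter
    writer: str     # writer id, used only to break timestamp ties


def merge(a: Dict[str, Entry], b: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two replicas of a guid -> Entry map, keeping the newest entry."""
    merged = dict(a)
    for guid, entry in b.items():
        current = merged.get(guid)
        # Compare (timestamp, writer) so that ties resolve the same way
        # on every replica, whatever order the merges happen in.
        if current is None or (entry.timestamp, entry.writer) > (
            current.timestamp,
            current.writer,
        ):
            merged[guid] = entry
    return merged


replica_a = {"guid-1": Entry("cid-old", 1, "alice")}
replica_b = {
    "guid-1": Entry("cid-new", 2, "bob"),
    "guid-2": Entry("cid-x", 1, "bob"),
}
merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)
```

Because the merge gives the same result in either order, writers can update their own replica and exchange state later instead of taking a lock.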

Hmm, OK, thanks. I was kind of hoping for something more low-level, so as to avoid dependencies and overhead. In my case, each vertex/edge in my graph DB contains a key-value DB inside of it (so e.g. the natural language apps have 1 to 100 million key-value DBs in them, one per word/phrase/sentence/paragraph, each DB holding stuff like probabilities, counts and other non-traversable/non-queryable content). Layering all that on top of a flat key-value singleton kind of defeats the purpose of having it be graph-structured.

But I’ll take a look, see if I can steal some ideas.

The thing that comes to mind is that you could split it into multiple flat key/value stores (one per vertex?) and then keep a separate flat one for the graph itself?
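A rough sketch of that split, in plain Python with everything in memory. `FlatStore` and `GraphDB` are hypothetical names standing in for whatever IPFS-backed key/value store you end up using, and the `store-` ids are placeholders for what would presumably be a CID or similar handle.

```python
from typing import Dict, Optional


class FlatStore:
    """Hypothetical flat key/value store; stands in for one table."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class GraphDB:
    def __init__(self) -> None:
        # One flat store for the graph: vertex guid -> id of its store.
        self.graph = FlatStore()
        # One small flat store per vertex, holding counts, probabilities, etc.
        self.vertex_stores: Dict[str, FlatStore] = {}

    def set_value(self, vertex_guid: str, key: str, value: str) -> None:
        store_id = self.graph.get(vertex_guid)
        if store_id is None:
            # Placeholder id, created the first time a vertex is written.
            store_id = "store-" + vertex_guid
            self.graph.put(vertex_guid, store_id)
            self.vertex_stores[store_id] = FlatStore()
        self.vertex_stores[store_id].put(key, value)

    def get_value(self, vertex_guid: str, key: str) -> Optional[str]:
        store_id = self.graph.get(vertex_guid)
        if store_id is None:
            return None
        return self.vertex_stores[store_id].get(key)


db = GraphDB()
db.set_value("word:cat", "count", "42")
db.set_value("word:dog", "count", "7")
```

Writers touching different vertices then only contend on the graph-level store when a vertex is first created, not on every value update.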
