I’m wondering how to best implement the idea of a (mutable) set or map on IPFS. Posting to “ecosystems” because this seems like a generic feature that many apps/app-developers would want to have.
In my case, I am trying to port an existing graph database to IPFS, and deep down in the guts, it conceptually boils down to a big (mutable) store of (guid, cid) pairs. That is, a lookup table of guid to cid.
Here, “guid” is a “globally unique ID”. It works much like a CID: it is derived from content that all users already know, so every user can always compute the guid on their own, without any help. What I need to do is tell the user which CID is the latest flavor-of-the-day that goes with that guid. So: a basic, simple lookup.
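To make the shape of the problem concrete, here is a minimal in-memory sketch of the lookup I have in mind. The names `guid_for` and `LookupTable` are made up for illustration, and using a SHA-256 hex digest as the guid is only an assumption; the real database derives its guids its own way. The CID strings are placeholders, not real CIDs.

```python
import hashlib
from typing import Dict, Optional


def guid_for(content: bytes) -> str:
    """Derive a guid from content every user already knows.

    Assumption for this sketch: the guid is a SHA-256 hex digest.
    """
    return hashlib.sha256(content).hexdigest()


class LookupTable:
    """Mutable guid -> latest-CID table (single process, no concurrency)."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}

    def put(self, guid: str, cid: str) -> None:
        # Overwrite whatever CID was previously associated with this guid.
        self._latest[guid] = cid

    def get(self, guid: str) -> Optional[str]:
        # Returns None when the guid has never been written.
        return self._latest.get(guid)


table = LookupTable()
guid = guid_for(b"some well-known content")
table.put(guid, "placeholder-cid-1")
table.put(guid, "placeholder-cid-2")  # a later update replaces the first
```

The hard part is not this interface, it is keeping such a table consistent when it is shared across many nodes.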
The gotchas:
- There may be dozens or hundreds of writers simultaneously updating the lookup table.
- The typical lookup table has from 1 million to 100 million entries in it (based on “typical” current uses).
The design alternatives:
- Put the lookup table in one huge file. Well, clearly, that’s not “decentralized”, and it has all sorts of issues with multiple writers. For example, making that file a CRDT append-only log is not practical if it starts with 100 million entries and only gets larger. And if it’s not a CRDT, then it requires a lock to update.
- Create an IPNS key for each guid. Well, creating 1 million or 100 million different IPNS keys does not seem very practical or scalable, either.
Is there a third way? All I can think of is an IPNS extension, so that, given a (PKI-key, GUID) pair, I can look up the CID that goes with it.
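A rough sketch of what I imagine such an extension could look like, purely as an illustration. Nothing here is an existing IPNS API: `record_name`, `Record`, and `RecordStore` are hypothetical, the dictionary stands in for whatever distributed store would hold the records, and signature verification is left out entirely. The only idea borrowed from IPNS is that a record carries a sequence number and a newer one supersedes an older one.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    cid: str  # the CID this (key, guid) pair currently points at
    seq: int  # sequence number; higher means newer


def record_name(public_key: bytes, guid: str) -> str:
    """Hypothetical naming scheme: one name per (public key, guid) pair."""
    return hashlib.sha256(public_key + b"/" + guid.encode()).hexdigest()


class RecordStore:
    """In-memory stand-in for a distributed record store.

    A real system would verify a signature made with the private key
    before accepting a record; that step is omitted in this sketch.
    """

    def __init__(self) -> None:
        self._records = {}

    def publish(self, public_key: bytes, guid: str, cid: str, seq: int) -> bool:
        name = record_name(public_key, guid)
        current = self._records.get(name)
        if current is not None and seq <= current.seq:
            return False  # stale or duplicate update, ignored
        self._records[name] = Record(cid, seq)
        return True

    def resolve(self, public_key: bytes, guid: str) -> Optional[str]:
        rec = self._records.get(record_name(public_key, guid))
        return rec.cid if rec is not None else None


store = RecordStore()
key = b"placeholder-public-key"
store.publish(key, "guid-123", "placeholder-cid-1", seq=1)
store.publish(key, "guid-123", "placeholder-cid-2", seq=2)
```

The point is that one key could cover millions of guids, instead of needing one IPNS key per guid. It does not by itself answer how hundreds of writers would share or delegate that key.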
I’m imagining that most database apps, and anyone else maintaining a mutable index of data, are going to face this issue. It’s not hard to solve with a single, centralized master file holding the lookups, but a single centralized massive file sure doesn’t feel very “decent” to me … ideas? How to move forward?