How to maintain a list of users in a fully distributed trustless way?

markg85 · April 8, 2021, 9:44pm

Hi,

Picture a chat platform. Take irc or discord for example.
They have a list of users in their chat room.

In essence i’m looking to accomplish exactly that same concept on a website with no central server.

Picture there to be a chat site (perhaps like discord, just go with the concept here) with no central server at all. Serving the website would just be serving the static data. The site would be served from IPFS.

Assume for a moment that every user is communicating over pubsub over at least 1 common known channel.

In this concept i’m having a really hard time figuring out how to get the user list up to date and send to every new user opening the site. As there is no central server, i can’t get it from there.

Thus far i have come up with two possible ways to get this working.

At some point the site goes from 0 to 1 users. That 1 user now has the full list of users. Then a second user opens the site and asks on the pubsub channel: “whoever is online, tell me who the users are in this channel”. Then that new user would get a message from the already active user with a list of the users on the channel. With 1 user this concept is ok’ish for data consumption. With 1000 this would basically be ddos’ing that one new user so a method of consensus would have to be added here to keep it limited somewhat. Regardless, i feel this way is very error prone. It would be easy to have a malicious user sending garbage.
Another way i can think of is, sadly, adding a “fake” trusted user that i control. That fake user also knows whoever is online. New users ask that fake user who the users are. This would work but adds a bit of centralization back in the mix. Exactly what i want to prevent. Though it’s still decoupled centralization. It would be like any other user only one that is always known and trusted.

Do note that i cannot use OrbitDB in this case either. OrbitDB by default marks the creator of the database as the one who can write. That means it has a central single owner who can do writes which means there would need to be a entity somewhere active on that site (a server in the web 2.0 world) that would be in control of managing the database.

I don’t like either of my potential solutions. I’m looking for a trustless way to do this. Including a blockchain - due to transaction costs - is a no-go. Besides, the site would have to wait for the transaction to be propagated and i’m not even sure if a blockchain is a solution at all in this case. Using Ceramic is, i think, also a no-go because someone needs to write updates to a document in there (the user list) and as there is no central authority, every user would need to do that. That smells very error prone and unsafe.

I’m hoping someone else has a brilliantly simple idea of how to do this.

Cheers,
Mark

Discordian · April 10, 2021, 3:07pm

I was thinking of this concept the other day, and what I was going to try is have active users routinely publish a “keep-alive” sort of message. Could include their display name in it too if you want. That way someone new joining within a bit of time would eventually see who’s all online. Additionally if you don’t see a keepalive within a set amount of time, you could remove that user from the active user list.

Hope that idea helps!

markg85 · April 10, 2021, 8:30pm

Hi,

That is an interesting idea!
I am a bit skeptical about the up-to-date’ness of the userlist in this case though.

While it might work, 99% of the time (if not more) you’re sending messages to everyone that nobody needs. It feels a bit brute force

wclayf · April 10, 2021, 9:27pm

To solve the problem of knowing who’s ‘online’ you could just make it where a message from a new peer (at newPeerTime) triggers one of the existing peers to broadcast a list of all known peers and their lastActiveTime, plus X number of the most recent messages from everyone (rev chron list).

This way when someone new joins, it causes a broadcast from some other random peer that gives them everything they need to display a GUI and start participating.

So the question is then which pre-existing peer does this broadcast? Answer: That can be solved by using something like “order by (hashOf(peerId) - hashOf(newPeerTime))” so that all the peers can tell “who” should be the “one” to do the broadcast, by sorting all peers into that ordering, and if you’re at the top of said ordered list you immediately do the broadcast, otherwise you wait 3*N seconds before broadcasting where N is your position in the list, and if someone ahead of you in the order broadcasts then everybody else says “ok, broadcast is done, I don’t need to broadcast”.

Another algorithm which progressively ratchets up the likelihood of a good broadcast would be something like: In the ordered list above, if your position N is N%7==0, you broadcast (every 7th peer). Then if nobody sees that it switches to “if my position N is N%6==0 then broadcast”, then 5, then 4, …so like once you get to 2 that means every other peer will broadcast. This second “modulus-based” algo will be more ideal in situations where there’s large numbers of peers and like 50% to 90% of them disconnecting frequently. The key here is you have some curve that increases asymptotically as time reaches like 10 to 30 seconds, so that most of the time one peer does the broadcast for everyone, but if most go offline then the remaining ones still will find each other.

markg85 · April 11, 2021, 11:40am

That’s an interesting approach, @wclayf!

I thought about this some more and was also considering your first idea but with a little twist to it. I would indeed send a “anyone online, send me a ping” (aka, a broadcast). Then after, say, 1 seconds all “late pings” are disgarded. All pings that responded within 1 second are sorted by time. The fastest one will get the job to send a full userlist. Thus far it’s the same as your approach. The thing i would add (to reduce malicious use lists) is select 5 random users who are tasked to send me the hash of their user list. When i receive them i hash the user list received from the fastest ping and compare it to the 5 hashes received from the random users. From that point on it’s a simple majority consensus. If 3 of those 5 hashes match, the userlist is considered valid.

A hash mismatch here could indicate that not all users have the same userlist. I’m not sure yet how to solve that.

But… funny thing while typing this. I might as well abuse the “anyone online, send me a ping” to respond with who they are in the ping response. That would give me a userlist too.

I have some experimentation to do

wclayf · April 11, 2021, 3:50pm

@markg85 When you say “select 5 ramdom users”, that can be accomplished by sorting users with “order by (hashOf(peerId) - hashOf(newPeerTime))” and then taking the top 5 (or bottom 5, or middle 5, etc). The beauty of that “order by hash delta” is that everyone agrees in advance based on a pure math formula precisely which peers (and in what order) should try to satisfy some network need based purely on the time the need is submitted.

A good system would need to have a reputation score too. That is: each peer needs to have it’s own independent reputation score for all other peers, and this could be defined as something simple like: What’s the ‘diff’ between my user list and peer X’s user list last time I saw X do a broadcast. Score could be something like “userCount/(userCount+diffCount)”. Everyone would use the same rule that if a peer has a score below some threshold, behave as if it doesn’t exist, as long as it’s score remains low.

There could also be some scoring algo that measures performance (responsiveness) from any peer, which would do similar things, like you mentioned @markg85, but you’d want to avoid logic (maybe?) where faster peers end up “owning” the network, at least in untrusted environments.

markg85 · April 11, 2021, 5:20pm

You’re absolutely right, @wclayf ! Thank you for your help and insights.

I keep having my doubts with regards to keeping it secure in a fully distributed manner. But i now at least have a good feeling about that user list.

Time for some experimentation to see how well this works in reality.

Topic		Replies	Views
Discuss theoretical solutions to a fully distributed chat application with me Ecosystem and Usage	9	623	December 15, 2021
Using OrbitDB to create a distributed website with user functionality Help	5	2399	July 13, 2018
Using pubsub as a persistent database Help	5	1400	March 8, 2018
User account Identity and discovery on IPFS Help	4	713	May 3, 2020
Need Assistance Sussing Out Consensus-Based Storage Use Case Ecosystem and Usage use-cases-and-apps	2	733	September 2, 2022

How to maintain a list of users in a fully distributed trustless way?

Related topics