Design idea to solve the scalability problem of a decentralized social media platform using IPFS

@LucaPanofsky Sounds awesome. Can you add me on telegram @estebanabaroa or discord estebanabaroa#2853 ?

I created a decentralized Social Media platform (https://quanta.wiki) that has a lot of these same goals in mind, but it currently uses ActivityPub (i.e. part of the Fediverse) for federating social media content. Quanta does have IPFS functionality, but only as an option for how to save attached files (for now).

Making Quanta “fully IPFS” would only amount to making it save its current MongoDB-based JSON documents onto IPFS as well, while keeping MongoDB as the “searchable cache” (which will always need to exist regardless)…

The idea of a pure IPFS version of something like ActivityPub comes up a lot on this forum as well as ActivityPub forums, and Quanta would be an ideal thing to build a platform like that on top of, since it already does much of the “heavy lifting” (hard to write GUI code, searching, etc).

@wclayf I actually think searching is not part of the core functionality of a social media platform. For example, when I use Facebook, Reddit, or Twitter, I only spend 0.1% of my time on the platform searching. So I don’t think it should be a concern when designing a p2p social media platform.

I also think ActivityPub seems problematic because, from what I understand, it requires people to host servers and host content directly, which means they are legally liable for the content, which means they must spend unbounded man-hours moderating. It is fundamentally unscalable.

Whereas with my design idea, there are no servers, no one is legally liable for serving or hosting content. I’d love to talk more about it on telegram or discord if you have it.

I agree search is not a core part of social media data. What I meant was that, similar to how IPFS caches data, there needs to be something (at least in my architecture) that can provide instantaneous responses to client/browser requests. I am still the kind of guy who thinks every web app needs a real database (MySQL, Postgres, etc.), even if its core source of truth is IPFS. So what I was saying is that even once Quanta goes “full IPFS” for all the social media data, my existing MongoDB will still stay there doing exactly what it’s doing today: providing lightning-fast responses to information it has already collected.

For example a request like “Show me a timeline of all my friend’s posts” will always involve a database query.

I always warn people, that as great as IPFS is, it will never perform like your DB and it can never replace the need for a DB, for any “serious” web app.

What we need, in terms of a spec, however, is something like the ActivityPub spec, but pure IPFS and fully decentralized. AP is federated (not decentralized) and has many problems like those you pointed out. However, AP does have most of the data structures already laid out, and in order to get the Mastodon guys and others to join the IPFS movement, we might as well also adopt like 99% of their data structures (JSON formats, ActivityStreams, etc.), so they don’t have to rewrite their code and so they can “interop”, similar to how Matrix allows lots of disparate messaging systems to interop.

I’ll look you up on discord.

I wrote a short whitepaper for my idea:

Plebbit: A serverless, adminless, decentralized Reddit alternative

Abstract

A decentralized social media platform has two problems: how to store the entire world’s data on a blockchain, and how to prevent spam while being feeless. We propose solving the data problem by not using a blockchain, but rather “public key based addressing” and a peer-to-peer pubsub network. A blockchain, or even a DAG, is unnecessary because, unlike cryptocurrencies, which must know the order of each transaction to prevent double spends, social media does not care about the order of posts, nor about the availability of old posts. We propose solving the spam problem by having each subplebbit owner run their own “captcha server” and ignore posts that don’t contain a valid captcha challenge answer.

Public key based addressing

In Bittorrent, you have “content based addressing”: the hash of a file becomes its address. With “public key based addressing”, the hash of a public key becomes the address of the subplebbit. Network peers perform a DHT query of this address to retrieve the content of the subplebbit. Each time the content gets updated, the nonce of the content increases. The network only keeps the latest nonce.

Peer-to-peer pubsub

Pubsub is an architecture where you subscribe to a “topic”, like “cats”, then whenever someone publishes a message on the topic “cats”, you receive it. A peer-to-peer pubsub network means that anyone can publish and anyone can subscribe. To publish a post to a subplebbit, a user would publish a message with a “topic” equal to the subplebbit public key (its public key based address).
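
A minimal in-memory sketch of the topic mechanism, assuming a toy `PubSub` class (a real network would use something like libp2p pubsub instead):

```javascript
// Toy in-memory pubsub: subscribing to a topic equal to a subplebbit's
// address means you receive every message published to it. Illustrative only.
class PubSub {
  constructor() { this.topics = new Map(); }
  subscribe(topic, handler) {
    if (!this.topics.has(topic)) this.topics.set(topic, []);
    this.topics.get(topic).push(handler);
  }
  publish(topic, message) {
    for (const handler of this.topics.get(topic) || []) handler(message);
  }
}

const net = new PubSub();
const subplebbitAddress = 'Y2F0cyA...'; // public key based address of "Cats"

const received = [];
net.subscribe(subplebbitAddress, (msg) => received.push(msg));
net.publish(subplebbitAddress, { author: 'alice', content: 'my cat pic' });

console.log(received.length); // 1
```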

Captcha server

A “captcha server” is a URL that prompts the user to perform a captcha challenge before publishing a post, then sends him a valid signature if the challenge is completed successfully. The captcha server can decide to prompt all users, first-time users only, or no users at all. The captcha server implementation is completely up to the subplebbit owner. He can use 3rd party services like Google captchas.

Lifecycle of creating a subplebbit

  1. Subplebbit owner starts a Plebbit client “node” on his desktop or server. It must always be online to serve content to his users.
  2. He generates a public key pair, which will be the “address” of his subplebbit.
  3. He sets up a captcha server of his choice. It must also always be online to serve his users.
  4. He publishes the metadata of his subplebbit to his public key based address. This includes the subplebbit title, description, rules, list of public keys of moderators, and the captcha server URL.
    Note: It is possible to delegate running a client and captcha server URL to a centralized service, without providing the private key, which makes user experience easier, without sacrificing decentralization.

Lifecycle of reading the latest posts on a subplebbit

  1. User opens the Plebbit app in a browser or desktop client, and sees an interface similar to Reddit.
  2. His client joins the public key addressing network as a peer and makes a DHT query for each address of each subplebbit he is a member of. The queries each take several seconds but can be performed concurrently.
  3. The query returns the latest posts of each subplebbit, as well as their metadata such as title, description, moderator list and captcha server URL.
  4. His client arranges the content received in an interface similar to Reddit.

Lifecycle of publishing a post on a subplebbit

  1. User opens the Plebbit app in a browser or desktop client, and sees an interface similar to Reddit.
  2. The app automatically generates a public key pair if the user doesn’t already have one.
  3. He publishes a cat post for a subplebbit called “Cats” with the public key “Y2F0cyA…”
  4. The app makes a call to “Y2F0cyA…” subplebbit’s captcha server. The captcha server optionally decides to send the user a captcha challenge. User completes it and includes the captcha server’s signature with his post.
  5. His client joins the pubsub network for “Y2F0cyA…” and publishes his post.
  6. The subplebbit owner’s client gets notified that the user published to his pubsub; the post is not ignored because it contains his captcha server’s valid signature.
  7. The subplebbit owner’s client updates the content of his subplebbit’s public key based addressing automatically.
  8. A few minutes later, each user reading the subplebbit receives the update in their app.
  9. If the user’s post violates the subplebbit’s rules, a moderator can delete it, using a similar process the user used to publish.
    Note: Browser users cannot join peer-to-peer networks directly, but they can use an HTTP provider or gateway that relays data for them. This service can exist for free without users having to do or pay anything.

What is a "post"

Post content is not retrieved directly by querying a subplebbit’s public key. What is retrieved is a list of “content based addressing” fields. Example: latest post: “bGF0ZXN0…”, metadata: “bWV0YWRhdGE…”. The client will then perform a DHT query to retrieve the content. At least one peer should have the data: the subplebbit owner’s client node. If a subplebbit is popular, many other peers will have it and the load will be distributed, like on Bittorrent.

Peer-to-peer pubsub scalability

A peer-to-peer pubsub network is susceptible to spam and does not scale well. Pubsub peers who spam messages without a valid captcha server signature can be blacklisted. And captcha server URLs can be behind DDOS protection services like Cloudflare, so it should be possible for subplebbit owners to resist spam attacks without too much difficulty.

Captcha server lifecycle

  1. The app loads the captcha server URL in an iframe before publishing a post. This URL is operated by each subplebbit owner individually.
  2. The server sends a visual or audio challenge and it appears inside the iframe.
  3. The user completes the challenge and sends his answer back to the server.
  4. If the challenge answer is correct, the server sends back a digital signature for the post.
  5. The user can now include this signature with his post, and when the subplebbit owner encounters that post in the pubsub network, he knows it is not spam.

Conclusion

We believe that the design above would solve the problems of a serverless, adminless decentralized Reddit alternative. It would allow unlimited amounts of subplebbits, users, posts, comments and votes. This is achieved by not caring about the order or availability of old data. It would allow users to post for free using an identical Reddit interface. It would allow subplebbit owners to moderate spam semi-automatically using their own captcha server implementations. It would allow for all features that make Reddit addictive: upvotes, replies, notifications, awards, and a chance to make the “front page”. Finally, it would allow the Plebbit client developers to serve an unlimited amount of users, without any server, legal, advertising or moderation infrastructure.

Check out pronto. It uses pubsub, IPFS streams, RDF/SparQL and decentralized identities. The first UI is done in QML.

You’ve heard the phrase “Protocols over Platforms” right? I coined that, and then Jack Dorsey started using it about his BlueSky project he’s pretending to be working on.

Anyway, the idea is that the world doesn’t need an “implementation” of a new Social Media network. We need a “protocol” for one. The protocol should be so easy that a few lines of JavaScript (with IPFS as the only external dependency) can at least perform the minimal capability of reading some messages and posting a message.

Is there a documented protocol for “pronto” somewhere?

Hi @wclayf

Agreed. Pronto doesn’t reinvent the wheel (no time/skills for that), but rather builds upon great standards like RDF, SparQL … and IPFS :slight_smile: The end result is a giant P2P RDF graph easily queryable and extensible. One thing that had to be fixed is that some resources like DIDs change over time and their representation in the graph has to be unique and only the DID owner can “overwrite” the previous triples. That’s been taken care of, and graph upgrades are done “triple by triple” (instead of doing a naive RDF graph merge which could produce duplicates).

Right now documentation is limited … The best way to learn about it ATM is to run galacteek, open the pubsub sniffer and look at the messages on the galacteek.ld.pronto topic. SmartQL IPFS service streams use this protocol name format:

/x/smartql/beta/urn:ipg:g:h0/1.1

The SmartQL service has a /sparql HTTP endpoint to run SparQL queries remotely. Other endpoints are there to pull graphs by resource URI.

I put a few thoughts in his issue; there’s a graph extract there that might help to understand it.

Here’s a link I already posted in another thread, but it belongs here too:

That guy gets it. He wrote the “simplest possible” PublicKey based social messaging over IPFS PubSub that can realistically exist…and he did it in two days, and with amazingly clean code too.

To build the next Social Network, it needs to be that simple at its foundation, and with zero barrier to entry. That kind of simplicity is what’s needed to get everyone onboard and build huge momentum, because 100% of developers of all ages will be able to dabble in it successfully.

RDF has its place too, as do all the other advanced features, but not as part of the lowest common denominator.

Great project … Thanks for sharing.

In your system each Subplebbit owner would have absolute authority.

Since they are the one anchoring the discussion, it’s not really decentralized.

What is the difference between that and multiple reddit websites controlled by different people?

That’s why one core tenet of IPSM (Interplanetary Social Media) has to be that it’s based on anyone posting anything to any topic, and it’s up to the consumers of the topics to do the filtering. No central authority.

And it immediately follows from that, that they will be pretty big fire hoses of data, so we need to limit to just a PublicKey and CID being posted (into the PubSub). This means just by looking at PublicKeys you can know if you want to discard the CID or not.
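
Consumer-side filtering on PublicKey + CID messages could be as simple as the sketch below (the message shape and list names are assumptions):

```javascript
// Sketch of consumer-side filtering: pubsub messages carry only a public key
// and a CID; each client keeps its own deny list and discards unwanted keys
// before ever fetching the CID. All names are illustrative.
const blocked = new Set(['spammerKeyHash']); // this client's personal deny list

function shouldFetch(message) {
  // message = { publicKey, cid }: decide from the key alone, without
  // downloading any content.
  return !blocked.has(message.publicKey);
}

console.log(shouldFetch({ publicKey: 'aliceKeyHash', cid: 'Qm1' }));   // true
console.log(shouldFetch({ publicKey: 'spammerKeyHash', cid: 'Qm2' })); // false
```

The point of keeping messages this small is exactly that the filtering decision costs nothing: no content is fetched for keys the consumer has rejected.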

I wrote the IPSM spec last week:

…and “Undying Wraith” was able to cobble together a working prototype in two days, proving my point that: “If it’s simple enough then adoption is almost certain.”

That’s already how Reddit works: the owner of each subreddit has absolute authority, so the design is not sacrificing any feature of Reddit. Yet it removes the need for Reddit admins and their entire server and legal infrastructure.

Having absolute authority over your own subplebbit that you created is not any more centralized than having absolute authority over your own coins in your Bitcoin wallet. You own it, it is yours. Unlike on Reddit, where you only own a subreddit you created for as long as the admins let you.

Also, it is not actually absolute. The client can offer many protections, such as verifying that posts are signed by each user; the subplebbit owner cannot tamper with a user’s posts other than to delete them; and anyone can keep a “moderation log” by being a peer in the pubsub network and use it to expose a misbehaving subplebbit owner.

I’ve become a decentralization purist… oh no!

The same features but without the central company or website, sign me up! With crypto signatures you can prevent impersonation, and since your content is immutable, you could copy all the content but with a new owner.

Keep us posted on your progress, I would love to integrate your protocol with mine.

Where’s your actual protocol @SionoiS ?

That is, what’s the format for how to post a message and how to read a message, without involving or referencing any of your actual code?

I need to write the specifications! Can one call it a protocol if there’s no specs yet? :man_shrugging:

After getting some feedback I have added 2 new sections:

Censorship resistance of the captcha server

Captcha servers are not as censorship resistant as a purely P2P network, because they require a direct connection to some HTTP endpoint. If this endpoint is blocked by your ISP or DDOSed, then you can’t connect. These attacks can be mitigated in a few minutes by changing the captcha server URL of your subplebbit, or by using DDOS protection like Cloudflare. In a pure P2P network, if some peer is blocked by your ISP or DDOSed, some other peer should be available. A pure P2P captcha server solution seems impossible at this time because requesting a captcha challenge is not deterministic, so how would peers in this network deterministically block a bad peer spamming captcha challenge requests? If a solution for a P2P captcha server is found, it should be attempted.

Using anti-spam strategies other than the captcha server

The captcha server can be replaced by other “anti-spam strategies”, such as proof of balance of a certain cryptocurrency. For example, a subplebbit owner might require that posts be signed by users holding at least 1 ETH, or at least 1 token of their choice. Another strategy could be proof of payment: each post must be accompanied by a minimum payment to the owner of the subplebbit. This might be fitting for celebrities wanting to use their subplebbit as a form of “OnlyFans”, where fans pay to interact with them. Neither of these scenarios would eliminate spam, but they would bring it down from an infinite amount of spam to an amount that does not overwhelm the pubsub network, and that a group of human moderators can manage. Proof of balance/payment is deterministic, so the P2P pubsub network can block spam attacks deterministically. Even more strategies can be added to fit the needs of different communities if found, but at this time the captcha server remains the most versatile strategy.
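
A proof-of-balance check could look like the sketch below, where an in-memory map stands in for a real on-chain balance lookup; all names and numbers are illustrative:

```javascript
// Sketch of a "proof of balance" anti-spam check: posts are only accepted
// from authors holding at least some minimum balance. The Map is a stand-in
// for a real chain query; names and thresholds are illustrative.
const balances = new Map([
  ['alice', 2.0], // holds enough to post
  ['bob', 0.1],   // below the threshold
]);

function meetsThreshold(author, minimumBalance) {
  return (balances.get(author) || 0) >= minimumBalance;
}

console.log(meetsThreshold('alice', 1)); // true: post accepted
console.log(meetsThreshold('bob', 1));   // false: post ignored
```

Because every peer can run the same check against the same chain state, the result is deterministic, which is what lets the whole pubsub network drop spam instead of relying on the owner alone.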

The idea for proof of payment/holding came from @wclayf

I realized that a full captcha challenge request-answer-validation exchange actually is deterministic, and could work over P2P. If a peer or IP address relays too many captcha challenge requests without enough correct captcha challenge answers, it gets blocked from the pubsub, deterministically. The captcha challenge request alone is not deterministic, but the entire exchange is. This would require the subplebbit owner’s peer to broadcast the result of all captcha challenge answers, and each peer to keep this information for some time.
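
The per-peer bookkeeping this implies might look like the sketch below. The class, its threshold, and the reset-on-success policy are all assumptions made for illustration:

```javascript
// Sketch of deterministic peer blocking: every peer tallies each relayer's
// challenge results (broadcast by the subplebbit owner) and blocks relayers
// with too many consecutive failures. Threshold and policy are arbitrary.
class SpamTracker {
  constructor(maxFailures) {
    this.maxFailures = maxFailures;
    this.failures = new Map(); // peerId -> consecutive failed challenges
  }
  recordResult(peerId, correct) {
    if (correct) { this.failures.set(peerId, 0); return; } // reset on success
    this.failures.set(peerId, (this.failures.get(peerId) || 0) + 1);
  }
  isBlocked(peerId) {
    return (this.failures.get(peerId) || 0) >= this.maxFailures;
  }
}

const tracker = new SpamTracker(3);
tracker.recordResult('peerA', false);
tracker.recordResult('peerA', false);
tracker.recordResult('peerA', false);

console.log(tracker.isBlocked('peerA')); // true: blocked deterministically
console.log(tracker.isBlocked('peerB')); // false: no failures recorded
```

Since every peer applies the same rule to the same broadcast results, they all converge on the same block list without any coordinator.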

So the “captcha server” over HTTP in the original design can be replaced with a “captcha service over peer-to-peer pubsub” design, which would make the entire design of Plebbit peer-to-peer. I will post an update with the entire redesign soon.
