Design idea to solve the scalability problem of a decentralized social media platform using IPFS

estebanabaroa · September 10, 2021, 4:39pm

Ethereum gives several tools to make a great social media platform: identity/login via public key cryptography, name system via ENS, tipping, token based rewards, token based voting/ranking/curating, etc.

The thing that’s missing is the ability to publish content for free, scale to millions of users, and view this content without spam (spam/sybil resistance).

I think I might have a solution for this that could be used to recreate 99% of the functionalities of a website like reddit using IPFS.

It would use a pubsub gossip protocol (IPFS already has an experimental implementation of this). A gossip protocol allows peers to send messages to all other peers. Peers filter the messages they want or don’t want using “topics”. “Topics” are arbitrary strings. In this case they would be an ENS name or public key that represents the name of the subreddit.

A user would sign and publish a subreddit thread/comment/vote through an HTTP provider in his browser, the HTTP provider would broadcast the message to the gossip peers using the correct “topic” (subreddit).

The admin/moderators of the subreddit would run a gossip peer that listens to this “topic” (their subreddit name or public key). They can filter spam automatically by requiring the user to include a captcha challenge answer with his thread/comment/vote. The captcha challenge URL would be up to each subreddit admin; most would use Google or Cloudflare captchas, or whatever captcha is effective at the time. They can also filter spam manually, like regular subreddit mods.

The admin/moderators would then publish approved posts to a “name system” gossip protocol (IPFS has this, called IPNS). The “name system” is different from pubsub, only the owner of the ENS or public key can publish to the “topic”, instead of everyone. The “topic” is the ENS name or public key instead of an arbitrary string.

A user would view the content of the subreddit in his browser through an HTTP provider, the HTTP provider would query the name system gossip protocol for the name of the subreddit, and send the content to the user. If the user is running a native app, he doesn’t need an HTTP provider, he can be a direct gossip peer. If not, he can choose any HTTP provider, like Ethereum with Infura.

Anyone sees any problem with this design? I am interested in writing a whitepaper for it, if anyone wants to help, let me know (I can fund the initial development). My telegram is @realesteban1 and discord estebanabaroa#2853

LucaPanofsky · September 11, 2021, 9:48am

Hi, I made a similar web app and for the moment I have a (bugged) mock at this link
https://frosty-booth-5f73e0.netlify.app/
if you want to have a look.

Let me summarize how it works.
As you visit the link the app will instantiate an IPFS node in the browser and a bunch of OrbitDB databases.

Peers have a main feed db they use to publish their contents. Whenever a content is published new relational dbs are created in order to manage contents’ relations (likes, rts, replies …).

The feed mechanism works because you can subscribe to orbit databases so to have instant updates about the other peers.

At the moment the social has limited features. In the future I will implement the natural follow relationship and, most importantly, the access controller to standardize contents.

Finally, I do not like having moderation. This social is completely p2p and does not require any central point of control - although we need a star signalling server

I suggest you have a look at https://orbitdb.org/ I think you may find it very interesting for your project

estebanabaroa · September 11, 2021, 11:18pm

@LucaPanofsky Sounds awesome. Can you add me on telegram @estebanabaroa or discord estebanabaroa#2853 ?

wclayf · September 12, 2021, 1:15am

I created a decentralized Social Media platform (httpz://quanta.wiki) that has a lot of these same goals in mind, but it currently uses ActivityPub (i.e. part of the Fediverse) as the way for federating social media content. Quanta does have IPFS functionality, but only as an option for how to save attached files (for now)

Making Quanta be “fully IPFS” would only amount to making it save its currently MongoDB-based JSON documents onto IPFS as well, and then keeping the MongoDB still as the “searchable cache” (which will always need to exist regardless)…

The idea of a pure IPFS version of something like ActivityPub comes up a lot on this forum as well as ActivityPub forums, and Quanta would be an ideal thing to build a platform like that on top of, since it already does much of the “heavy lifting” (hard to write GUI code, searching, etc).

estebanabaroa · September 12, 2021, 1:29am

@wclayf I actually think searching is not part of the core functionality of a social media platform. For example when I use Facebook, Reddit, Twitter, I only spend 0.1% of my time on the platform searching. So I don’t think it should be a concern when designing a p2p social media.

I also think ActivityPub seems problematic because from what I understand, it requires people to host servers and host content directly, which means they are legally liable for content, which means they must spend unbounded man-hours moderating. It is fundamentally unscalable.

Whereas with my design idea, there are no servers, no one is legally liable for serving or hosting content. I’d love to talk more about it on telegram or discord if you have it.

wclayf · September 12, 2021, 2:21am

I agree search is not a core part of social media data. What I meant was that, similar to how IPFS caches data, there needs to be something (at least in my architecture) that can provide instantaneous responses to client/browser requests, and I am still the kinda guy who thinks every web app needs a real database (MySQL, Postgres, etc), even if its core source of truth is IPFS. So what I was saying is that even once Quanta goes “full IPFS” on all the social media data, my existing MongoDB will still stay there doing exactly what it’s doing today, providing lightning fast responses to information it already has collected.

For example a request like “Show me a timeline of all my friends’ posts” will always involve a database query.

I always warn people, that as great as IPFS is, it will never perform like your DB and it can never replace the need for a DB, for any “serious” web app.

What we need, in terms of a spec however, is something like the ActivityPub spec, but that’s pure IPFS, and is fully decentralized. AP is Federated (not decentralized), and has many problems like those you pointed out. However AP does have most of the data structures already laid out, and in order to get the Mastodon guys and others to join the IPFS movement, we might as well also adopt like 99% of their data structures (JSON formats, ActivityStreams, etc), so they don’t have to rewrite their code and so they can “interop” similar to how Matrix allows lots of disparate messaging systems to interop.

I’ll look you up on discord.

estebanabaroa · September 16, 2021, 12:12am

I wrote a short whitepaper for my idea:

Plebbit: A serverless, adminless, decentralized Reddit alternative

Abstract

A decentralized social media has 2 problems: How to store the entire world’s data on a blockchain, and how to prevent spam while being feeless. We propose solving the data problem by not using a blockchain, but rather “public key based addressing” and a peer-to-peer pubsub network. A blockchain or even a DAG is unnecessary because unlike cryptocurrencies that must know the order of each transaction to prevent double spends, social media does not care about the order of posts, nor about the availability of old posts. We propose solving the spam problem by having each subplebbit owner run their own “captcha server” and ignore posts that don’t contain a valid captcha challenge answer.

Public key based addressing

In BitTorrent, you have “content based addressing”: the hash of a file becomes its address. With “public key based addressing”, the hash of a public key becomes the address of the subplebbit. Network peers perform a DHT query of this address to retrieve the content of the subplebbit. Each time the content gets updated, the nonce of the content increases. The network only keeps the content with the latest nonce.
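
To make the addressing rule concrete, here is a minimal Python sketch. It is only an illustration under assumptions: SHA-256 as the hash, an in-memory dictionary standing in for the DHT, and hypothetical names (`address_of`, `Record`, `RecordStore`) that are not part of any Plebbit or IPFS API. A real network would also check the owner's signature on each update, which is left out here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def address_of(public_key: bytes) -> str:
    """Address of a subplebbit: the hex SHA-256 hash of its public key (assumed hash)."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class Record:
    nonce: int      # increases with every update
    content: bytes  # whatever the owner last published


class RecordStore:
    """Toy stand-in for the DHT: keeps only the highest-nonce record per address."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def put(self, public_key: bytes, record: Record) -> bool:
        # Signature verification is omitted in this sketch.
        addr = address_of(public_key)
        current = self._records.get(addr)
        if current is not None and record.nonce <= current.nonce:
            return False  # stale or replayed update, ignore it
        self._records[addr] = record
        return True

    def get(self, address: str) -> Optional[Record]:
        return self._records.get(address)
```

Readers only need the address to look up the latest record, and older nonces are simply dropped, which matches the idea that old data does not have to stay available.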

Peer-to-peer pubsub

Pubsub is an architecture where you subscribe to a “topic”, like “cats”, then whenever someone publishes a message with the topic “cats”, you receive it. A peer-to-peer pubsub network means that anyone can publish, and anyone can subscribe. To publish a post to a subplebbit, a user would publish a message with a “topic” equal to the subplebbit public key (its public key based address).
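
The topic mechanism can be sketched in a few lines of Python. This is an in-process toy, not the IPFS pubsub API: the `PubSub` class and its methods are hypothetical, and there is no networking or gossip, only the subscribe/publish semantics.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, bytes], None]


class PubSub:
    """In-process stand-in for a peer-to-peer pubsub network."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: bytes) -> int:
        """Deliver message to every subscriber of topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


inbox: list[bytes] = []
network = PubSub()
subplebbit_topic = "Y2F0cyA…"  # truncated placeholder key, as written in the whitepaper
network.subscribe(subplebbit_topic, lambda topic, msg: inbox.append(msg))
network.publish(subplebbit_topic, b"a cat post")
network.publish("some-other-topic", b"not delivered to the cats subscriber")
```

Only the message published under the subplebbit's topic ends up in `inbox`; the other one has no subscribers and is dropped.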

Captcha server

A “captcha server” is a URL that prompts the user to perform a captcha challenge before a post, then sends him a valid signature if completed successfully. The captcha server can decide to prompt all users, first time users only, or no users at all. The captcha server implementation is completely up to the subplebbit owner. He can use 3rd party services like Google captchas.

Lifecycle of creating a subplebbit

Subplebbit owner starts a Plebbit client “node” on his desktop or server. It must be always online to serve content to his users.
He generates a public key pair, which will be the “address” of his subplebbit.
He sets up a captcha server of his choice. It must also be always online to serve his users.
He publishes the metadata of his subplebbit to his public key based address. This includes the subplebbit title, description, rules, list of public keys of moderators, and the captcha server URL.
Note: It is possible to delegate running a client and captcha server URL to a centralized service, without providing the private key, which makes user experience easier, without sacrificing decentralization.
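
As a rough illustration of the metadata the owner publishes, here is a Python sketch. The field names, the JSON encoding and the example URL are all assumptions made for this example; the whitepaper only lists which pieces of information are included.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubplebbitMetadata:
    title: str
    description: str
    rules: list[str] = field(default_factory=list)
    moderators: list[str] = field(default_factory=list)  # moderator public keys
    captcha_server_url: str = ""


def encode(metadata: SubplebbitMetadata) -> bytes:
    """Serialize with sorted keys so equal metadata always yields equal bytes."""
    return json.dumps(asdict(metadata), sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode(raw: bytes) -> SubplebbitMetadata:
    return SubplebbitMetadata(**json.loads(raw.decode("utf-8")))


cats = SubplebbitMetadata(
    title="Cats",
    description="Pictures of cats",
    rules=["Only cats"],
    moderators=["moderator-public-key-placeholder"],
    captcha_server_url="https://captcha.example/cats",  # hypothetical URL
)
published_bytes = encode(cats)
```

Deterministic serialization matters if the bytes are later hashed or signed, since two clients encoding the same metadata should produce the same result.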

Lifecycle of reading the latest posts on a subplebbit

User opens the Plebbit app in a browser or desktop client, and sees an interface similar to Reddit.
His client joins the public key addressing network as a peer and makes a DHT query for the address of each subplebbit he is a member of. The queries each take several seconds but can be performed concurrently.
The query returns the latest posts of each subplebbit, as well as their metadata such as title, description, moderator list and captcha server URL.
His client arranges the content received in an interface similar to Reddit.
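
The point about concurrent queries can be shown with a small asyncio sketch. The lookup here is simulated with a sleep; `query_subplebbit` and `load_front_page` are hypothetical names, and the addresses are placeholders rather than real keys.

```python
import asyncio


async def query_subplebbit(address: str, delay: float) -> tuple[str, str]:
    """Pretend DHT lookup: sleeps to imitate network latency, then returns a result."""
    await asyncio.sleep(delay)
    return address, f"latest posts for {address}"


async def load_front_page(addresses: list[str], delay: float = 0.05) -> dict[str, str]:
    # gather() runs all lookups concurrently, so the total wait is roughly one
    # delay rather than len(addresses) * delay.
    results = await asyncio.gather(*(query_subplebbit(a, delay) for a in addresses))
    return dict(results)


front_page = asyncio.run(
    load_front_page(["cats-address", "dogs-address", "news-address"])
)
```

With real lookups taking several seconds each, running them this way keeps the wait close to the slowest single query instead of the sum of all of them.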

Lifecycle of publishing a post on a subplebbit

User opens the Plebbit app in a browser or desktop client, and sees an interface similar to Reddit.
The app automatically generates a public key pair if the user doesn’t already have one.
He publishes a cat post for a subplebbit called “Cats” with the public key “Y2F0cyA…”
The app makes a call to “Y2F0cyA…” subplebbit’s captcha server. The captcha server optionally decides to send the user a captcha challenge. User completes it and includes the captcha server’s signature with his post.
His client joins the pubsub network for “Y2F0cyA…” and publishes his post.
The subplebbit owner’s client gets notified that the user published to his pubsub, the post is not ignored because it contains his valid captcha server signature.
The subplebbit owner’s client updates the content of his subplebbit’s public key based addressing automatically.
A few minutes later, each user reading the subplebbit receives the update in their app.
If the user’s post violates the subplebbit’s rules, a moderator can delete it, using a process similar to the one the user used to publish.
Note: Browser users cannot join peer-to-peer networks directly, but they can use an HTTP provider or gateway that relays data for them. This service can exist for free without users having to do or pay anything.

What is a "post"

Post content is not retrieved directly by querying a subplebbit’s public key. What is retrieved is a list of “content based addressing” fields. Example: latest post: “bGF0ZXN0…”, metadata: “bWV0YWRhdGE…”. The client will then perform a DHT query to retrieve the content. At least one peer should have the data: the subplebbit owner’s client node. If a subplebbit is popular, many other peers will have it and the load will be distributed, like on BitTorrent.
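
Here is a small Python sketch of that indirection: the subplebbit record holds only hashes, and the content is fetched and checked separately. It assumes plain hex SHA-256 as the content address (IPFS itself uses CIDs, which are more involved), and `ContentStore` is a hypothetical in-memory stand-in for the peers holding the data.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Content based address: the hex SHA-256 hash of the data (assumed hash)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy stand-in for peers that hold content, keyed by its hash."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        addr = content_address(data)
        self._blocks[addr] = data
        return addr

    def fetch(self, addr: str) -> bytes:
        data = self._blocks[addr]
        # The address is the hash, so the reader can detect altered data
        # no matter which peer served it.
        if content_address(data) != addr:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
# What a query of the subplebbit's public key returns: addresses, not content.
subplebbit_record = {
    "latest post": store.add(b"Look at my cat"),
    "metadata": store.add(b'{"title": "Cats"}'),
}
post = store.fetch(subplebbit_record["latest post"])
```

Because any peer's copy can be verified against the address, it does not matter whether the data comes from the owner's node or from another reader.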

Peer-to-peer pubsub scalability

A peer-to-peer pubsub network is susceptible to spam and does not scale well. Pubsub peers who spam messages without a valid captcha server signature can be blacklisted. And captcha server URLs can be behind DDoS protection services like Cloudflare, so it should be possible for subplebbit owners to resist spam attacks without too much difficulty.
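
One possible shape for the blacklisting rule, sketched in Python. The threshold of three invalid messages and the `PeerFilter` name are assumptions for illustration; the whitepaper does not say how many strikes a peer gets.

```python
from collections import Counter


class PeerFilter:
    """Blacklists peers that keep sending messages without a valid signature."""

    def __init__(self, max_invalid: int = 3) -> None:
        self.max_invalid = max_invalid  # assumed threshold, not from the whitepaper
        self._invalid: Counter = Counter()
        self.blacklist: set[str] = set()

    def accept(self, peer_id: str, has_valid_signature: bool) -> bool:
        """Return True if the message should be processed."""
        if peer_id in self.blacklist:
            return False  # dropped without further checks
        if has_valid_signature:
            return True
        self._invalid[peer_id] += 1
        if self._invalid[peer_id] >= self.max_invalid:
            self.blacklist.add(peer_id)
        return False
```

Once a peer is blacklisted, its messages are dropped before any signature check, so a spammer costs the subplebbit owner very little per message.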

Captcha server lifecycle

The app loads the captcha server URL in an iframe before publishing a post. This URL is operated by each subplebbit owner individually.
The server sends a visual or audio challenge and it appears inside the iframe.
The user completes the challenge and sends his answer back to the server.
If the challenge answer is correct, the server sends back a digital signature for the post.
The user can now include this signature with his post, and when the subplebbit owner encounters that post in the pubsub network, he knows it is not spam.
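
The last two steps can be sketched in Python. One loud assumption: the whitepaper speaks of a digital signature, but this sketch swaps in an HMAC-SHA256 tag with a secret shared between the captcha server and the owner's node, because both are run by the same owner and the standard library has no public key signatures. The fixed `expected_answer` is also a simplification; a real server would generate a fresh challenge each time. All names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def post_digest(post: bytes) -> bytes:
    return hashlib.sha256(post).digest()


class CaptchaServer:
    def __init__(self, secret: bytes, expected_answer: str) -> None:
        self._secret = secret
        self._expected_answer = expected_answer  # fixed challenge, for the sketch only

    def submit(self, post: bytes, answer: str) -> Optional[bytes]:
        """Return a token bound to this exact post if the answer is correct."""
        if not hmac.compare_digest(answer.encode(), self._expected_answer.encode()):
            return None
        return hmac.new(self._secret, post_digest(post), hashlib.sha256).digest()


def owner_accepts(secret: bytes, post: bytes, token: bytes) -> bool:
    """Owner's node recomputes the tag and ignores posts whose token does not match."""
    expected = hmac.new(secret, post_digest(post), hashlib.sha256).digest()
    return hmac.compare_digest(expected, token)


secret = b"shared-between-captcha-server-and-owner-node"
server = CaptchaServer(secret, expected_answer="7")
token = server.submit(b"my cat post", "7")
```

Binding the token to the hash of the post means a solved captcha cannot be reused for a different post. With a real public key signature, anyone could verify the token, not just the owner.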

Conclusion

We believe that the design above would solve the problems of a serverless, adminless decentralized Reddit alternative. It would allow unlimited amounts of subplebbits, users, posts, comments and votes. This is achieved by not caring about the order or availability of old data. It would allow users to post for free using an interface identical to Reddit’s. It would allow subplebbit owners to moderate spam semi-automatically using their own captcha server implementations. It would allow for all features that make Reddit addictive: upvotes, replies, notifications, awards, and a chance to make the “front page”. Finally, it would allow the Plebbit client developers to serve an unlimited amount of users, without any server, legal, advertising or moderation infrastructure.

estebanabaroa · September 16, 2021, 12:13am

[Diagram: Lifecycle of publishing a post on a subplebbit]

reload · September 18, 2021, 3:33pm

Check out pronto. Uses pubsub, IPFS streams, RDF/SPARQL and decentralized identities. First UI is done in QML.

wclayf · September 20, 2021, 12:40am

You’ve heard the phrase “Protocols over Platforms” right? I coined that, and then Jack Dorsey started using it about his BlueSky project he’s pretending to be working on.

Anyway, the idea is that the world doesn’t need an “implementation” of a new Social Media network. We need a “protocol” for one. The protocol should be so easy that a few lines of JavaScript (using IPFS as the only external dependency) can at least perform the minimal capability of reading some messages and posting a message.

Is there a documented protocol for “pronto” somewhere?

reload · September 20, 2021, 10:32am

Hi @wclayf

Agreed. Pronto doesn’t reinvent the wheel (no time/skills for that), but rather builds upon great standards like RDF, SPARQL … and IPFS. The end result is a giant P2P RDF graph that is easily queryable and extensible. One thing that had to be fixed is that some resources like DIDs change over time: their representation in the graph has to be unique, and only the DID owner can “overwrite” the previous triples. That’s been taken care of, and graph upgrades are done “triple by triple” (instead of doing a naive RDF graph merge which could produce duplicates).

Right now documentation is limited … The best way to learn about it ATM is to run galacteek, open the pubsub sniffer and look at the messages on the galacteek.ld.pronto topic. SmartQL IPFS service streams use this protocol name format:

/x/smartql/beta/urn:ipg:g:h0/1.1

The SmartQL service has a /sparql HTTP endpoint to run SPARQL queries remotely. Other endpoints are there to pull graphs by resource URI.

reload · September 20, 2021, 12:12pm

I put a few thoughts in this issue; there’s a graph extract that might help to understand.

wclayf · September 20, 2021, 11:05pm

Here’s a link I already posted in another thread, but it belongs here too:

That guy gets it. He wrote the “simplest possible” PublicKey based social messaging over IPFS PubSub that can realistically exist…and he did it in two days, and with amazingly clean code too.

To build the next Social Network it needs to be that simple at its foundation, and with zero barrier to entry. That kind of simplicity is what’s needed to get everyone onboard, and build huge momentum, because 100% of developers of all ages will be able to dabble in it successfully.

RDF has its place too, as do all the other advanced features, but not as part of the lowest common denominator.

reload · September 21, 2021, 9:36am

Great project … Thanks for sharing.

SionoiS · September 21, 2021, 1:04pm

In your system each Subplebbit owner would have absolute authority.

Since they are the one anchoring the discussion, it’s not really decentralized.

What is the difference between that and multiple reddit websites controlled by different people?

wclayf · September 21, 2021, 3:20pm

That’s why one core tenet of IPSM (Interplanetary Social Media) has to be that it’s based on anyone posting anything to any topic, and it’s up to the consumers of the topics to do the filtering. No central authority.

And it immediately follows from that, that they will be pretty big fire hoses of data, so we need to limit to just a PublicKey and CID being posted (into the PubSub). This means just by looking at PublicKeys you can know if you want to discard the CID or not.
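
A consumer-side filter along these lines might look like the Python sketch below. The `Announcement` shape and `cids_to_fetch` are hypothetical names for illustration and are not taken from the IPSM spec; the keys and CIDs are placeholders.

```python
from typing import Iterable, NamedTuple


class Announcement(NamedTuple):
    public_key: str  # who posted
    cid: str         # where the content lives


def cids_to_fetch(announcements: Iterable[Announcement], followed: set[str]) -> list[str]:
    """Keep only the CIDs announced by public keys this consumer follows."""
    return [a.cid for a in announcements if a.public_key in followed]


firehose = [
    Announcement("alice-key", "cid-1"),
    Announcement("spammer-key", "cid-2"),
    Announcement("alice-key", "cid-3"),
]
wanted = cids_to_fetch(firehose, followed={"alice-key"})
```

The filter never has to download the content behind a CID to decide whether to discard it, which is what keeps a large firehose manageable.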

I wrote the IPSM spec last week:

…and “Undying Wraith” was able to cobble together a working prototype in two days, proving my point that: “If it’s simple enough then adoption is almost certain.”

estebanabaroa · September 21, 2021, 3:50pm

That’s already how Reddit works, the owner of each subreddit has absolute authority, so the design is not sacrificing any feature of Reddit. Yet it removes the need for Reddit admins and their entire server and legal infrastructure.

Having absolute authority on your own subplebbit that you created is not any more centralized than having absolute authority on your own coins in your Bitcoin wallet. You own it, it is yours. Unlike on Reddit, where you only own a subreddit you created as long as the admins let you.

Also it is not actually absolute. The client can offer many protections, such as verifying that posts are signed by each user, so the subplebbit owner cannot tamper with a user’s posts other than to delete them, and anyone can keep a “moderation log” by being a peer in the pubsub network and use it to expose a misbehaving subplebbit owner.

SionoiS · September 22, 2021, 11:59am

I’ve become a decentralization purist… oh no!

The same features but without the central company or website, sign me up! With crypto signature you can prevent impersonation and since your content is immutable you could copy all the content but with a new owner.

Keep us posted on your progress, I would love to integrate your protocol with mine.

wclayf · September 22, 2021, 7:13pm

Where’s your actual protocol @SionoiS ?

That is, what’s the format for how to post a message and how to read a message, without involving or referencing any of your actual code?

SionoiS · September 23, 2021, 12:06pm

I need to write the specifications! Can one call it a protocol if there’s no specs yet?
