PubSub, some questions and potential spam flooding

markg85 · March 17, 2020, 11:09pm

Hi,

I’m working on a little IPFS project where all nodes need to know which other nodes are online, without maintaining node lists.

My initial thought was a kind of chatroom application. But upon further experimenting with pubsub it seems like there is no pubsub default of broadcasting your presence, is that correct?

I cannot use ipfs pubsub peers(topic) because you get no signal whatsoever when a new peer subscribes to a topic. To detect changes you would have to poll it. Not ideal.

I come from some socket.io experiments from years ago, where someone joining a “room” was a new connection that you needed to handle, so you knew someone had joined. I was assuming the same to be true here. But IPFS doesn’t emit any signal (as far as i know) when one joins a topic (subscribes). It should be possible to add that to the protocol, since a new peer for a topic is already added to ipfs pubsub peers; there is just no notification of it. It’s not a big issue at all: just publishing a message with some identifiable payload does the trick in this case.
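The "publish an identifiable payload" workaround can be sketched as a heartbeat tracker: every node periodically publishes a small presence message, and receivers remember when they last heard from each sender. This is only a sketch under assumptions. `PresenceTracker`, the 30 second timeout, and the peer ID strings are hypothetical names and values, not part of IPFS or libp2p, and wiring `observe` to an actual pubsub subscription is left out.

```python
import time
from typing import Dict, List, Optional


class PresenceTracker:
    """Tracks which peers count as online, based on heartbeat messages."""

    def __init__(self, timeout_seconds: float = 30.0) -> None:
        # Assumed timeout: a peer is offline once no heartbeat arrived for this long.
        self.timeout_seconds = timeout_seconds
        self._last_seen: Dict[str, float] = {}

    def observe(self, peer_id: str, now: Optional[float] = None) -> bool:
        """Record a heartbeat. Returns True if the peer just came online."""
        now = time.monotonic() if now is None else now
        previous = self._last_seen.get(peer_id)
        is_new = previous is None or now - previous > self.timeout_seconds
        self._last_seen[peer_id] = now
        return is_new

    def online(self, now: Optional[float] = None) -> List[str]:
        """Return the peers whose last heartbeat is within the timeout, sorted."""
        now = time.monotonic() if now is None else now
        return sorted(
            peer for peer, seen in self._last_seen.items()
            if now - seen <= self.timeout_seconds
        )
```

Calling `observe` for every incoming presence message gives the "someone joined" signal (the `True` return value), and `online` replaces polling the peer list.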

What’s with the topicIDs array?
So each message that flows in has a topicIDs array. What’s the rationale in having that?
The message is already sent to a specific topic from the sending side and the receiving side is already subscribed to a specific topic. I’m sure there is a really good reason, i just don’t see it yet. Should i use this array? Just curious to know about this one.

How should large payloads be sent?
By large i mean, at most, multiple megabytes. But i wonder if always having a CID as payload and using plain IPFS to get the data is the way to go by default? The payload would then always be as large as the CID is in characters. The cost here would be added delay due to getting the content from IPFS. The alternative would be to just use pubsub for the data. Any recommendation on what payload size is still sane for pubsub?
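One way to picture the trade-off between inline data and a CID reference is a small envelope that switches on size. This is a sketch, not a recommendation from the libp2p or IPFS docs: the 64 KiB threshold, the envelope field names, and the `add_to_ipfs` callback (which stands in for whatever adds the data to IPFS and returns its CID) are all assumptions.

```python
import base64
import json
from typing import Callable, Tuple, Union

# Assumed threshold for illustration only, not a libp2p or IPFS constant.
MAX_INLINE_BYTES = 64 * 1024


def build_payload(
    data: bytes,
    add_to_ipfs: Callable[[bytes], str],
    max_inline: int = MAX_INLINE_BYTES,
) -> str:
    """Return a JSON envelope: inline data if small, otherwise a CID reference."""
    if len(data) <= max_inline:
        encoded = base64.b64encode(data).decode("ascii")
        return json.dumps({"kind": "inline", "data": encoded})
    # Large payload: store it via the (hypothetical) callback and publish only the CID.
    return json.dumps({"kind": "cid", "cid": add_to_ipfs(data)})


def read_payload(payload: str) -> Tuple[str, Union[bytes, str]]:
    """Decode an envelope into ("inline", bytes) or ("cid", cid_string)."""
    envelope = json.loads(payload)
    if envelope["kind"] == "inline":
        return "inline", base64.b64decode(envelope["data"])
    return "cid", envelope["cid"]
```

Small state updates then travel with pubsub latency, while large blobs pay the extra fetch from IPFS only when they have to.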

Potential spam flooding
So this is a bigger issue. As far as i know anyone can send anything to any topic. How is an app using this mechanism supposed to filter out the spam? To give an example of how it could be abused: imagine there is a service (could be a chat site even!) out there using pubsub for its main operation. Now imagine there to be a “mad person wanting to take that site down”… All that person has to do now is write a dead simple bash script with a loop to completely flood a given topic. The receiving side will have to handle every message to determine if it is a proper message for its service (discard if not, use if it is). That handling, when flooded with millions of messages, will likely kill a service. I haven’t tried this but i assume it would work.

That begs the question: how can we use pubsub and somewhat safely assume a topic won’t be the victim of a bad actor?

I’ve been trying to think of a way to provide flood protection, like actually blocking bad actors. But i can’t think of any mechanism (yet) that would allow detection and keep the API as simple and clean as it is now. All i can come up with is methods of “registering” a topic and defining rules for posting in that topic. But that smells like one node being the “master”, which imho kinda defeats the point of being decentralized. Unless you could define the rules for a topic in a public way, like storing the rules on IPNS. If everyone can still see the rules, but only some can obey them… That could work. That would require multiple API changes though.
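Without any protocol changes, a receiver can at least make the per-message cost of a flood cheap by rate limiting per sender before doing any real handling. Below is a minimal token-bucket sketch. `PerSenderRateLimiter` and its parameters are hypothetical, and it assumes the sender ID on each message can be trusted (for example because message signing is enabled and verified). It does not help against an attacker who keeps generating new identities.

```python
import time
from typing import Dict, Optional, Tuple


class PerSenderRateLimiter:
    """Token bucket per sender: `burst` messages at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int) -> None:
        self.rate = rate_per_second
        self.burst = float(burst)
        # sender -> (tokens left, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, sender: str, now: Optional[float] = None) -> bool:
        """Return True if this sender's message should be handled, False to drop it."""
        now = time.monotonic() if now is None else now
        tokens, last = self._buckets.get(sender, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[sender] = (tokens - 1.0, now)
            return True
        self._buckets[sender] = (tokens, now)
        return False
```

The check is a dictionary lookup and a bit of arithmetic, so dropping a flooded message costs far less than parsing and validating it.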

Cheers,
Mark

hector · March 19, 2020, 10:22am

Hi!

Pubsub is actually libp2p land so these may be better asked in https://discuss.libp2p.io/. I can tell you that the pubsub spec (https://github.com/libp2p/specs/tree/master/pubsub#the-message) specifies that topicIDs carries the topics that a message is published to.

I am not sure what you understand by “subscribed”. Peers have connections with multiple protocol-streams in them (one of those being the pubsub protocol). Peers will receive messages and only process them if they are subscribed to something in topicIDs (my guess).
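The guessed behaviour above (process a message only if one of its topics is subscribed to) amounts to a set-membership check. A tiny sketch, assuming a message simply carries a list of topic strings; `should_process` is a hypothetical helper, not a libp2p function:

```python
from typing import Iterable, Set


def should_process(message_topic_ids: Iterable[str], subscriptions: Set[str]) -> bool:
    """Return True if the message was published to at least one subscribed topic."""
    return any(topic in subscriptions for topic in message_topic_ids)
```

With subscriptions `{"chat", "presence"}`, a message whose topic list is `["presence"]` is processed, and one whose list is `["other"]` is ignored.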

By the way, I don’t want to miss the chance to mention the awesome pubsub docs even if you saw them already: /concepts/pubsub/overview/

There is this constant that gives you a soft upper limit to anything you want to do on libp2p and I don’t know if pubsub enforces anything else:

The best pubsub message payload size will however depend on your applications and the peers that are subscribed to your topics and how fast you want information to flow and other constraints specific to your app. Publishing CIDs as you mention is one way but does not have to be the only one.

I think the general approach is to only accept messages from trusted peers and drop everything else. Since libp2p has identity baked in, and pubsub supports message signatures, you can keep a list of “well-behaved peers” and drop messages that are not signed by those. You can use the same to do a blacklist approach (i.e. when flooding is detected).
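The trusted-list and blacklist ideas can be sketched as one small filter. This assumes the pubsub layer has already verified the message signature and hands over the result together with the sender's peer ID; `SenderFilter`, its fields, and the `signature_valid` flag are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class SenderFilter:
    """Accepts or drops messages based on the (verified) sender identity."""

    trusted: Set[str] = field(default_factory=set)
    blocked: Set[str] = field(default_factory=set)
    # True: only trusted senders pass. False: everyone passes unless blocked.
    allowlist_only: bool = True

    def accept(self, sender: str, signature_valid: bool) -> bool:
        # Without a valid signature the sender ID cannot be trusted at all.
        if not signature_valid:
            return False
        if sender in self.blocked:
            return False
        if self.allowlist_only:
            return sender in self.trusted
        return True
```

In the blacklist variant (`allowlist_only=False`), whatever detects flooding only has to add the offending peer ID to `blocked`.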

markg85 · March 19, 2020, 10:53am

Really?
I’m talking about the pubsub commands in IPFS. It smells like i should be here
Anyhow, i still don’t get the topicIDs as what you said suggests you could publish a message to a range of topics. However, the IPFS commands give no indication of that being possible: https://docs.ipfs.io/reference/api/cli/#ipfs-pubsub-pub I would get it if that were to be the case though but nothing indicates it is.

Ohh, that’s looking fancy! Reading that right after this post! Thank you for the pointer!

Would it be possible to describe this in the place where one would look for information? That being https://docs.ipfs.io/reference/api/cli/#ipfs-pubsub-pub when using pubsub in IPFS. That link you pointed me to is really not something you would be able to find easily otherwise. I’d describe it in IPFS as just some advised way of using it and a note on limitations to take into account.

How does one do that?
I don’t see any method in the ipfs pubsub commands to do some sort of verification. Should i maintain a list of “trusted CID’s” and just filter based on that? I did enable message signing but am not sure at all how to actually use it.

hector · March 19, 2020, 11:11am

Maybe IPFS as a frontend does not support it, but the Pubsub protocol does. Pubsub is an experimental feature on IPFS (also, it has little to do with IPFS itself as a filesystem).

It would be possible, but first this would require figuring out what the actual enforced limit is, which I am not sure of.

How does one do that?
I don’t see any method in the ipfs pubsub commands to do some sort of verification. Should i maintain a list of “trusted CID’s” and just filter based on that? I did enable message signing but am not sure at all how to actually use it.

My two cents here is that you can’t expect to build a pubsub app based solely on the IPFS API, which offers a very rough and experimental interface to it. You need to build a libp2p peer manually and integrate with it, enabling the features that you want from the protocol.

markg85 · March 19, 2020, 12:24pm

I understand that pubsub is experimental in IPFS, but suggesting to create a libp2p app to use the pubsub functionality kinda makes developing anything with IPFS and pubsub orders of magnitude more complicated. Now there might be an advantage there too. IPFS is kinda heavy on resource usage so going the libp2p route in that regard is an interesting idea too. But then how would i use actual IPFS features (like getting files or pinning them)? I’m guessing i can’t use libp2p there? And then again, if i can use libp2p then i’d basically be making a very lightweight IPFS… That can’t possibly be the intended way to use IPFS in third party applications, can it?

You’re making me more confused and giving me more questions than answers here, lol.

I want to use IPFS features (like in getting files) and use pubsub to synchronize state between nodes.

Now, having concerns about potential bad actors flooding a topic is just that, a concern. But i don’t think it’s an actual issue “at the moment”. Perhaps also because it’s still very much experimental.

Let’s not forget that IPNS in the next major IPFS release (0.5.0 i think) is going to use pubsub too. So based on that i was assuming pubsub to be experimental but definitely usable. Not in a state of alpha, which is the impression i get based on your replies.

hector · March 19, 2020, 1:42pm

What I mean is that having an ipfs pubsub interface is not something that IPFS needs per se for the functioning of the filesystem. Sure IPFS might use pubsub internally but other than that ipfs pubsub is provided to the outside as a goodie more than a core feature, while the pubsub protocol in libp2p is pretty mature itself.

I understand that pubsub is experimental in IPFS, but suggesting to create a libp2p app to use the pubsub functionality kinda makes developing anything with IPFS and pubsub orders of magnitude more complicated.

Yes and no… it depends on the requirements. If the current functionality does not allow you to develop such an app at all then it’s worse than complex. The IPFS and libp2p ecosystem provide the building blocks to build end-user apps which are optimized for whatever they need to be. We cannot optimize go-ipfs for every possible libp2p use case, but the pieces that make it up are very customizable.

go-ipfs can be used as a library: https://github.com/ipfs/go-ipfs/tree/master/docs/examples/go-ipfs-as-a-library . I also wrote ipfs-lite (hsanjuan/ipfs-lite on GitHub, an embeddable, lightweight IPFS-network peer for IPLD applications), so that I could build something similar to what you are describing. Textile provides some extra building blocks on top of IPFS that can be very useful. So does OrbitDB etc.

I wish things were different but there are 1000x ways of doing mix+match between p2p blocks and their configs, and go-ipfs as a program itself is focused on providing a distributed data storage layer rather than on offering a fully featured API to libp2p functionality, even though it tries to give a peek through experimental features etc.

markg85 · March 19, 2020, 10:05pm

Agreed. But having pubsub (or rather, a mechanism to “talk” between peers) is a quite important basic building block to have. It’s a task that in the regular internet world a server takes care of. But we don’t have servers in the IPFS world so we need something suitable to replace it.

I’m really glad IPFS provides this by means of pubsub and i think it’s quite usable in its current state.

I do wonder with libp2p, if one were to make an app with that (i’m not! not for now at least) would it be able to connect to the existing pubsub nodes in IPFS? Or would you basically have to set up your own network of nodes?

We should not make things too complicated here either. Sure, having separate libraries is beneficial to code maintainability on the library side, but it severely complicates the side that wants to use them. Now for “apps” (not websites) this too might be workable. But for websites this is needlessly complicated and requires uber large client side javascript libraries to make it workable. The fix there would be browser adoption (like in brave i think?) but that can take many years…

hector · March 20, 2020, 8:53am

You’d be able to connect to the main IPFS network

Progress is slow, but there is some progress
