We are planning to use IPFS for our production systems, but we are worried about DDoS attacks that could result from the same peer being made to serve many requests for content it hosts. For example, if peer A and peer B both have content C, what happens if all requests for C are always routed to A? Is there a mechanism in place to load balance this?
In other words, if our DHT maps key C to values A and B, how does the requesting peer decide whether to send the request to A or B?
I had a look at the bitswap specification and the IPFS white paper, but neither makes any mention of this.
The answer is a bit tricky and depends on the pre-existing connections between the hosts asking for content and the peers that already have it. As a baseline, if both A and B have indicated in the DHT that they provide the same content, and the consuming node is not already connected to either A or B, the current go-ipfs bitswap implementation will likely ask them both for the content. There are changes underway to this behavior to reduce the number of duplicate blocks requested and transmitted.
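To make the baseline case concrete, here is a toy model in Go. It is only a sketch and not go-ipfs code: the function fetchFromAll and the assumption that every provider answers every want before any cancel arrives are mine, purely for illustration. It shows that when both A and B are asked for every block, the load is spread across both, but each block after the first copy is a duplicate.

```go
package main

import "fmt"

// fetchFromAll is a hypothetical helper, not part of go-ipfs. It models a
// requester that sends a want for every block to every provider, and assumes
// each provider answers every want. It returns how many requests each
// provider served and how many duplicate blocks the requester received.
func fetchFromAll(providers []string, blocks int) (map[string]int, int) {
	served := make(map[string]int, len(providers))
	duplicates := 0
	for b := 0; b < blocks; b++ {
		for i, p := range providers {
			served[p]++
			if i > 0 {
				duplicates++ // every copy after the first is redundant
			}
		}
	}
	return served, duplicates
}

func main() {
	served, dups := fetchFromAll([]string{"A", "B"}, 4)
	fmt.Println(served["A"], served["B"], dups) // prints "4 4 4"
}
```

In this simplified picture A is not singled out, but the requester pays for it with one redundant copy of each block.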
To answer your question more directly (I think): right now nothing prevents custom code from making as many requests for blocks to a remote bitswap host as it cares to (or as the network allows), regardless of who says they have the content in the DHT. So there is no load balancing, or backoff smarts, at the individual node level. If someone wants to saturate node A with requests for blocks of content, they can do so. In this regard it’s like an individual HTTP server.
@eingenito Thank you so much for the answer. That makes sense.
I had an in-depth look at the code. What I can surmise is that even if the consuming node is connected to A and B, it will still ask them both for content, as the search for providers concludes only after it has discovered 3 providers or has exhausted the search space of ‘closest to the key’ nodes. I think this behaviour makes sense for resiliency and performance. If you discover and send a request to only one node, that request might fail and you would have to retry. So, 3 sounds like a good number. Could you please explain what changes are underway to this behaviour?
The implementation is actually pretty complicated and currently somewhat split between bitswap without sessions and bitswap with sessions. First, it’s worth noting that bitswap finds new providers in the background while already sending want requests to existing connected peers. It looks for more providers when the current set of connected peers stops returning requested blocks. The number 3 (or 10 for bitswap sessions) is the number of providers to add to the existing set of peers that bitswap sends wants to, not the total number of nodes bitswap will make requests to. In the current implementation there are lots of situations where bitswap will request a block from all your connected peers, which can easily number in the hundreds. One of the problems right now is that in many situations bitswap will request data from too many nodes and get swamped with duplicate responses, choking data transfers.
That’s the behavior that is partially addressed by bitswap sessions, and that’s the concept currently being worked on: making sessions more efficient, meaning reducing the number of duplicate blocks that bitswap fetches.
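To illustrate the point about the number 3 (or 10) being a cap on newly added providers rather than on the total peer set, here is a minimal Go sketch. The session type, its fields, and addProviders are hypothetical names made up for this example and do not mirror the real go-ipfs bitswap code; the peer names are placeholders as well.

```go
package main

import "fmt"

// session is a toy stand-in for a bitswap session; it is not the go-ipfs type.
type session struct {
	peers        map[string]bool // peers that currently receive wants
	maxProviders int             // providers to add per search (3, or 10 for sessions, per the thread)
}

// addProviders adds newly found providers to the peer set, stopping once
// maxProviders new peers have been added. It returns how many were added.
func (s *session) addProviders(found []string) int {
	added := 0
	for _, p := range found {
		if added == s.maxProviders {
			break
		}
		if !s.peers[p] {
			s.peers[p] = true
			added++
		}
	}
	return added
}

func main() {
	// Five peers are already connected and receiving wants.
	s := &session{
		peers:        map[string]bool{"P1": true, "P2": true, "P3": true, "P4": true, "P5": true},
		maxProviders: 3,
	}
	// A provider search returns four new peers plus one already known.
	added := s.addProviders([]string{"A", "P1", "B", "C", "D"})
	fmt.Println(added, len(s.peers)) // prints "3 8"
}
```

Only three providers are added, but wants now go to eight peers in total, which is why duplicate responses can pile up even with a small provider limit.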