Bitswap efficiency brainstorm

I've been thinking about this for a while, and given the increased need for, and activity around, more efficient data transfers in bitswap that I have been seeing lately, now seems as good a time as any to find out whether any of it is useful to anyone.

## History
A problem that IPFS had in the past was its extreme reliance on routing all content discovery through the DHT. Not only was the DHT becoming slow (~1 second response time per request), it was also becoming a burden on the entire network, as each request for each block of each file had to pass through many machines and network hops to obtain an answer. While the DHT scales well, the sheer number of requests being fed through it was becoming problematic.

Enter bitswap sessions.

In their simplest form, bitswap sessions first ask all connected peers to send the desired blocks if they have them; if no connected peer sends the block within a short timeout, the system falls back to the DHT to find a peer that can.

This first pass did accomplish its main goal of a faster content discovery method that no longer relies on the DHT, but it came at a massive cost in network efficiency: many duplicate blocks are received, and a high number of additional packets are sent and received.

Inherently, if you ask 900 peers to provide a block, you should not be surprised when more than one response arrives, especially for popular content. An alternative method is required.


## Overview
My first thought for a quick fix would be to simply add a round trip: blast out an availability check, then send the block wantlist to only one of the respondents. Below I go into more detail about this strategy and a few others.

There are two main models for gathering and using content discovery information: gossip and prediction (active and passive, respectively).

Prediction-based content discovery

  • Makes guesses based on passively gathered information and behaviors. It creates no network traffic of its own and does not require remote peers to be updated or to cooperate.
  • Data can be collected from the DHT, bitswap, bitswap sessions, and gossip.
  • It will generally be difficult to implement correctly, and incorrect guesses waste time and resources; however, a proper implementation has the potential for massive gains.

Gossip-based content discovery

  • Active in nature: uses additional requests to probe for where content is, requiring remote peer cooperation.
  • The goal is to issue as few lightweight requests as possible in order to bypass expensive and/or time-sensitive requests.
  • Results are based on assertions, not behavior.
  • Peers can be held accountable for bad assertions through signed responses.
  • An improper implementation will spend more resources on background chatter than it saves, but like prediction it can have massive gains if implemented successfully.

## High-level overview of strategies

### “Great peers think alike” (GPTA) long-term prediction

  • The primary goal of this module is to extrapolate usage patterns and behaviors.
  • Peers that often respond with the blocks we request may have similar browsing habits, and if so they have a higher chance of having the blocks we want in the future.
  • A first-run implementation of this strategy could be a simple list of peers sorted by hit/miss ratio (a rough sketch follows this list).
  • More extravagant methods can be bolted on to improve accuracy.
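
As a rough illustration of how such a ranking might look (all names here are invented for illustration; this is a sketch, not the existing bitswap API):

```go
package main

import (
	"fmt"
	"sort"
)

// stats tracks, per peer, how often it answered our block requests.
type stats struct {
	hits, misses int
}

func (s stats) ratio() float64 {
	total := s.hits + s.misses
	if total == 0 {
		return 0
	}
	return float64(s.hits) / float64(total)
}

// rankPeers returns peer IDs sorted by hit ratio, best first:
// peers that often had what we wanted get asked first next time.
func rankPeers(peers map[string]stats) []string {
	ids := make([]string, 0, len(peers))
	for id := range peers {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		return peers[ids[i]].ratio() > peers[ids[j]].ratio()
	})
	return ids
}

func main() {
	peers := map[string]stats{
		"QmPeerA": {hits: 40, misses: 10}, // 0.8 hit ratio
		"QmPeerB": {hits: 5, misses: 45},  // 0.1 hit ratio
	}
	fmt.Println(rankPeers(peers)) // [QmPeerA QmPeerB]
}
```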

### “DAG Awareness” short-term prediction

  • Similar to GPTA, but where the former uses relationships between user behaviors, this strategy relies directly on the relationships between connected data, owing to how the underlying storage system (Merkle trees/DAGs) works.
  • It appears that a first-run implementation is already being worked on in PR #8.
  • This strategy should excel at short-term predictions and require little to no ‘warm up’ compared to GPTA. If both predictions are used at the same time, I would prioritize this one, then check the longer-term GPTA. (A rough sketch of the idea follows this list.)
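
To make the idea concrete, a minimal sketch (types invented for illustration; presumably PR #8 does something more sophisticated): when a peer sends us a block, predict that the same peer also has that block's children in the DAG.

```go
// Hypothetical DAG-aware candidate tracking, not the real bitswap API.
package main

import "fmt"

type Node struct {
	CID   string
	Links []string // child CIDs in the Merkle DAG
}

// candidates maps a CID to peers predicted to have it.
var candidates = map[string][]string{}

// onBlockReceived records that `peer` sent us `n`; since peers that
// hold a node frequently hold its children too, predict the same
// peer for every link.
func onBlockReceived(peer string, n Node) {
	for _, child := range n.Links {
		candidates[child] = append(candidates[child], peer)
	}
}

func main() {
	onBlockReceived("QmPeerA", Node{
		CID:   "QmRoot",
		Links: []string{"QmChild1", "QmChild2"},
	})
	fmt.Println(candidates["QmChild1"]) // [QmPeerA]
}
```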

### “HintFlood” gossip

  • A ‘can you provide?’ request blasted out to many peers at once.
  • Extremely similar to existing bitswap session behavior, except that instead of directly asking for blocks, we query for availability (see the sketch after this list).
  • Still has the same underlying problem as current bitswap sessions, but efficiency is greatly improved since duplicate responses are tiny by comparison (~100-byte packets vs. 256 KB blocks), at the expense of an extra round trip.
  • Packets per second (PPS) can still be problematic, as you are still blasting ~900 systems at a time per block.
  • Can be combined with prediction to request from only a subset of peers first (e.g. the top 50 most likely to respond).
  • IPLD Selectors would greatly improve efficiency, but are not strictly required.
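
A rough sketch of the two-phase flow, with channels standing in for the network layer (none of these names are real bitswap calls):

```go
package main

import (
	"fmt"
	"time"
)

// askAvailability stands in for the tiny 'can you provide?' probe;
// here each peer answers on a shared channel after a fake delay.
func askAvailability(peers []string, cid string, replies chan<- string) {
	for _, p := range peers {
		go func(p string) {
			time.Sleep(10 * time.Millisecond) // pretend network latency
			replies <- p                      // pretend every peer has the block
		}(p)
	}
}

// fetch blasts the availability probe to all peers, then sends the
// actual (256 KB-scale) block request to only the first responder.
func fetch(peers []string, cid string) {
	replies := make(chan string, len(peers))
	askAvailability(peers, cid, replies)

	select {
	case p := <-replies:
		// duplicate availability replies cost ~100 bytes each,
		// instead of duplicate 256 KB blocks
		fmt.Printf("requesting %s from %s only\n", cid, p)
	case <-time.After(500 * time.Millisecond):
		fmt.Println("no peer answered, falling back to the DHT")
	}
}

func main() {
	fetch([]string{"QmPeerA", "QmPeerB", "QmPeerC"}, "QmObjectA")
}
```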

### “Hotsheet” gossip

  • A request asking nodes what content they have that they believe to be popular and likely to be requested in the future.
  • This allows a discovery request to be skipped entirely when the block you are looking for has already been listed and cached.
  • Hotsheets could optionally use truncated/compressed ObjectIDs to reduce payload/memory requirements at the expense of lower accuracy.
  • IPLD Selectors would greatly improve efficiency, but are not strictly required.

### “Hintlist” gossip

  • Peers that cannot provide blocks, but know who likely can, respond with a hint object.
  • Hintlists could be made interoperable with HintFlood and Hotsheets by having HintFlood respond with a full hint object (not just an affirmation) referencing its own PeerID, and by storing hotsheet responses as hints.
  • This can potentially be abused, so keeping track of peers that offer ‘bad’ hints and ignoring them may need to be part of the first implementation.

#### Strong positive hint example

```json
{
  "PeerID": "QmPeerA…",
  "Object": "QmObjectA…",
  "properties": ["has", "all children objects", "Recursive", "Pinned"],
  "timestamp": "1538154322",
  "signature": "…"
}
```
“I have this, all of its descendants, and eagerly want to distribute it”

#### Weak positive hint example

```json
{
  "PeerID": "QmPeerA…",
  "Object": "QmObjectA…",
  "properties": ["has", "partial children objects"],
  "timestamp": "1538154322",
  "signature": "…"
}
```
“I have only this block and some blocks related to this”
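
For concreteness, a hedged sketch of what checking such a signed hint could look like: the field names mirror the JSON above, while the payload encoding and signing scheme are invented for illustration (a real protocol would use the peer's libp2p identity key and a canonical encoding such as CBOR).

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// Hint mirrors the example objects above. The signature covers the
// other fields so peers can be held accountable for bad hints.
type Hint struct {
	PeerID     string
	Object     string
	Properties []string
	Timestamp  int64
	Signature  []byte
}

// payload is the byte string the signature is computed over;
// a real protocol would use a canonical encoding instead.
func (h Hint) payload() []byte {
	return []byte(fmt.Sprintf("%s|%s|%v|%d",
		h.PeerID, h.Object, h.Properties, h.Timestamp))
}

// verify checks the hint against the claimed peer's public key;
// bad hints can then count against that peer's reputation.
func verify(h Hint, pub ed25519.PublicKey) bool {
	return ed25519.Verify(pub, h.payload(), h.Signature)
}

func main() {
	pub, priv, _ := ed25519.GenerateKey(nil)
	h := Hint{
		PeerID:     "QmPeerA",
		Object:     "QmObjectA",
		Properties: []string{"has", "all children objects", "Pinned"},
		Timestamp:  1538154322,
	}
	h.Signature = ed25519.Sign(priv, h.payload())
	fmt.Println(verify(h, pub)) // true
}
```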

Full implementations of these strategies will likely have them utilize each other, which has the potential for some neat systemic results, like the previously mentioned HintFlood + Hotsheet + Hintlist combo, along with the final module I would like to bring up.

### “HintBroker” module

  • This one needs to be prefaced with the acknowledgement that implementing it properly is likely a pie-in-the-sky dream. With that said, perhaps someone can use its ideas for something else that is more attainable.
  • HintBroker content discovery combines all the other prediction and gossip systems to create a full content discovery module comparable in scale to the DHT.
  • Peers that maintain high uptimes and large hintlists can become high-volume, high-accuracy content discovery providers.
  • These peers effectively become info brokers, treating valid hints as a valuable commodity that can be exchanged with other HintBrokers.
  • Usage results in a positive feedback loop: random peers use it for hints, which gives the HintBroker more info to provide more accurate hints, which attracts more peers asking for hints…
  • HintBrokers can optionally also monitor the availability of blocks and test that assertions are correct and peers are connectable before offering them as responses.

### Side observation

Interestingly, it seems the first implementation of bitswap sessions follows a lot of the key ideas and issues of the first implementation of pubsub, called floodsub, with the second generation being a much more efficient variant called whispersub. Perhaps the solution for converting floodsub to whispersub might end up being similar for converting ‘floodswap’ to ‘whisperswap’.



---

First of all, thanks for asking for external input.

So, as a complete outsider, let me give a summary and some thoughts.

There needs to be a mechanism to know what is where, so that we avoid having to ask the DHT for every single block. This mechanism should not use excessive bandwidth. It also does not need to be perfect: having to go to the DHT for 0.1% of all blocks would not be a big problem.

Hashes are pretty large and impossible to compress, so any mechanism that requires sending around full hashes is probably less than optimal. The mechanism described above provides good compression for some use cases by sending around just the root of a large Merkle DAG. But what if you just know that you want QmABCD, and don’t know that it is a child of the Merkle DAG QmROOT? Then the above won’t help at all.

What does a node do to quickly figure out whether it itself has a block, without going to disk? Probably some sort of bloom filter, since storing the entire set of hashes in memory would be too expensive. So what if you could somehow gossip the bloom filters to neighbouring nodes?
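
For anyone unfamiliar with them, a bloom filter is just a bitmap plus k hash functions. A minimal sketch (sizes and hash scheme are illustrative, not tuned):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type Bloom struct {
	bits []uint64 // bitmap, m = len(bits)*64 bits
	k    uint     // number of hash functions
}

func NewBloom(mBits uint, k uint) *Bloom {
	return &Bloom{bits: make([]uint64, (mBits+63)/64), k: k}
}

// Two FNV hashes combined with the Kirsch-Mitzenmacher trick:
// h_i(x) = h1(x) + i*h2(x), giving k indexes from two hashes.
func (b *Bloom) indexes(key []byte) []uint64 {
	h1 := fnv.New64a()
	h1.Write(key)
	a := h1.Sum64()
	h2 := fnv.New64()
	h2.Write(key)
	c := h2.Sum64() | 1 // force odd so it walks the whole bitmap
	m := uint64(len(b.bits) * 64)
	idx := make([]uint64, b.k)
	for i := uint(0); i < b.k; i++ {
		idx[i] = (a + uint64(i)*c) % m
	}
	return idx
}

func (b *Bloom) Add(key []byte) {
	for _, i := range b.indexes(key) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// Has reports possible membership: false is definite,
// true may be a false positive.
func (b *Bloom) Has(key []byte) bool {
	for _, i := range b.indexes(key) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// ~10 bits/entry for 100k entries, k=7 -> ~1% false positives.
	f := NewBloom(1_000_000, 7)
	f.Add([]byte("QmObjectA"))
	fmt.Println(f.Has([]byte("QmObjectA"))) // true
	fmt.Println(f.Has([]byte("QmObjectB"))) // almost certainly false
}
```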

If you have an incomplete bloom filter, you will get false negatives: neighbour X has block Y, but the bloom filter says it does not. No big deal, you just get it from somebody else or ask the DHT.

There will also be false positives with bloom filters. In that case you ask for a block, don’t get it, and then have to go to the DHT. This is a slightly bigger deal, since you only learn about the failure after a request times out. You might start asking the DHT or somebody else if you don’t get an answer from a candidate peer within a short time.

I don’t have the exact numbers in my head, but a bloom filter bitmap of 10 bits per entry is supposedly sufficient for a <1% false positive rate. So for a pretty big node containing 1e9 blocks, the bloom filter bitmap is roughly 1 GB. But it contains mostly zero bits, so it should be easy to compress when gossiping. And if you have an incomplete bitmap, nothing is really broken; you just get a higher false negative rate.
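
For reference, the standard approximation for the false positive rate of a bloom filter with m bits, n entries, and k hash functions is

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad k_{\mathrm{opt}} = \frac{m}{n}\ln 2
```

so at 10 bits per entry with the optimal k = 7, p ≈ (1 − e^(−0.7))^7 ≈ 0.0082, a shade under 1%, consistent with the 10-bits-per-entry claim above.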

So what about this:

  • A node stores a bloom filter bitmap (or some other, more efficient representation) for each of its current peers and keeps them up to date via some gossip protocol.
  • When a node wants to retrieve a block, it first checks the bloom filters of all its peers and gets a list of peers that have the block. It then requests the block from each peer that has it according to the bloom filter, in a random, staggered fashion. In case nobody has it, it asks the DHT. (A lookup sketch follows this list.)
  • When it gains a new peer, it allocates a bloom filter bitmap for it and slowly fills it from gossip.
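
A minimal sketch of that lookup path (all names invented; the `fakeFilter` stand-in would be replaced by the real per-peer bloom filter from the earlier sketch):

```go
package main

import "fmt"

// membership is whatever filter representation we keep per peer;
// the Bloom type from the earlier sketch would satisfy this.
type membership interface {
	Has(key []byte) bool
}

// FilterStore holds one gossiped filter per connected peer.
type FilterStore struct {
	filters map[string]membership // peer ID -> filter
}

// CandidatePeers returns peers whose filter claims the block.
// False positives just waste one request; if the list is empty
// (or everyone times out), fall back to the DHT.
func (fs *FilterStore) CandidatePeers(cid []byte) []string {
	var out []string
	for peer, f := range fs.filters {
		if f.Has(cid) {
			out = append(out, peer)
		}
	}
	return out
}

// tiny stand-in filter so the example runs without the full bloom code
type fakeFilter map[string]bool

func (f fakeFilter) Has(key []byte) bool { return f[string(key)] }

func main() {
	fs := &FilterStore{filters: map[string]membership{
		"QmPeerA": fakeFilter{"QmObjectA": true},
		"QmPeerB": fakeFilter{},
	}}
	fmt.Println(fs.CandidatePeers([]byte("QmObjectA"))) // [QmPeerA]
}
```

The session would shuffle that candidate list and stagger requests across it, rather than blasting everyone at once.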

There might be a clever way to encode the bloom filter deltas as hashed IPFS objects themselves. E.g. an update to a bloom filter bitmap could be encoded as a sorted set of offsets of the 1 bits, which is then OR-ed onto the bitmap. But that needs some more thought.
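
The delta idea in code form, a minimal sketch assuming the update is just the offsets of the newly set bits:

```go
package main

import "fmt"

// applyDelta OR-s an update onto our local copy of a peer's bitmap.
// The delta is just the sorted offsets of bits that flipped to 1
// since the last gossip round, which stays tiny when churn is low.
func applyDelta(bitmap []uint64, setBits []uint64) {
	for _, off := range setBits {
		if off/64 < uint64(len(bitmap)) { // ignore out-of-range offsets
			bitmap[off/64] |= 1 << (off % 64)
		}
	}
}

func main() {
	bitmap := make([]uint64, 4) // 256-bit toy bitmap
	applyDelta(bitmap, []uint64{3, 64, 200})
	fmt.Printf("%b %b\n", bitmap[0], bitmap[1]) // bits 3 and 64 set
}
```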

By the way: can we get some stats on a typical large dataset? E.g. how many gigabytes and how many nodes is the Wikipedia dataset?

---

I have never heard of a bloom filter before… How have I never heard of bloom filters before??

These are amazing for just this use case. Being able to represent 100,000 blocks at a 1% false positive rate in only 117 KB? Yes please.

  • 0.1 MB filter = 83,464 hashes @ 99% accuracy
  • 1 MB filter = 834,633 hashes @ 99% accuracy
  • 10 MB filter = 8,346,324 hashes @ 99% accuracy

A tool to play around with filter sizes and probabilities:
https://hur.st/bloomfilter/?n=&p=0.01&m=1MB&k=

There are some constraints, like not being able to remove keys from the filter without rebuilding it, but for the most part they can be dealt with.

Which hashes go into the filters being passed around can be varied to match the scenario (full maps vs. only popular content vs. new hashes since the last filter, etc.).

Might be time for me to learn some Go.

---

I think this can be generalised.

A node has a set of hashes. To have a cheap mechanism to transfer that knowledge to other nodes, we need a compact representation of a set of hashes that can be partially and incrementally transferred to other nodes. A bloom filter is such a representation, but there might be others.

It is worth noting that while the set of hashes a node has might be large, it won’t change much over time. If the gossip protocol can take advantage of that, it should be efficient.

Of course, you still need local storage for the information about which peer has what, but with bloom filters and other probabilistic data structures you can easily trade space for accuracy. E.g. make the bitmap smaller and get a higher false positive rate. If you are really storage-constrained, you can just ignore the bloom filter gossip completely.