High duplicate data in bitswap stats. How to debug further?

I’ve been seeing high bandwidth usage in my ipfs-coord network and have been trying to debug it using the stats.bw() and bitswap.stat() calls. The bitswap.stat() call in particular shows high duplicate data and block counts. This corroborates other data I’ve seen, where bandwidth consumption is high but the /blocks directory does not grow much in size.
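
For reference, here is roughly how I’m gathering these numbers. This is a minimal sketch, assuming a recent js-ipfs where create() is exported from ipfs-core:

import { create } from 'ipfs-core'

const ipfs = await create()

// One-shot snapshot of the bitswap counters (BigInt fields in recent js-ipfs)
const bitswapStats = await ipfs.bitswap.stat()
console.log('bitswap stats: ', bitswapStats)

// stats.bw() returns an async iterable; poll it for a rolling bandwidth view
// (this loop runs until the process is stopped)
for await (const bw of ipfs.stats.bw({ poll: true, interval: '5s' })) {
  console.log(`in: ${bw.totalIn} out: ${bw.totalOut} rateIn: ${bw.rateIn}`)
}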

Here is the output of bitswap.stat():

bitswap stats:  {
  provideBufLen: 0,
  blocksReceived: 9907n,
  wantlist: [
    CID(zdpuAzCjEntqXaNzEZ78u7RauX4PJhxEcrj8HZQM4MQgJNM9T),
    CID(zdpuB2T9dYZKrTRfSZNC6jYLE8v5t4av5ouzwdywkmYZ9UwhC)
  ],
  peers: [
    'QmcewynF2DMxuvK7zk1E5es1cvBwZrfnYEaiN995KVYaKp',
    'Qmbut9Ywz9YEDrz8ySBSgWyJk41Uvm2QJPhwDJzJyGFsD6',
    'QmXbyd4tWzwhGyyZJ9QJctfJJLq7oAJRs39aqpRXUAbu5j',
    'QmXvT9Tn5VbFU4kGobEt8XS2tsyfjohAzYNa4Wmv8P6pzV',
    'QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ',
    'QmZMxNdpMkewiVZLMRxaNxUeZpDUb34pWjZ1kZvsd16Zic',
    'QmXtHADckCmT6jodpAgn3TcDQWjC29gQd2fKKHDTpo8DJT',
    'QmY7JB6MQXhxHvq7dBDh4HpbH29v4yE9JRadAVpndvzySN',
    'QmWPfWgbSjPPFpvmS2QH7NPx14DqxMV8eGAUHLcYfyo1St',
    'QmQk4hMCm5Y2ix7dLzq6ptSP7Y88GK6utSAfgwX5PEw6Ka',
    'QmUTx6KqYKVZbKpKxR7vGDUgZFYVvVVyEWDeCYq4GwBCff',
    'QmXX7hSXUpMoeyFZ3NbWJ8Qb4j9iFDwZWs61qA3oBqfEZz',
    'QmTWd6MfWe3LNSBPRGufnV7pBFX8tWR7Dqs41VJ44Agite',
    'QmZTzUxigeXMZUL4UMAgFNjMcvspS9UvVvWKn7e1KuheX7'
  ],
  dupBlksReceived: 8448n,
  dupDataReceived: 21713653n,
  dataReceived: 25450429n,
  blocksSent: 7n,
  dataSent: 1050n
}

If I’m reading the units correctly, dupDataReceived is 21.7 megabytes and total dataReceived is 25.4 megabytes, so duplicate data makes up roughly 85% of everything received.
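
Spelled out, since the counters are BigInts (note the n suffixes in the output above) and need converting before division:

const { dupDataReceived, dataReceived } = await ipfs.bitswap.stat()

// Convert the BigInt counters before dividing
const dupRatio = Number(dupDataReceived) / Number(dataReceived)
console.log(`duplicate share: ${(dupRatio * 100).toFixed(1)}%`)
// With the numbers above: 21713653 / 25450429 ≈ 85.3%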

  • Any suggestions on why this node is consuming so much duplicate data?
  • Any suggestions on how to debug and track down the root cause?

It appears this GitHub Issue touches on this topic. That Issue was closed and redirected to this Issue, but both Issues are concerned with go-ipfs.

I’m wondering whether these Bitswap improvements were ever ported to js-ipfs, and if so, when.

I have no idea; I have no knowledge of js-ipfs’s bitswap.
However, there are legitimate reasons to receive duplicate data: if your wantlist is very short (you have more peers than wantlist entries) and you are not IO bound, it is a sound strategy to ask multiple peers for the same block and keep the first peer that returns the data.

Worded as an example: you want to download the block Qmfoo, and two peers host the data. You simply ask both and keep the first one that succeeds. That said, I know this is something go-bitswap tries to avoid doing.
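
A toy sketch of that race strategy; fetchBlockFromPeer and all the variables here are made up, and this is not how any real bitswap is implemented internally:

// Ask every peer for the same block and keep whichever answers first.
// fetchBlockFromPeer is hypothetical.
async function fetchFirst (cid, peers) {
  return Promise.race(peers.map(peer => fetchBlockFromPeer(peer, cid)))
}

// Promise.race does not cancel the losers: the slower peers may still
// deliver the block, and those late arrivals count as dupDataReceived.
const block = await fetchFirst(cid, [peerA, peerB])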

Assuming no bugs, that could happen if your DAG is very linear or if the DAG traversal is very shallow (for example, a sequential in-order unixfs recursion), as in the sketch below.
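
For instance, a strictly sequential walk like this (a sketch using the js-ipfs dag API; dag-pb nodes assumed, error handling omitted) keeps the wantlist at roughly one entry at a time:

// Sequential, in-order DAG walk: only one CID is outstanding at a time,
// so the wantlist stays tiny and bitswap may race several peers for
// that single block.
async function walk (ipfs, cid) {
  const { value: node } = await ipfs.dag.get(cid) // one block in flight
  for (const link of node.Links ?? []) {          // dag-pb links, if any
    await walk(ipfs, link.Hash)                   // children one at a time
  }
}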

I don’t actually know whether js-ipfs does that or not, so please don’t take this as a definitive answer; hopefully it helps you find something.


I created this GitHub issue in the js-ipfs repository to provide a cross-link to this discussion.

I just updated my software to the latest version of js-ipfs.

I’m still seeing duplicate data making up approximately 85% of the data.

This appears to be the source of the bandwidth issues I’ve complained about in previous forum threads. I am making heavy use of pubsub and OrbitDB, which might explain why my nodes are pushing so much redundant data around.

I’ve got to figure out how to reduce this quantity of redundant data. If anyone has suggestions, I’m keen to hear them. Even a 50% reduction from here would make a significant difference in performance.
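
One knob I’m planning to experiment with is capping the number of connected peers, since every extra bitswap peer is another node that can send me the same block. This is a sketch only; I’m assuming my js-ipfs version forwards these libp2p connection-manager options, and the option names may differ between releases:

import { create } from 'ipfs-core'

// Illustrative values, not recommendations; check the docs for your
// js-ipfs/js-libp2p versions before relying on these option names.
const ipfs = await create({
  libp2p: {
    connectionManager: {
      maxConnections: 20,
      minConnections: 5
    }
  }
})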

I also created an Issue in the Orbit-DB repository, since some of the problem might be exacerbated by the use of OrbitDB.