When the number of peers increases, data transfer becomes slow

I have a small private swarm, and I use kubo as the first node that holds the complete file. All machines are on the same private network.

I used boxo to implement a custom client to download files from kubo and act as a full peer node. I’m using this client for file distribution, and I want a 100GiB file that’s only in kubo to be distributed to all clients as fast as possible.

When there are fewer than 60 or so peers, the speed is fine. However, once the number of peers reaches about 150, the speed drops dramatically, and it takes several times longer to transfer the complete file to each peer.

I’ve tried tweaking these configurations to optimize the transfer (roughly wired up as in the sketch after this list), but none of them helped much:

  • tried bitswap.MaxOutstandingBytesPerPeer with 10MB, 100MB, 512MB, and 1GB
  • tried connmgr.NewConnManager(10, 128) and connmgr.NewConnManager(100, 600)
  • tried dag.GetMany to speed things up
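
For reference, here is a rough sketch of where those settings are wired up on the client (not my exact code; h, bstore, and contentRouting are placeholders, and the exact boxo/libp2p constructor signatures differ between releases):

// connection manager (github.com/libp2p/go-libp2p/p2p/net/connmgr)
cm, err := connmgr.NewConnManager(100, 600) // also tried (10, 128)
if err != nil {
	return err
}
h, err := libp2p.New(libp2p.ConnectionManager(cm))
if err != nil {
	return err
}

// bitswap client/server (github.com/ipfs/boxo/bitswap and .../bitswap/network)
net := bsnetwork.NewFromIpfsHost(h, contentRouting)
exch := bitswap.New(ctx, net, bstore,
	bitswap.MaxOutstandingBytesPerPeer(100<<20), // also tried 10MB, 512MB, 1GB
)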

Did I miss something? I don’t know what else to try to fix the slow distribution.

Have you tried playing with other Bitswap options? Perhaps not as high as this, but:

  "Internal": {
    "Bitswap": {
      "EngineBlockstoreWorkerCount": 2500,
      "EngineTaskWorkerCount": 500,
      "MaxOutstandingBytesPerPeer": 1048576,
      "TaskWorkerCount": 500
    }
  }

Also keep an eye on the resource manager, in case it is automatically severing connections or something: https://github.com/ipfs/kubo/blob/master/docs/libp2p-resource-management.md

I don’t know exactly how you are downloading, particularly given that you say you use dag.GetMany… but DAGServices have the concept of sessions. When downloading a file you should do it within a session, so that it prioritizes downloading from the same nodes rather than performing a separate lookup for every block.
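
Roughly something like this (just a sketch; bserv stands for whatever blockservice your client already uses):

dserv := merkledag.NewDAGService(bserv) // DAGService on top of your blockservice
ng := merkledag.NewSession(ctx, dserv)  // session-scoped NodeGetter
// use ng.Get / ng.GetMany for every block belonging to this one file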

Thank you for your suggestion. Sounds like a new direction. I’ll try using a session to see if it improves things:

session := blockservice.NewSession(ctx, t.peer.peer.BlockService())
nodeGetter := merkledag.WrapSession(session)

I’ll also try Bitswap’s other options. I’m wondering whether Bitswap’s scoring strategy (the score ledger) affects my use case. Since only kubo has the file at first, I want this CID to be transferred to every peer as quickly as possible (each client will actively pull this CID).

I’ve also observed a lot of dag get timeouts for this CID on many peers. This only appears when there is a large number of peers as well.

Here is the (pseudo)code the client uses to fetch the file:

func (t *Task) get(ctx context.Context, c cid.Cid, node format.Node, offset uint64) (uint64, error) {
	// if the caller did not prefetch this node (or the prefetch failed),
	// fetch it individually with a 1 minute timeout
	if node == nil {
		getCtx, cancel := context.WithTimeout(ctx, time.Minute)
		var err error
		node, err = t.nodeGetter.Get(getCtx, c)
		cancel()
		if err != nil {
			return 0, err
		}
	}

	switch n := node.(type) {
	case *merkledag.ProtoNode:
		links := n.Links()
		linkNodes := make(map[cid.Cid]format.Node, len(links))

		cidSet := cid.NewSet()
		for _, link := range links {
			cidSet.Add(link.Cid)
		}

		// prefetch all child nodes from the dag with a 1 minute timeout
		timeoutCtx, cancel := context.WithTimeout(ctx, time.Minute)
		ch := t.nodeGetter.GetMany(timeoutCtx, cidSet.Keys())
		for res := range ch {
			if res.Err == nil {
				linkNodes[res.Node.Cid()] = res.Node
			}
		}
		cancel()

		// recurse into the children in order; children missing from the
		// prefetch map are fetched individually by the recursive call
		for _, link := range links {
			child := linkNodes[link.Cid]
			var err error
			offset, err = t.get(ctx, link.Cid, child, offset)
			if err != nil {
				return 0, err
			}
		}
		return offset, nil

	case *merkledag.RawNode:
		// leaf node: write the raw block data at its offset in the file
		data := n.RawData()
		if _, err := t.file.WriteAt(data, int64(offset)); err != nil {
			return 0, err
		}
		fsn := &posinfo.FilestoreNode{
			Node: node,
			PosInfo: &posinfo.PosInfo{
				Offset:   offset,
				FullPath: t.path, // destination path of the file on disk (field on Task)
			},
		}

		// force overwrite to keep the latest node --> file link
		if err := t.filestore.FileManager().Put(ctx, fsn); err != nil {
			return 0, err
		}
		return offset + uint64(len(data)), nil

	default:
		return 0, fmt.Errorf("%s: unsupported node type", c.String())
	}
}

Perhaps the original node with the data gets overwhelmed when all the others start pulling from it. You need to make sure that the other nodes can discover data among themselves too (they need to be connected to each other, etc.). Perhaps the network itself is maxing out with 150 peers in it.
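
For example, something along these lines on every client (a sketch; connectToPeers and the address list are hypothetical, the point is only that each client dials the other clients rather than just kubo):

// dial every other known peer in the private swarm so that blocks can be
// exchanged peer-to-peer instead of only through the kubo seed node
func connectToPeers(ctx context.Context, h host.Host, addrs []string) {
	for _, a := range addrs {
		ma, err := multiaddr.NewMultiaddr(a)
		if err != nil {
			continue
		}
		pi, err := peer.AddrInfoFromP2pAddr(ma)
		if err != nil {
			continue
		}
		// ignore individual dial failures; one unreachable peer
		// should not stop the rest
		_ = h.Connect(ctx, *pi)
	}
}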

One thing that’s unclear to me: are you connecting all the nodes in a mesh, or is everyone hitting a single server?
Because in the mesh case, the last time I looked at benchmarks, the download time was constant no matter how many peers download the file.

If it’s the single-server case, I believe you when you say it gets slower than the time for 1 download * N nodes. Our server implementation isn’t optimized; we wanted to overhaul it but time is sparse: [ipfs/go-bitswap] go-bitswap overhaul 2023 · Issue #73 · ipfs/boxo · GitHub.

After a few attempts, none of the above options have done much good, but I appreciate the suggestions. I don’t really think 150 peers is the limit. I’m guessing that the peers don’t actively try other peers while downloading the file, so nearly all the data ends up being downloaded from kubo.

They all join the DHT network via kubo as a bootstrap node, and all of them are within the same network.
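
Roughly like this (a sketch of my setup; the multiaddr is a placeholder):

// join the DHT with kubo as the only bootstrap peer
// (github.com/libp2p/go-libp2p-kad-dht)
kuboInfo, err := peer.AddrInfoFromString("/ip4/10.0.0.1/tcp/4001/p2p/<kubo-peer-id>")
if err != nil {
	return err
}
kadDHT, err := dht.New(ctx, h,
	dht.Mode(dht.ModeServer),      // everyone serves records inside the private network
	dht.BootstrapPeers(*kuboInfo), // kubo is the bootstrap node
)
if err != nil {
	return err
}
if err := kadDHT.Bootstrap(ctx); err != nil {
	return err
}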

I’m still guessing that the peers don’t actively try other peers while downloading, so the vast majority of traffic comes from the single kubo node. I don’t know exactly how Bitswap works; maybe there’s a way to make each peer fetch from as many peers as possible. I’ve observed that at 50% progress most of the clients have low upload and download rates, so clearly they’re not exchanging data peer-to-peer very well.

It’s worth noting that my blockservice uses offline.Exchange as its exchange, and I don’t know if that has any effect on the whole p2p process.

Wait, what? I would say that yes, that has an effect on the p2p process. To the point that I’m not sure how it works at all in the first place.
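
If that offline.Exchange blockservice is the one your nodeGetter and session are built on, it only ever reads blocks that are already in the local blockstore and never fetches from (or serves to) other peers, which would explain why almost all traffic comes from kubo. A rough sketch of the difference (placeholder names; constructor signatures vary between boxo versions):

// offline: can only serve what is already stored locally
// bserv := blockservice.New(bstore, offline.Exchange(bstore))

// online: fetch missing blocks from other peers and serve local ones back via Bitswap
net := bsnetwork.NewFromIpfsHost(h, kadDHT) // the DHT doubles as the content router
exch := bitswap.New(ctx, net, bstore)
bserv := blockservice.New(bstore, exch)
dserv := merkledag.NewDAGService(bserv)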
