Optimizing Pinned Cluster Content for Fastest Download Speed

shawnp0wers · November 23, 2020, 10:25pm

Hello all – my team is integrating js-ipfs into an app which downloads large files (from a couple GB to 30GB or so) from our IPFS nodes, which are all connected via ipfs-cluster. The files are all pinned and synced, but when we download the CID (top of the merkle tree?) it usually only downloads from a single node.

Is there a way we should be preparing the files apart from the standard “ipfs add” method? Does an alternate chunking method allow for bitswap to more efficiently find more peers? We have nodes in several geographical areas, and would like to get some swarming of downloads for speed, but also for redundancy if a node goes down. Since the nodes are all connected and in a cluster together, I assume the client “sees” them all.

Thanks for any insight, even if it’s just “RTFM”, I just haven’t found any information on helping with such things.

Thanks!

hector · November 24, 2020, 10:34am

Hey,

how are you downloading? You say that you are using js-ipfs in an app, IPFS nodes (go-ipfs?) and cluster. Are you running a js-ipfs full ipfs node that connects to those in the cluster?

It may be that the js-ipfs bitswap implementation is not as optimized as the go-ipfs one, and therefore does not take advantage of the optimizations that were introduced there. It is also important that the getter node is directly connected to all the providers.

Does an alternate chunking method allow for bitswap to more efficiently find more peers? …

You could play with the chunk size (increase it to reduce bitswap overhead), but the layout should already be ok for big files like yours (it will result in lots of leaves that can potentially be fetched in parallel).
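To put rough numbers on the chunk-size trade-off, here is a minimal sketch that estimates how many leaf blocks a file splits into with a fixed-size chunker. It assumes the default `ipfs add` chunker of 256 KiB (size-262144) and compares it with 1 MiB (what `ipfs add --chunker=size-1048576` would produce). The function and constant names are made up for illustration, and the count ignores the intermediate nodes of the DAG.

```typescript
// Default chunk size used by `ipfs add` (size-262144, i.e. 256 KiB).
const DEFAULT_CHUNK_BYTES = 256 * 1024;

// Estimate how many leaf blocks a file splits into with a fixed-size chunker.
// Only leaves are counted; intermediate DAG nodes are ignored.
function estimateLeafCount(
  fileBytes: number,
  chunkBytes: number = DEFAULT_CHUNK_BYTES
): number {
  if (fileBytes < 0 || chunkBytes <= 0) {
    throw new RangeError("fileBytes must be >= 0 and chunkBytes must be > 0");
  }
  return Math.ceil(fileBytes / chunkBytes);
}

const GiB = 1024 * 1024 * 1024;
const fileBytes = 30 * GiB;

console.log(estimateLeafCount(fileBytes));              // 122880 leaves at 256 KiB
console.log(estimateLeafCount(fileBytes, 1024 * 1024)); // 30720 leaves at 1 MiB
```

Either way a 30 GiB file ends up with tens of thousands of leaves, so there is plenty to fetch in parallel; a larger chunk size mainly reduces the number of blocks that bitswap has to request.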

shawnp0wers · November 24, 2020, 2:19pm

Thanks for the reply, Hector,

We’re downloading using js-ipfs in an app (although we get the same sort of performance with the go-ipfs app, where the download occasionally multiplexes, but not often).

Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?

Thanks again,
-Shawn

hector · November 24, 2020, 5:52pm

How do you check if things are multiplexing?

Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?

You should definitely ensure that your downloader is connected to all of the other nodes for the length of its lifetime. go-ipfs has a peering config for this. I’m not sure about js-ipfs. Can you check if it does work better with go-ipfs in that case? Honestly, I would expect go-ipfs to work much faster than js-ipfs in terms of bitswap, particularly in the latest version (I don’t know if the improvements were ported though).
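For reference, the go-ipfs peering setting lives under `Peering.Peers` in the node's config file, where each entry has an `ID` and an optional `Addrs` list. Below is a minimal sketch that builds that fragment from a list of cluster nodes so it can be merged into the config. The helper name, the peer IDs and the addresses are placeholders, not real nodes.

```typescript
// Shape of one entry under "Peering.Peers" in the go-ipfs config file.
interface PeeringPeer {
  ID: string;      // PeerID of the node to stay connected to
  Addrs: string[]; // known multiaddrs for that node; may be left empty
}

// Build the "Peering" section as a plain object (hypothetical helper).
function buildPeeringConfig(
  peers: PeeringPeer[]
): { Peering: { Peers: PeeringPeer[] } } {
  return {
    Peering: {
      // Copy each entry so the caller's input is not shared with the result.
      Peers: peers.map((p) => ({ ID: p.ID, Addrs: [...p.Addrs] })),
    },
  };
}

// Placeholder values for illustration only.
const examplePeers: PeeringPeer[] = [
  { ID: "QmExamplePeerIdA", Addrs: ["/ip4/203.0.113.10/tcp/4001"] },
  { ID: "QmExamplePeerIdB", Addrs: [] },
];

console.log(JSON.stringify(buildPeeringConfig(examplePeers), null, 2));
```

The printed JSON is only the `Peering` section; it would need to be merged into the existing config of each go-ipfs node, followed by a daemon restart.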

shawnp0wers · November 24, 2020, 9:31pm

We literally watch outgoing bandwidth from our peers. Sometimes (rarely) it pulls from two, but usually it randomly picks a node and downloads the entire file from there.

I’ll look at peering. I’m not sure if it can be done with js-ipfs or not, but I can check the go nodes. We bootstrapped them with each other’s info, but perhaps that’s different from peering.

reload · November 25, 2020, 10:16am

js-ipfs doesn’t have “peering”, I think; go-ipfs implemented it fairly recently. Also, peering is different from bootstrapping, and you only need to pass a PeerID (the DHT is queried if you don’t pass a multiaddr). Not sure if that would solve the non-multiplexing (you shouldn’t have to do any specific config for that …)
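If js-ipfs has no peering option, one workaround might be to approximate it in the app: periodically compare the currently connected peers against the list of cluster nodes and re-dial whichever ones are missing. The sketch below covers only the comparison step as a pure function; the `TargetPeer` type and `missingPeers` name are made up, and the IDs and addresses are placeholders.

```typescript
// A cluster node we want to stay connected to (values are hypothetical).
interface TargetPeer {
  id: string;   // PeerID of the node
  addr: string; // full multiaddr used to dial the node
}

// Return the targets whose PeerID is not in the currently connected set.
function missingPeers(
  targets: TargetPeer[],
  connectedIds: Iterable<string>
): TargetPeer[] {
  const connected = new Set(connectedIds);
  return targets.filter((t) => !connected.has(t.id));
}

// Placeholder cluster nodes for illustration only.
const clusterNodes: TargetPeer[] = [
  { id: "QmExamplePeerIdA", addr: "/ip4/203.0.113.10/tcp/4001/p2p/QmExamplePeerIdA" },
  { id: "QmExamplePeerIdB", addr: "/ip4/203.0.113.11/tcp/4001/p2p/QmExamplePeerIdB" },
];

// Pretend only node A is connected right now.
const toRedial = missingPeers(clusterNodes, ["QmExamplePeerIdA"]);
console.log(toRedial.map((t) => t.addr));
```

In a js-ipfs app, the connected IDs could be collected from `ipfs.swarm.peers()` and each missing target dialed with `ipfs.swarm.connect(addr)` on a timer. The shape of the returned peer objects differs between js-ipfs versions, so treat that wiring as an assumption to verify against the version in use.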

meowdada · December 16, 2020, 8:57am

If I understand it correctly, IPFS can boost download speed by avoiding downloading duplicated blocks. This implies that a file with a high dedup ratio can benefit a lot from this. However, IPFS so far doesn’t seem to support parallel download from multiple peers like BitTorrent.

The following links point to the Go code related to ipfs get <path>. You might be interested in checking them out.

IPFS CLI get command: https://github.com/ipfs/go-ipfs/blob/master/core/commands/get.go
go-Unixfs: https://github.com/ipfs/go-ipfs/blob/master/core/coreapi/unixfs.go
NewUnixfsFile: https://github.com/ipfs/go-unixfs/blob/master/file/unixfile.go
DagReader: https://github.com/ipfs/go-unixfs/blob/master/io/dagreader.go
ipld.Walker: https://github.com/ipfs/go-ipld-format/blob/master/walker.go
