Hello all – my team is integrating js-ipfs into an app which downloads large files (from a couple GB to 30GB or so) from our IPFS nodes, which are all connected via ipfs-cluster. The files are all pinned and synced, but when we download the CID (top of the merkle tree?) it usually only downloads from a single node.
Is there a way we should be preparing the files apart from the standard “ipfs add” method? Does an alternate chunking method allow for bitswap to more efficiently find more peers? We have nodes in several geographical areas, and would like to get some swarming of downloads for speed, but also for redundancy if a node goes down. Since the nodes are all connected and in a cluster together, I assume the client “sees” them all.
Thanks for any insight, even if it’s just “RTFM”, I just haven’t found any information on helping with such things.
How are you downloading? You say that you are using js-ipfs in an app, IPFS nodes (go-ipfs?) and cluster. Are you running a full js-ipfs node that connects to those in the cluster?
It may be that the js-ipfs bitswap implementation is not as optimized as the go-ipfs one, and therefore does not take advantage of the optimizations that were introduced there. It is also important that the getter node is directly connected to all the providers.
Does an alternate chunking method allow for bitswap to more efficiently find more peers? …
You could play with the chunk size (increase it to reduce bitswap overhead), but the layout should already be ok for big files like yours (it will result in lots of leaves that can potentially be fetched in parallel).
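To put rough numbers on the chunk-size trade-off, here is a back-of-the-envelope sketch in TypeScript. It assumes the default fixed-size chunker of 256 KiB and compares it with 1 MiB chunks (roughly what “ipfs add --chunker=size-1048576” would give you). leafCount is a hypothetical helper written for this post, and it ignores the intermediate DAG nodes, so treat the results as estimates only.

```typescript
// Rough estimate of how many leaf blocks a file is split into
// for a given fixed chunk size. Intermediate DAG nodes are ignored.
function leafCount(fileBytes: number, chunkBytes: number): number {
  if (chunkBytes <= 0) {
    throw new Error("chunk size must be positive");
  }
  return Math.ceil(fileBytes / chunkBytes);
}

const KiB = 1024;
const MiB = 1024 * KiB;
const GiB = 1024 * MiB;

// A 30 GiB file with the assumed default 256 KiB chunks.
const defaultChunks = leafCount(30 * GiB, 256 * KiB); // 122880 leaves

// The same file with 1 MiB chunks.
const largeChunks = leafCount(30 * GiB, 1 * MiB); // 30720 leaves
```

Either way there are tens of thousands of leaves available to fetch in parallel, so a larger chunk size mainly cuts the number of blocks bitswap has to request rather than enabling multi-peer fetching.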
We’re downloading using js-ipfs in an app. (Although, we do get the same sort of performance with the go-ipfs app, with the download occasionally multiplexing, but not often)
Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?
Would it help to identify (or bootstrap?) a list of our nodes when spinning up the js-ipfs instance? If so, is there a proper way to do that?
You should definitely ensure that your downloader is connected to all of the other nodes for the length of its lifetime. go-ipfs has a peering config for this. I’m not sure about js-ipfs. Can you check if it does work better with go-ipfs in that case? Honestly, I would expect go-ipfs to work much faster than js-ipfs in terms of bitswap, particularly in the latest version (I don’t know if the improvements were ported though).
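For reference, the go-ipfs peering setting lives under a top-level “Peering” key in the node's config file, as a list of peers with an ID and optional addresses. Below is a minimal TypeScript sketch that generates that JSON fragment. The peer IDs and the 203.0.113.x address are made-up placeholders, and peeringConfig is a hypothetical helper, so substitute the real values from your own cluster nodes.

```typescript
// Shape of one entry in the go-ipfs "Peering.Peers" config list.
interface PeeringPeer {
  ID: string;      // PeerID of the node to stay connected to
  Addrs: string[]; // known multiaddrs; may be left empty
}

// Build the JSON fragment to merge into the go-ipfs config file.
function peeringConfig(peers: PeeringPeer[]): string {
  return JSON.stringify({ Peering: { Peers: peers } }, null, 2);
}

// Placeholder values only: replace with your cluster nodes' real IDs and addresses.
const clusterPeers: PeeringPeer[] = [
  { ID: "QmHypotheticalPeerIdNodeA", Addrs: ["/ip4/203.0.113.10/tcp/4001"] },
  { ID: "QmHypotheticalPeerIdNodeB", Addrs: [] },
];

const peeringJson = peeringConfig(clusterPeers);
```

Printing peeringJson gives a fragment you can merge into the config of the downloading go-ipfs node. The node needs a restart to pick it up.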
We literally watch outgoing bandwidth from our peers. Sometimes (rarely) it pulls from two, but usually it randomly picks a node and downloads the entire file from there.
I’ll look at peering. I’m not sure if it can be done with js-ipfs or not, but I can check the go nodes. We bootstrapped them with each other’s info, but perhaps that’s different from peering.
js-ipfs doesn’t have “peering”, I think; go-ipfs implemented it fairly recently. Also, peering is different from bootstrapping, and you only need to pass a PeerID (the DHT is queried if you don’t pass a multiaddr). Not sure if that would solve the non-multiplexing (you shouldn’t have to do any specific config for that …)
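Without built-in peering, one workaround on the js-ipfs side would be to re-dial the known cluster nodes on a timer. The TypeScript sketch below is only an approximation of that idea. SwarmNode is a hypothetical minimal interface modeled on the swarm.connect call, and it assumes connect accepts a multiaddr string. Depending on your js-ipfs version you may need to pass a Multiaddr object instead.

```typescript
// Minimal, hypothetical view of the part of an IPFS node used here.
interface SwarmNode {
  swarm: { connect(addr: string): Promise<unknown> };
}

// Try to (re)connect to every known address; return the ones that failed.
async function connectAll(node: SwarmNode, addrs: string[]): Promise<string[]> {
  const failed: string[] = [];
  for (const addr of addrs) {
    try {
      await node.swarm.connect(addr);
    } catch (err) {
      failed.push(addr);
    }
  }
  return failed;
}

// Re-run connectAll on a timer so dropped connections get re-dialed.
// Returns a function that stops the timer.
function keepConnected(
  node: SwarmNode,
  addrs: string[],
  intervalMs: number = 30000
): () => void {
  const timer = setInterval(() => {
    void connectAll(node, addrs);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

You would call keepConnected(ipfs, addrs) once after the node starts, with the full multiaddrs (including the peer ID) of every cluster node, and call the returned function on shutdown.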
If I understand it correctly, IPFS can boost download speed by avoiding downloading duplicated blocks. This implies that a file with a high dedup ratio can benefit a lot from this. However, IPFS so far doesn’t seem to support parallel download from multiple peers like BitTorrent.
The following links point to the Go code related to ipfs get <path>. You might be interested in checking them out.