I’m currently trying to evaluate IPFS for a large-scale content distribution network.
From my understanding of both IPFS and BitTorrent, I’m confident that IPFS should have no problem serving the datasets and distribution patterns that BitTorrent currently handles. In other words, I’m confident that IPFS should be able, now or in the future with optimization and tweaking, to act as a replacement for BitTorrent.
However, there are some hypothetical workloads that we don’t see in the wild, and I’m wondering about IPFS’s ability to serve these.
In a BitTorrent swarm, all peers in a swarm are interested in roughly the same data. Thus, the swarm is in a sense “topical”, and the protocol is able to take advantage of this in a few ways. For example, peers are all requesting and sending pieces from the same list, and thus are able to send each other have lists and want lists that consist only of indices. Also, although peers may sometimes not be interested in the same data, the common case is that peers are able to upload to and download from most peers that they connect to.
In IPFS this might not be the case. The IPFS network might be storing extremely large datasets where many peers are only interested in a small subset of the data. For example, imagine a ~1PiB dataset and tens of thousands of peers, where each peer is only interested in some more-or-less-random ~10-100GiB subset. This would mean that each peer is only interested in roughly 0.001-0.01% of the dataset.
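As a quick sanity check on that fraction, here is a minimal sketch of the arithmetic using binary units (1 PiB = 2^50 bytes, 1 GiB = 2^30 bytes). The function name is just something I made up for illustration.

```python
# Back-of-the-envelope: what share of a 1 PiB dataset is a 10-100 GiB subset?
GIB = 2**30
PIB = 2**50

def interest_fraction_percent(subset_bytes, dataset_bytes=PIB):
    """Return the subset size as a percentage of the whole dataset."""
    return 100.0 * subset_bytes / dataset_bytes

low = interest_fraction_percent(10 * GIB)
high = interest_fraction_percent(100 * GIB)
print(f"{low:.4f}% to {high:.4f}%")  # prints "0.0010% to 0.0095%"
```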
My fears in this scenario are:
Peers might have great difficulty finding other peers to share data with, since another peer might not be interested in the same data.
Peers would have very large have and want lists, and there would be huge overhead sending these lists of hashes back and forth.
The sheer number of blocks involved would mean that peers might not be able to insert themselves into the DHT for every block they have or are interested in. Thus, there might be subtrees of the dataset which peers are interested in, but for which the DHT does not contain any entries.
BitSwap might break down, since most of the peers that a node trades with might never be seen again, or might never be interested in the same data after the initial trade.
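To put rough numbers on these fears, here is a back-of-the-envelope sketch. It rests on assumptions of mine, not on anything measured: a 256 KiB block size, 34 bytes per block hash (a sha2-256 multihash with a 2-byte prefix), a worst case where a peer has to name every block individually by hash, and subsets chosen uniformly at random at block granularity. Real deployments may look quite different, for example if peers can refer to whole subtrees by their root hash.

```python
# Rough sizing of the scenario. All constants below are assumptions.
KIB = 2**10
GIB = 2**30
PIB = 2**50

BLOCK_SIZE = 256 * KIB  # assumed block size
HASH_BYTES = 34         # assumed: 2-byte multihash prefix + 32-byte sha2-256 digest

def block_count(n_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold n_bytes (ceiling division)."""
    return -(-n_bytes // block_size)

def hash_list_bytes(n_bytes):
    """Size of a list naming every block by hash (worst case for have/want lists)."""
    return block_count(n_bytes) * HASH_BYTES

def bitfield_bytes(n_bytes):
    """Size of a BitTorrent-style bitfield, one bit per piece, for comparison."""
    return -(-block_count(n_bytes) // 8)

def expected_overlap_bytes(subset_bytes, dataset_bytes=PIB):
    """Expected data shared by two peers whose subsets are uniformly random."""
    return subset_bytes * subset_bytes / dataset_bytes

print("blocks in dataset:", block_count(PIB))
for size in (10 * GIB, 100 * GIB):
    print(
        f"{size // GIB} GiB subset:",
        block_count(size), "blocks,",
        hash_list_bytes(size), "bytes as a hash list,",
        bitfield_bytes(size), "bytes as a bitfield,",
        expected_overlap_bytes(size), "bytes expected overlap with a random peer",
    )
```

Under these assumptions the full dataset is about 4.3 billion blocks, which is also the number of distinct DHT keys if every block is announced separately. A 100 GiB subset is 409,600 blocks, or about 13.9 MB as a flat list of hashes, versus roughly 51 KB for a bitfield over the same number of pieces. Two random peers with 100 GiB subsets would be expected to share only about 10 MB of data.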
Does anyone have any thoughts about these concerns? Has the IPFS network seen instances of this kind of usage?