Build a private distribution network using ipfs-cluster

pourserv · August 16, 2020, 3:20am

Hi, there. Appreciation to all the lovely people who make this possible.

We’re planning to build a private network that lets us replicate files across its peers. These are the constraints for this network:

“private” here means we don’t want outside parties to join, and we don’t want content to be publicly available even if outsiders have the CIDs
there should be only one “conductor” peer who can sort of control the network:
- add/remove files from the network
- control which peers get what
there will be a couple thousand peers in the network
one or more files (~200Gb each) will be distributed periodically and we need the duplication throughout the network to be done in a timely manner (hard requirement)

After some investigation, ipfs-cluster has emerged as a very promising choice. But there are still a few questions we need answered before we can make the architectural decisions:

how exactly do ipfs/ipfs-cluster duplicate content over the network?
is there anything we can do on the ipfs/ipfs-cluster level to speed up the duplication over the network? like managing the distribution strategy

realChainLife · August 16, 2020, 4:02pm

Hey @pourserv,

We use the ipfs-cluster-ctl add command, which is very similar to ipfs add, to add objects to the cluster network, i.e. to several cluster peers at the same time. How many peers an object is added to depends on the replication factors you set, either as command flags or as the defaults in the configuration file.
To pin CIDs to your cluster peers, use the ipfs-cluster-ctl pin add operation, which is similar to ipfs pin add. The only difference is that you can set Cluster-specific flags, such as the replication factors or the name associated with a pin.
As for speeding up duplication on the network, you need to decide on the consensus component for your network. With CRDT, for example, you get faster pin ingestion; it is relatively slower with Raft.

pourserv · August 16, 2020, 5:00pm

Hi, @realChainLife. Thanks for all the advice. I guess the ultimate question I want to ask is whether it is possible to approximately calculate how long it takes to distribute a file throughout the network, given the size of the file, the number of peers in the private network, and the average upstream/downstream bandwidth. We don’t know how to, because we don’t understand exactly how the duplication works, or how smart it is:

is the distribution strategy hard-coded or configurable (besides the replication factors)?
will a peer send different parts of the file to different peers at once?
will peers start sharing data with other peers only after they have the whole file, or as soon as they have some blocks in hand?

twdragon · August 17, 2020, 8:16am

@pourserv, could you please create a topic with a detailed explanation of your CID protection? It would be very interesting for scientific purposes and for protecting data with low and medium levels of confidentiality. Maybe we could present a tutorial about it.

hector · August 17, 2020, 11:51am

Hi, while ipfs-cluster can add content directly (simultaneously to several IPFS peers), this is significantly slower than letting IPFS distribute the content through the standard bitswap mechanism.

Your workflow would be to:

ipfs add ... OR ipfs-cluster-ctl add --local ... to your conductor peer.
ipfs-cluster-ctl pin add ... (if you used ipfs add, otherwise this happens automatically).

What ipfs-cluster will do is to tell every peer to pin the CID. From that point, bitswap comes into play and blocks will be transferred around as best as IPFS can.
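
The two-step workflow above could be scripted on the conductor roughly as follows. This is only a sketch: the function names (`ipfs_add_cmd`, `cluster_pin_cmd`, `publish`) and the injectable `run` parameter are made up for illustration, and it assumes both `ipfs` and `ipfs-cluster-ctl` are on the PATH and already pointed at the conductor's daemons. No replication flags are passed, so whatever defaults are set in the cluster configuration apply.

```python
import subprocess


def ipfs_add_cmd(path):
    # -Q (--quieter) makes `ipfs add` print only the final root CID.
    # Assumption: `path` is a single file; a directory would also need -r.
    return ["ipfs", "add", "-Q", path]


def cluster_pin_cmd(cid, name=None):
    # No replication flags here, so the cluster's configured defaults are used.
    cmd = ["ipfs-cluster-ctl", "pin", "add"]
    if name:
        cmd += ["--name", name]
    return cmd + [cid]


def publish(path, name=None, run=subprocess.run):
    """Add a file to the conductor's IPFS node, then ask the cluster to pin it.

    `run` is injectable (hypothetical convenience) so the flow can be
    dry-run without the real binaries.
    """
    added = run(ipfs_add_cmd(path), check=True, capture_output=True, text=True)
    cid = added.stdout.strip()
    run(cluster_pin_cmd(cid, name), check=True)
    return cid
```

Calling `publish("/data/release.bin", name="weekly")` (a hypothetical path and pin name) would return the root CID once the pin request has been submitted; the actual block transfer then happens asynchronously via bitswap.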

Regarding how long it will take: you will need to test specifically, but the IPFS blog post “New improvements to IPFS Bitswap for faster container image distribution” might be related.

It will depend on the stability and bandwidth of your peers, and the shape of the DAG that holds the file, but once content starts flowing out of one peer, bitswap should normally be pretty clever at getting blocks from different places.
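
For a rough floor on the distribution time, the textbook lower bound for any peer-to-peer file distribution can be computed from the file size and bandwidths. To be clear, this is not a model of bitswap: real transfers will be slower because of protocol overhead, peer churn and uneven links. The function name and the example numbers are assumptions chosen purely for illustration.

```python
def p2p_lower_bound_seconds(file_bytes, n_peers, seed_up, peer_up, peer_down):
    """Idealized lower bound on the time for n_peers to all receive a file.

    All bandwidths are in bytes per second and every peer is assumed to
    have the same upload/download rate (a simplification).
    """
    # The conductor must push at least one full copy into the network.
    seed_limit = file_bytes / seed_up
    # No peer can finish faster than its own download link allows.
    download_limit = file_bytes / peer_down
    # n_peers copies must be delivered using the total upload capacity.
    aggregate_limit = n_peers * file_bytes / (seed_up + n_peers * peer_up)
    return max(seed_limit, download_limit, aggregate_limit)


# Illustrative numbers only: 200 GB file, 2000 peers, conductor uploading
# at 1 Gbit/s (125e6 B/s), peers at 100 Mbit/s (12.5e6 B/s) both ways.
bound = p2p_lower_bound_seconds(200e9, 2000, 125e6, 12.5e6, 12.5e6)
print(f"lower bound: {bound / 3600:.2f} hours")
```

With these made-up inputs the peers' own download links are the limiting term, which is the best case: adding more peers barely moves the bound as long as they can re-share blocks while still downloading.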

Hope that helps!

pourserv · August 20, 2020, 5:21am

Thanks! @hector That’s exactly what we’re looking for.

hector · August 20, 2020, 1:19pm

I want to add that if content is < 256KB (single-block, using raw-leaves) or spans very few blocks, it might just be faster for cluster to distribute it directly.
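
To see where a given file falls relative to that threshold, a quick estimate of the leaf block count helps. This sketch assumes the default fixed-size chunker of `ipfs add` (256 KiB chunks) and counts only leaf blocks, ignoring the intermediate DAG nodes; the function names are made up for illustration.

```python
DEFAULT_CHUNK_BYTES = 262144  # 256 KiB, assumed default chunk size of `ipfs add`


def leaf_block_count(file_bytes, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """Number of leaf blocks a file is split into (integer ceiling division)."""
    return max(1, (file_bytes + chunk_bytes - 1) // chunk_bytes)


def is_single_block(file_bytes, chunk_bytes=DEFAULT_CHUNK_BYTES):
    """True when the whole file fits into one chunk."""
    return leaf_block_count(file_bytes, chunk_bytes) == 1


small = leaf_block_count(100_000)      # a ~100 KB file
large = leaf_block_count(200 * 10**9)  # a 200 GB file (decimal gigabytes)
print(small, large)
```

A file of a few hundred kilobytes lands in the "let cluster send it" range, while a 200 GB file is split into several hundred thousand leaf blocks, which is where bitswap fetching from many peers at once pays off.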

karungokihara · August 27, 2020, 12:21pm

I am a newbie in go-ipfs with the same issue as @pourserv. Thanks for the replies you gave him.
