Build a private distribution network using ipfs-cluster

Hi there. Thanks to all the lovely people who make this possible.

We’re planning to build a private network over which we can duplicate files. These are the constraints of this network:

  • “private” here means we don’t want outside parties to join, and we don’t want content to be publicly available even if outsiders have the CIDs
  • there should be only one “conductor” peer who can, in effect, control the network:
    • add/remove files from the network
    • control which peers get what
  • there will be a couple thousand peers in the network
  • one or more files (~200Gb each) will be distributed periodically and we need the duplication throughout the network to be done in a timely manner (hard requirement)
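For the “private” constraint, one common approach is an IPFS private network: every IPFS node shares a pre-shared key in a `swarm.key` file in its repo, so nodes without the key cannot connect at all, and knowing a CID is not enough to fetch the content. ipfs-cluster peers additionally share a `secret` (32 bytes, hex-encoded) in `service.json`. The sketch below only generates both values with Python’s `secrets` module; the function names are hypothetical, and distributing the files to the peers is left out.

```python
import secrets


def make_swarm_key() -> str:
    """Return the contents of an IPFS swarm.key file (pre-shared key, PSK v1)."""
    # Two header lines of the PSK file format, then 32 random bytes as hex.
    return "/key/swarm/psk/1.0.0/\n/base16/\n" + secrets.token_hex(32) + "\n"


def make_cluster_secret() -> str:
    """Return a 32-byte hex string for the "secret" field of ipfs-cluster's service.json."""
    return secrets.token_hex(32)


if __name__ == "__main__":
    print(make_swarm_key(), end="")
    print(make_cluster_secret())
```

The same `swarm.key` would go into the repo directory (`$IPFS_PATH`, `~/.ipfs` by default) of every peer, and the same cluster secret into every peer’s `service.json`. It is also worth removing the public bootstrap nodes (`ipfs bootstrap rm --all`) and, as an extra guard, setting `LIBP2P_FORCE_PNET=1` so a daemon refuses to start without a key.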

After some investigation, ipfs-cluster has emerged as a very promising choice. But there are still a few questions we need answered before we can make the architectural decisions:

  • how exactly do ipfs/ipfs-cluster duplicate content over the network?
  • is there anything we can do at the ipfs/ipfs-cluster level to speed up the duplication over the network, for example by managing the distribution strategy?

Hey @pourserv,

  • Use the ipfs-cluster-ctl add command, which is very similar to ipfs add, to add objects to the cluster, i.e. to several cluster peers at the same time. How many peers an object is added to depends on the replication factors, which you set either as command flags or through the defaults in the configuration file.

  • To pin CIDs to your cluster peers, use ipfs-cluster-ctl pin add, which is also similar to ipfs pin add. The only difference is that you can set Cluster-specific flags, such as the replication factors or the name associated with a pin.

  • As for speeding up duplication on the network, you need to choose the consensus component for your network. With CRDT, for example, pin ingestion is faster than with Raft.
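To make “how many peers an object is added to” concrete, here is a simplified model of how the two replication factors (`replication_factor_min` / `replication_factor_max` in the cluster configuration) bound a pin’s allocations. It is a sketch, not ipfs-cluster’s allocator: the real one also ranks candidate peers by metrics such as free space. `peers_to_allocate` is a hypothetical name. Setting both factors to -1 means “pin on every peer”, which matches the conductor use case.

```python
def peers_to_allocate(rmin: int, rmax: int, available: int) -> int:
    """Simplified model: how many peers a new pin would be allocated to."""
    if rmin == -1 and rmax == -1:
        return available  # -1/-1 means "pin everywhere"
    if rmin < 1 or rmax < rmin:
        raise ValueError("need 1 <= rmin <= rmax, or both set to -1")
    if available < rmin:
        # Not enough peers to satisfy the minimum: the pin fails.
        raise RuntimeError(f"only {available} peers available, rmin={rmin}")
    # Try to reach rmax, but never allocate more peers than exist.
    return min(rmax, available)


print(peers_to_allocate(-1, -1, 2000))
print(peers_to_allocate(2, 3, 10))
```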

Hi, @realChainLife. Thanks for all the advice. I guess the ultimate question I want to ask is: is it possible to approximately calculate how long it takes to distribute a file throughout the network, given the size of the file, the number of peers in the private network, and the average upstream/downstream bandwidth? We don’t know how to, because we don’t understand exactly how the duplication works, i.e. how smart it is:

  • is the distribution strategy hard-coded or configurable (besides the replication factors)?
  • will a peer send different parts of the file to different peers at once?
  • will peers start sharing data with other peers only after they have the whole file, or as soon as they have some of its blocks?
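A rough answer to the “can we calculate it” question is the standard fluid-model lower bound for peer-to-peer file distribution. It assumes peers re-share blocks as soon as they receive them (bitswap does serve any block already in a node’s blockstore) and ignores protocol overhead, so real transfers will be slower. The bandwidth figures below are hypothetical, and “~200Gb” is read as 200 gigabytes.

```python
def p2p_lower_bound_s(file_bits, n_peers, seed_up_bps, peer_up_bps, min_down_bps):
    """Idealized minimum time (s) for all peers to get the file when peers re-share."""
    return max(
        file_bits / seed_up_bps,   # the conductor must upload at least one full copy
        file_bits / min_down_bps,  # the slowest downloader must receive the whole file
        # n copies must be moved using the aggregate upload capacity of the swarm
        n_peers * file_bits / (seed_up_bps + n_peers * peer_up_bps),
    )


def seed_only_lower_bound_s(file_bits, n_peers, seed_up_bps, min_down_bps):
    """Same bound if only the conductor uploads (no re-sharing between peers)."""
    return max(n_peers * file_bits / seed_up_bps, file_bits / min_down_bps)


F = 200e9 * 8  # 200 GB in bits (assumption)
p2p = p2p_lower_bound_s(F, 2000, 1e9, 100e6, 100e6)  # 1 Gbit/s seed, 100 Mbit/s peers
seed_only = seed_only_lower_bound_s(F, 2000, 1e9, 100e6)
print(f"p2p: {p2p / 3600:.1f} h, seed only: {seed_only / 86400:.1f} days")
```

With these numbers the peer-to-peer bound is 16,000 s (about 4.4 h), set by each peer’s download link, while a conductor-only push would need at least 3.2 million s (about 37 days). That gap is why letting the peers exchange blocks matters far more than any tuning of the conductor.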

@pourserv, could you please create a topic with a detailed explanation of your CID protection? It would be very interesting for scientific purposes and for protecting data with low and medium levels of confidentiality. Maybe we could present a tutorial about it.

Hi. While ipfs-cluster can add content directly (to several IPFS peers simultaneously), this is significantly slower than letting IPFS distribute the content through the standard bitswap mechanism.

Your workflow would be to:

  • Run ipfs add ... OR ipfs-cluster-ctl add --local ... on your conductor peer.
  • Run ipfs-cluster-ctl pin add ... (only if you used ipfs add; otherwise this happens automatically).

ipfs-cluster will then tell every peer to pin the CID. From that point on, bitswap comes into play and blocks are transferred around as efficiently as IPFS can manage.
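The add-then-pin workflow above can be scripted on the conductor roughly like this. It is a sketch that assumes the `ipfs` and `ipfs-cluster-ctl` binaries are on `PATH` and talk to the conductor’s daemons with default settings; `publish`, `add_command` and `pin_command` are hypothetical helper names. `ipfs add -Q` prints only the root CID, and `--name` sets the optional pin name.

```python
import subprocess
from typing import List, Optional


def add_command(path: str) -> List[str]:
    # Add the file to the conductor's own IPFS node; -Q prints only the root CID.
    return ["ipfs", "add", "-Q", path]


def pin_command(cid: str, name: Optional[str] = None) -> List[str]:
    # Ask the cluster to pin the CID; allocated peers then fetch blocks via bitswap.
    cmd = ["ipfs-cluster-ctl", "pin", "add"]
    if name is not None:
        cmd += ["--name", name]  # flags go before the positional CID
    cmd.append(cid)
    return cmd


def publish(path: str, name: Optional[str] = None) -> str:
    """Add a file locally on the conductor, then pin it cluster-wide. Returns the CID."""
    added = subprocess.run(add_command(path), check=True,
                           capture_output=True, text=True)
    cid = added.stdout.strip()
    subprocess.run(pin_command(cid, name), check=True)
    return cid
```

A call such as `publish("/data/release.tar", name="release-1")` (hypothetical path and name) returns as soon as the pin is registered; the actual transfer happens afterwards, peer to peer.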

Regarding how long it will take: you will need to test it specifically, but the IPFS Blog & News post “New improvements to IPFS Bitswap for faster container image distribution” might be relevant.

It will depend on the stability and bandwidth of your peers and on the shape of the DAG that holds the file, but once the content starts flowing out of one peer, bitswap should normally be pretty clever at getting blocks from different places.
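To get a feel for “the shape of the DAG”, this sketch counts the blocks on each level of a balanced UnixFS DAG, assuming the go-ipfs defaults of 256 KiB fixed-size chunks and at most 174 links per internal node. `balanced_dag_levels` is a hypothetical helper, and “~200Gb” is again read as 200 gigabytes.

```python
CHUNK_SIZE = 262_144  # go-ipfs default fixed-size chunker (256 KiB)
MAX_LINKS = 174       # go-ipfs default max links per node in the balanced layout


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def balanced_dag_levels(size_bytes: int) -> list:
    """Blocks per level, from the leaves up to the single root."""
    levels = [max(1, ceil_div(size_bytes, CHUNK_SIZE))]
    while levels[-1] > 1:
        # Each parent links to up to MAX_LINKS children on the level below.
        levels.append(ceil_div(levels[-1], MAX_LINKS))
    return levels


levels = balanced_dag_levels(200 * 10**9)
print(levels, sum(levels))
```

For 200 GB this gives `[762940, 4385, 26, 1]`, i.e. about 767k blocks under three layers of internal nodes. A fetching peer has to receive an internal node before it knows which leaves to ask for, but after that it can request many leaves in parallel from whichever peers already hold them.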

Hope that helps!

Thanks, @hector! That’s exactly what we’re looking for.

I want to add that if the content is smaller than 256KB (a single block, using raw leaves) or only a few blocks, it might just be faster for cluster to distribute it directly.
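That rule of thumb can be written as a small routing helper: count the 256 KiB chunks a file would be split into and push it directly with cluster only when it fits in very few blocks. `distribution_route` and its `max_direct_blocks` threshold are hypothetical, not ipfs-cluster settings; the right cut-off would have to be measured on the actual network.

```python
CHUNK_SIZE = 262_144  # default chunk size (256 KiB)


def distribution_route(size_bytes: int, max_direct_blocks: int = 1) -> str:
    """Pick a hypothetical distribution route from the number of leaf chunks."""
    blocks = max(1, -(-size_bytes // CHUNK_SIZE))  # ceiling division
    if blocks <= max_direct_blocks:
        return "cluster-add"      # small: let cluster push it to the peers directly
    return "pin-and-bitswap"      # large: add on the conductor, pin, let bitswap spread it


print(distribution_route(100_000))
print(distribution_route(200 * 10**9))
```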

I am a newbie to go-ipfs, with the same issue as @pourserv. Thanks for the replies you gave him.