IPFS Cluster vs IPFS Private Network : Data distribution

talha · September 28, 2020, 9:11am

I have read the earlier discussions on Private IPFS Network vs IPFS Cluster, but I am still confused.
When should I use which?
Let’s say I have a 400 KB file and, when storing it in IPFS, I want to distribute it equally among 4 nodes so that each node holds 100 KB of data.
Would this be implemented with a Private IPFS Network or with IPFS Cluster?
And in either case, what will happen if 1 node goes down?

hector · September 28, 2020, 10:38am

As of today, it is not possible to do this in IPFS or in IPFS Cluster.

Note that Cluster works as a companion to IPFS. It does not matter whether IPFS is configured for a private network or is part of the public one.

talha · September 28, 2020, 11:53am

Ok.

So does that mean each node will have the full 400 KB of data?

ldeffenb · September 28, 2020, 1:00pm

And in either case, what will happen if 1 node goes down?

That one’s easy. If you actually manage to only give 1/4 of the file to each of 4 nodes and one of them goes down, then the file will not be fully accessible.

hector · September 28, 2020, 4:38pm

Yes, correct. All will have the full file.

talha · September 29, 2020, 5:04am

I am still confused.
So you mean that if I have a 4-node IPFS Cluster and I upload a file, the cluster will hold 4 copies of the file in total, 1 on each node?
When we talk about replication in IPFS, is this all we mean?

yiannis · September 29, 2020, 6:12am

So you mean that if I have a 4-node IPFS Cluster and I upload a file, the cluster will hold 4 copies of the file in total, 1 on each node?

Yes, this is what is happening in IPFS Cluster.

When we talk about replication in IPFS, is this all we mean?

It might be worth clarifying replication a little further. There is generally “proactive replication”, where a file is replicated automatically to the storage space of other nodes (e.g., upon adding a file to the network) - this is what is happening in IPFS Cluster, but not in IPFS. And then there is “reactive replication” where some file is replicated to a node’s storage, only after this node has requested the file. This is also referred to as caching and is what IPFS does.
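To make the difference concrete, here is a minimal sketch of the two replication styles. It is a toy in-memory model, not the IPFS or IPFS Cluster API: the `Node` class and the function names are made up for illustration, and `add_proactive` assumes every node takes a copy, as in the 4-node example discussed above.

```python
# Toy model of the two replication styles; not the real IPFS or IPFS Cluster API.

class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # cid -> bytes held locally


def add_local(node, cid, data):
    """Plain IPFS style: only the node that added the content stores it."""
    node.store[cid] = data


def add_proactive(nodes, cid, data):
    """Cluster style: every node stores the content as soon as it is added."""
    for node in nodes:
        node.store[cid] = data


def fetch_reactive(requester, nodes, cid):
    """Plain IPFS style: the requester copies the content from a holder and caches it."""
    if cid in requester.store:
        return requester.store[cid]
    for node in nodes:
        if cid in node.store:
            requester.store[cid] = node.store[cid]
            return requester.store[cid]
    raise LookupError(f"no online node provides {cid}")


def holders(nodes, cid):
    """Names of the nodes that currently hold the content."""
    return [n.name for n in nodes if cid in n.store]


nodes = [Node(f"node{i}") for i in range(1, 5)]

add_local(nodes[0], "cid-a", b"hello")
print(holders(nodes, "cid-a"))   # ['node1']

fetch_reactive(nodes[1], nodes, "cid-a")
print(holders(nodes, "cid-a"))   # ['node1', 'node2']

add_proactive(nodes, "cid-b", b"world")
print(holders(nodes, "cid-b"))   # ['node1', 'node2', 'node3', 'node4']
```

In the reactive case the number of copies depends on who has asked for the file; in the proactive case it is decided when the file is added.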

BTW, it would be really cool to have what you initially asked for. A great way of achieving that is through erasure coding. It would certainly help with large files and nodes going offline, and it could work both in IPFS Cluster and in IPFS (where coded content is only on nodes that have requested the file before).
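As a rough illustration of the erasure coding idea, here is a toy single-parity scheme: the file is split into 3 data shards plus 1 XOR parity shard, one shard per node on a 4-node setup. This is only a sketch and not something IPFS or IPFS Cluster provides in this thread; real systems typically use Reed-Solomon codes, and the `encode` and `decode` helpers are hypothetical names.

```python
# Toy single-parity erasure code: k data shards + 1 XOR parity shard.
# It tolerates the loss of exactly one shard (one node going down).

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k=3):
    """Split data into k equal shards (zero-padded) and append one parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def decode(shards, original_len):
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost shard")
    shards = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        rebuilt = present[0]
        for shard in present[1:]:
            rebuilt = xor_bytes(rebuilt, shard)
        shards[missing[0]] = rebuilt
    # The last shard is parity; strip the zero padding added by encode().
    return b"".join(shards[:-1])[:original_len]


data = bytes(range(256)) * 1600        # 409,600 bytes, i.e. a 400 KB file
shards = encode(data)                  # 4 shards, one per node
shards[1] = None                       # one node goes down
assert decode(shards, len(data)) == data
```

With this layout each node stores about a third of the file (roughly 133 KB for the 400 KB example) instead of a full copy, and the file survives any single node going offline.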

talha · September 29, 2020, 6:22am

What if I don’t want to replicate a file in IPFS Cluster? Will IPFS Cluster still be useful then?
And if I have a 4-node Private IPFS Network, the data won’t be replicated, right? In that case, if 1 node goes down, will the files uploaded through that IPFS node still be accessible to the other nodes or not?

hector · September 29, 2020, 10:03am

Data will only be accessible if there is a node online that can provide it because it has stored it before.

Note that Cluster lets you set a replication factor for every pinned item, so you can replicate it 2, 3 or 4 times, or not at all.

talha · September 29, 2020, 10:11am

Ok. I am getting some clarity now.

So this means that in a 4-node private IPFS network, if I upload a file using node1 and then retrieve it using node2, the file will be stored on both node1 and node2. Right?

ldeffenb · September 29, 2020, 12:59pm

It will only stay on node 2 until it is garbage collected. Node 2 is not guaranteed to keep it unless it is pinned there, either explicitly or via ipfs-cluster. It is actually only guaranteed to stay on node 1 if it is pinned there, but I think uploading (might?) automatically pin it on the node to which it was uploaded.
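The interaction between pinning and garbage collection can be sketched with a toy blockstore. This is an illustrative model, not the go-ipfs implementation: `ToyNode` and its methods are made-up names, and pinning on add is modelled as the default on the assumption that it mirrors how `ipfs add` behaves in go-ipfs unless pinning is disabled.

```python
# Toy model of one node's blockstore; not the go-ipfs implementation.

class ToyNode:
    def __init__(self):
        self.blocks = {}    # cid -> data currently stored on this node
        self.pins = set()   # cids protected from garbage collection

    def add(self, cid, data, pin=True):
        """Store content added locally; pinning on add is assumed to be the default."""
        self.blocks[cid] = data
        if pin:
            self.pins.add(cid)

    def cache(self, cid, data):
        """Store content fetched from another node; it is not pinned."""
        self.blocks[cid] = data

    def pin(self, cid):
        """Protect content that is already stored from garbage collection."""
        if cid not in self.blocks:
            raise KeyError(cid)
        self.pins.add(cid)

    def gc(self):
        """Drop every block that is not pinned and return the removed cids."""
        removed = sorted(cid for cid in self.blocks if cid not in self.pins)
        for cid in removed:
            del self.blocks[cid]
        return removed


node1, node2 = ToyNode(), ToyNode()
node1.add("cid-a", b"file contents")           # uploaded via node1, pinned there
node2.cache("cid-a", node1.blocks["cid-a"])    # retrieved via node2, only cached

print(node2.gc())   # ['cid-a']  (node2 drops its unpinned copy)
print(node1.gc())   # []         (node1 keeps the pinned copy)
```

If node2 had called `pin("cid-a")` before the collection ran, its copy would have survived as well.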

talha · September 30, 2020, 7:32am

Ok. So let’s say we have an IPFS Cluster where I am not replicating any data. In general terms, will the IPFS Cluster and a Private IPFS network then be the same?

@hector Can you please also clarify my earlier doubts?

Actually, I am working on a production-grade project where I have a Kubernetes cluster with multiple worker nodes. For data storage I have to use a multi-node IPFS setup, either through IPFS Cluster or a private network, and it will run inside the Kubernetes cluster.
My concern is that I want a highly available IPFS setup that scales horizontally, as we will be storing some 2-3 petabytes of data.
I am thinking of starting with a 2-node IPFS Cluster with replication, so that if 1 node goes down the other is still available. Later, when storage is getting full, I can add 2 more nodes with replication, and so on.

Is this type of solution feasible with IPFS Cluster?
Earlier I thought of using a Private IPFS network, but in that case would I be able to ensure some level of high availability?

Can you also clarify the above doubt?

hector · September 30, 2020, 8:57am

IPFS nodes will store/advertise content that has been added to them or that they have retrieved. As mentioned above, they can also remove the content from themselves by running a garbage collection, when the content is not pinned, but this does not happen automatically unless configured.

Yes, you can have a 2-peer cluster with replication-factor=2 and then increase the number of peers while keeping the replication factor at 2. Cluster will pin content to the nodes with the most storage available.
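A simplified sketch of that allocation rule, for the scale-out plan described above. It is not the actual IPFS Cluster allocator: `allocate` and `pin` are hypothetical helpers, free space is tracked in arbitrary units, and the sketch does not check whether an item actually fits on the chosen peers.

```python
# Simplified allocation: each pin goes to the peers with the most free space.

def allocate(free_space, replication_factor):
    """Pick the peers with the most free space to hold one pin."""
    if replication_factor > len(free_space):
        raise ValueError("not enough peers for the requested replication factor")
    # Most free space first; ties are broken by peer name for a stable order.
    ranked = sorted(free_space, key=lambda peer: (-free_space[peer], peer))
    return ranked[:replication_factor]


def pin(free_space, size, replication_factor=2):
    """Allocate one item and subtract its size from the chosen peers."""
    chosen = allocate(free_space, replication_factor)
    for peer in chosen:
        free_space[peer] -= size
    return chosen


peers = {"peer1": 100, "peer2": 100}
print(pin(peers, 60))    # ['peer1', 'peer2']

# Storage is filling up, so two more peers join; the replication factor stays at 2.
peers.update({"peer3": 100, "peer4": 100})
print(pin(peers, 60))    # ['peer3', 'peer4']
```

Under these assumptions existing pins stay where they are and new pins land on the emptier peers, so capacity grows with the number of peers while every item still has two copies.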

A private IPFS network is not really related to redundancy or availability. It is simply not part of the public IPFS network, and nodes in the public network can neither connect to it nor retrieve any content from it.
