IPFS Cluster vs IPFS Private Network: Data distribution

I have read the earlier discussions on Private IPFS Network vs IPFS Cluster, but I am still confused.
When should I use which?
Let's say I have a 400 KB file and, while storing it in IPFS, I want to distribute it equally among 4 nodes so that each node holds 100 KB of data.
Would this be implemented with a Private IPFS Network or with IPFS Cluster?
And in either case, what will happen if 1 node goes down?

As of today, it is not possible to do this in IPFS or in IPFS Cluster.

Note that Cluster works as a companion to IPFS. It does not matter whether IPFS is configured for a private network or is part of the public one.

Ok.

So does it mean that each node will have the full 400 KB of data?

And in either case, what will happen if 1 node goes down?

That one’s easy. If you actually manage to only give 1/4 of the file to each of 4 nodes and one of them goes down, then the file will not be fully accessible.
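To make that failure mode concrete, here is a minimal Python sketch of the hypothetical even split from the question. Neither IPFS nor IPFS Cluster places data this way; the helper names `split_evenly` and `reassemble` are invented purely for illustration.

```python
from typing import Dict, List, Optional


def split_evenly(data: bytes, parts: int) -> List[bytes]:
    """Split data into `parts` contiguous pieces of (nearly) equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(parts)]


def reassemble(pieces: Dict[int, bytes], parts: int) -> Optional[bytes]:
    """Return the original bytes, or None if any piece is missing."""
    if any(i not in pieces for i in range(parts)):
        return None
    return b"".join(pieces[i] for i in range(parts))


file_data = bytes(400 * 1024)          # a 400 KB dummy file
shards = split_evenly(file_data, 4)    # four 100 KB pieces
nodes = dict(enumerate(shards))        # node index -> the piece it holds

rebuilt_all_up = reassemble(nodes, 4)  # all four nodes online

del nodes[2]                           # one node goes down
rebuilt_one_down = reassemble(nodes, 4)
print(rebuilt_one_down)                # the file cannot be rebuilt
```

With no redundancy, losing any single node loses a quarter of the file, so the whole file becomes unavailable.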

Yes, correct. All will have the full file.

I am still confused.
So you mean to say that if I have a 4 node IPFS Cluster and I upload a file, then the cluster in total will have 4 copies of the file, 1 on each node?
When we talk about replication in IPFS, is this all we mean?

So you mean to say that if I have a 4 node IPFS Cluster and I upload a file, then the cluster in total will have 4 copies of the file, 1 on each node?

Yes, this is what is happening in IPFS Cluster.

When we talk about replication in IPFS, is this all we mean?

It might be worth clarifying replication a little further. There is generally “proactive replication”, where a file is replicated automatically to the storage space of other nodes (e.g., upon adding a file to the network) - this is what is happening in IPFS Cluster, but not in IPFS. And then there is “reactive replication” where some file is replicated to a node’s storage, only after this node has requested the file. This is also referred to as caching and is what IPFS does.

BTW, it would be really cool to have what you initially asked. A great way of achieving that is through erasure coding. It would certainly help with large files and nodes going offline and it could work both in IPFS Cluster and in IPFS (where coded content is only in nodes that have requested the file before).
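As a rough illustration of the erasure coding idea, here is a minimal Python sketch using the simplest possible code: 3 data shards plus 1 XOR parity shard, spread over 4 nodes. This is not something IPFS or IPFS Cluster does according to this thread, and real systems would more likely use something like Reed-Solomon; the function names `encode` and `decode` are assumptions made up for this example.

```python
from functools import reduce
from typing import Dict, List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k zero-padded data shards plus one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def decode(available: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the original data from any k of the k + 1 shards."""
    missing = [i for i in range(k + 1) if i not in available]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost shard")
    shards = dict(available)
    if missing and missing[0] < k:
        # XOR of all surviving shards equals the lost data shard.
        shards[missing[0]] = reduce(xor_bytes, available.values())
    return b"".join(shards[i] for i in range(k))[:length]


original = bytes(range(256)) * 1600   # 400 KB of sample data
shards = encode(original, k=3)        # one shard per node, 4 nodes in total
nodes = dict(enumerate(shards))

del nodes[1]                          # one node goes down
recovered = decode(nodes, k=3, length=len(original))
print(recovered == original)
```

In this sketch each node stores roughly 133 KB instead of 100 KB, which is the price of surviving one node failure; four full copies would cost 1600 KB in total instead of about 533 KB.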

What if I don’t want to replicate a file in IPFS Cluster? Will IPFS Cluster still be useful then?
And if I have a 4 node Private IPFS Network, then data won't be replicated, right? In that case, if 1 node goes down, will the files uploaded through that IPFS node still be accessible to the other nodes or not?

Data will only be accessible if there is a node online that can provide it because it has stored it before.

Note that Cluster allows you to set a replication factor for every pinned item, so you can replicate it 2, 3 or 4 times, or not replicate it at all.

Ok. I am getting some clarity now.

So this means that in a 4 node private IPFS network, if I upload a file using node 1 and then retrieve it using node 2, the file will be stored on both node 1 and node 2. Right?

It will stay on node 2 only until it is garbage collected. Node 2 is not guaranteed to keep it unless it is pinned there, either explicitly or via ipfs-cluster. It is actually only guaranteed to stay on node 1 if it is pinned there, but I think uploading (might?) automatically pin it on the node to which it was uploaded.
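The interplay of caching, pinning and garbage collection described here can be sketched with a toy Python model. This is not the IPFS API: `ToyNode`, its methods and the placeholder `CID` string are all hypothetical, and the model assumes that adding a file pins it on the node that added it.

```python
from typing import Dict, List, Set

CID = "cid-of-file"  # placeholder identifier, not a real CID


class ToyNode:
    """Toy stand-in for an IPFS node: a block store plus a set of pins."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.pins: Set[str] = set()

    def add(self, cid: str, data: bytes, pin: bool = True) -> None:
        """Store content locally; assumed to pin by default."""
        self.blocks[cid] = data
        if pin:
            self.pins.add(cid)

    def fetch(self, cid: str, peers: List["ToyNode"]) -> bytes:
        """Reactive replication: retrieving content caches it, unpinned."""
        if cid not in self.blocks:
            for peer in peers:  # only online peers are passed in
                if cid in peer.blocks:
                    self.blocks[cid] = peer.blocks[cid]
                    break
            else:
                raise LookupError("no online peer provides " + cid)
        return self.blocks[cid]

    def pin(self, cid: str) -> None:
        """Protect locally held content from garbage collection."""
        if cid not in self.blocks:
            raise LookupError(cid + " is not stored locally")
        self.pins.add(cid)

    def gc(self) -> None:
        """Garbage collection drops everything that is not pinned."""
        self.blocks = {c: d for c, d in self.blocks.items() if c in self.pins}


node1, node2 = ToyNode("node1"), ToyNode("node2")
node1.add(CID, b"file contents")   # uploaded (and pinned) on node 1
node2.fetch(CID, peers=[node1])    # node 2 now holds a cached copy
print(CID in node1.blocks, CID in node2.blocks)

node1.gc()
node2.gc()                         # the unpinned cached copy is dropped
print(CID in node1.blocks, CID in node2.blocks)
```

In this model, calling `node2.pin(CID)` after the fetch is what would make the copy on node 2 survive a garbage collection.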

Ok. So let's say we have an IPFS Cluster where I am not replicating any data. Then, in general terms, will the IPFS Cluster and a Private IPFS Network be the same?

@hector Can you please also clarify my earlier doubts?

I am actually working on a production-grade project where I have a Kubernetes cluster with multiple worker nodes, and for data storage I have to use a multi-node IPFS setup, either through IPFS Cluster or a private network, running inside the Kubernetes cluster.
The concern here is that I want a highly available IPFS setup that can scale horizontally, as we will be storing some 2-3 petabytes of data.
I am thinking of starting with a 2 node IPFS Cluster with replication, so that if 1 node goes down the other will still be available, and later, when storage is getting full, I can add 2 more nodes with replication, and so on.

Is this type of solution feasible with IPFS Cluster?
Earlier I thought of using a Private IPFS Network, but in that case will I be able to ensure some level of high availability?

Can you also clarify the above doubt?

IPFS nodes will store and advertise content that has been added to them or that they have retrieved. As mentioned above, they can also remove unpinned content from themselves by running a garbage collection, but this does not happen automatically unless configured.

Yes, you can have a 2 peer cluster with replication-factor=2 and then increase the number of peers but keep the replication-factor to 2. Cluster will pin content to the nodes with most storage available.
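The growth plan above can be sketched with a toy allocator in Python. It only models the "pin to the peers with the most free storage" behaviour described in this answer; it is not IPFS Cluster's actual allocator, and the `allocate` function, the peer names and the size units are hypothetical.

```python
from typing import Dict, List


def allocate(free_space: Dict[str, int], size: int,
             replication_factor: int) -> List[str]:
    """Pick the peers with the most free space and reserve room on them."""
    candidates = [p for p, free in free_space.items() if free >= size]
    if len(candidates) < replication_factor:
        raise RuntimeError("not enough peers with free space")
    # Most free space first; peer name breaks ties deterministically.
    chosen = sorted(candidates,
                    key=lambda p: (-free_space[p], p))[:replication_factor]
    for peer in chosen:
        free_space[peer] -= size
    return chosen


# Start with 2 peers and replication factor 2 (sizes in arbitrary units).
free = {"peer1": 1000, "peer2": 1000}
first_pins = [allocate(free, size=300, replication_factor=2)
              for _ in range(3)]

# Storage is nearly full, so add 2 more peers and keep the factor at 2.
free.update({"peer3": 1000, "peer4": 1000})
later_pin = allocate(free, size=300, replication_factor=2)
print(first_pins, later_pin, free)
```

Every item still has exactly two copies, and new content lands on the newly added peers because they have the most free space.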

A private IPFS network is not really related to redundancy or availability. It is simply not part of the public IPFS network, and nodes in the public network cannot connect to it or retrieve any content from it.