I have been exploring how to provide an interface for reading and writing data in a “user library” across all apps and sites on the system, where the “user library” is the user’s data on the IPFS network.
In an attempt to provide a seamless user experience, the PoC tries to use a native IPFS node on the device and falls back to an in-browser IPFS node implementation running in a SharedWorker. But since a user may have different devices, and / or have a different node (in-browser or native) available at different times, the experience would suffer, as different data may end up in different nodes.
This made me realize that IPFS Cluster is kind of a solution to this problem: I could form a cluster with the in-browser node and the native node to keep the same data available across them. It would also neatly address cases where the user has multiple devices, by adding their nodes to the cluster. Additionally, server nodes could be added to improve availability / sync.
Design mismatch
However, I think there is a design mismatch (unless I’m overlooking something). From the user’s perspective there is a set of devices / nodes available and data in the library. However, some devices might have less capacity, so replicating all of the data across all the nodes does not seem to be a good fit, which suggests that allowing a cluster to maintain multiple pinsets with different peersets might be a better fit. Does that make sense? Or should different clusters just be formed per dataset? The latter seems at odds with the user’s perspective, as it’s still the same cluster; some data just may not be immediately available across all nodes, and in fact having a list of the nodes that have that data might be useful for getting it faster when needed.
Any feedback, pointers, and thoughts would be appreciated.
The IPFS Cluster approach is to allocate every CID to a list of peers (members of the cluster). You can then use whatever strategy works best to build that list (e.g. pinning everywhere except on mobile devices, unless the pin was added from a mobile device, in which case it gets pinned on that peer too).
This is somewhat equivalent to maintaining a different pinset in every peer.
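To make the example strategy concrete, here is a minimal sketch in plain Python. It is only a model of the idea, not IPFS Cluster code: the `Peer` class, the `is_mobile` flag, and the `allocate` function are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """Hypothetical description of a cluster peer (not an IPFS Cluster type)."""
    name: str
    is_mobile: bool = False


def allocate(peers, added_from):
    """Return the names of the peers that should pin a CID.

    Strategy from the post above: pin on every non-mobile peer, and also on
    the peer the pin was added from, even if that peer is mobile.
    """
    return sorted(p.name for p in peers if not p.is_mobile or p.name == added_from)


peers = [
    Peer("laptop"),
    Peer("server"),
    Peer("phone", is_mobile=True),
    Peer("tablet", is_mobile=True),
]

# A pin added from the laptop skips both mobile devices.
print(allocate(peers, added_from="laptop"))
# A pin added from the phone is also kept on the phone, but not on the tablet.
print(allocate(peers, added_from="phone"))
```

Running the strategy once per pin yields a per-CID allocation list, which is where the "different pinset in every peer" effect comes from.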
I was under the impression that pinning anything on any of the participating nodes would automatically get replicated across the cluster it’s part of. I think I understand now that instead you pin in the cluster, and I guess you could form clusters per dataset. Or am I misunderstanding something?
Yes, you can do that. That’s when you put all the cluster peers in the Allocations for a CID (or simply leave it empty). But every CID carries the list of peers it should be pinned on, and that list can vary per CID (some can be pinned everywhere and others not).
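As a rough illustration of that rule, the sketch below resolves the peers expected to pin each CID, treating an empty allocation list as "pin everywhere" as described above. The pinset layout, the CID strings, and `effective_peers` are assumptions made up for this example and do not reflect the actual IPFS Cluster data model or API.

```python
def effective_peers(allocations, cluster_peers):
    """Peers expected to pin a CID; an empty list is treated as "everywhere"."""
    return sorted(allocations) if allocations else sorted(cluster_peers)


# Hypothetical cluster membership and pinset (placeholder CID strings).
cluster_peers = {"laptop", "phone", "server"}
pinset = {
    "cid-photos": ["laptop", "server"],  # explicit allocations: skips the phone
    "cid-notes": [],                     # empty allocations: pinned everywhere
}

for cid, allocations in pinset.items():
    print(cid, effective_peers(allocations, cluster_peers))
```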
I’m not sure we mean the same thing by “a cluster”. If you have a list of CIDs and you are pinning them to different peers, you can:
- Say you have a single cluster with different allocations per CID
- Say you have a different pinset per cluster peer
- Say you have different clusters which share a single pinset
It’s all the same. Each is a map between CIDs and the peers they need to be pinned on, and if all those peers are connected and can access each other’s content and information, I would call that a single cluster, because that is what a single IPFS Cluster does.
For me, a cluster is something fully independent that consists of a set of peers which share a pinset (even if they don’t have to pin everything in it). You could also have several “clusters” where everything is pinned everywhere (one per dataset) and get the same result.
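The equivalence can be sketched with a small plain-Python model. Everything here is hypothetical (the peer names, the placeholder CIDs, and the helper functions); it only shows that the per-CID allocations, the per-peer pinsets, and the "one cluster per dataset" grouping are different views of the same CID-to-peers map.

```python
from collections import defaultdict

# Hypothetical data: CID -> peers it is allocated to (single-cluster view).
allocations = {
    "cid-a": {"laptop", "server"},
    "cid-b": {"laptop", "phone", "server"},
    "cid-c": {"laptop", "server"},
}


def invert(mapping):
    """Flip a key -> set-of-values map into a value -> set-of-keys map."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


def group_by_peerset(allocations):
    """Group CIDs sharing a peerset: each group is a "cluster" pinned everywhere."""
    groups = defaultdict(set)
    for cid, peers in allocations.items():
        groups[frozenset(peers)].add(cid)
    return dict(groups)


# Per-peer view: the pinset each peer ends up holding.
per_peer = invert(allocations)
print(per_peer["phone"])

# Inverting again recovers the original map, so nothing was lost.
print(invert(per_peer) == allocations)

# Per-dataset view: one group of CIDs for every distinct peerset.
print(len(group_by_peerset(allocations)))
```

In this model the choice between one cluster with varying allocations and several smaller clusters is a matter of bookkeeping rather than of what ends up pinned where.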