The “cluster pinset” (the list of things pinned by the cluster, together with the options of each pin) is the only piece of data that needs to be synchronized and to converge between cluster peers. It is essentially a key-value store where the keys are CIDs.
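A minimal Go sketch of that data model. It assumes a simplified, hypothetical `PinOptions` type with two made-up fields (the real ipfs-cluster pin type carries more options) and uses placeholder strings instead of real CIDs:

```go
package main

import "fmt"

// PinOptions is a hypothetical, simplified stand-in for the options a
// cluster stores with each pin (the real ipfs-cluster type has more fields).
type PinOptions struct {
	Name              string
	ReplicationFactor int
}

// Pinset models the shared state as a key-value store: keys are CIDs
// (plain placeholder strings here), values are the pin options.
type Pinset map[string]PinOptions

// Pin adds or overwrites the entry for a CID.
func (ps Pinset) Pin(cid string, opts PinOptions) { ps[cid] = opts }

// Unpin removes the entry for a CID.
func (ps Pinset) Unpin(cid string) { delete(ps, cid) }

func main() {
	ps := Pinset{}
	ps.Pin("bafyexample1", PinOptions{Name: "docs", ReplicationFactor: 2})
	ps.Pin("bafyexample2", PinOptions{Name: "site", ReplicationFactor: 3})
	ps.Unpin("bafyexample1")
	fmt.Println(len(ps), ps["bafyexample2"].Name) // prints "1 site"
}
```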
Cluster offers two options for that. When using Raft, Raft is used to elect a leader, and that peer performs all operations on the data, so there is no possibility of “conflict”.
When using CRDTs, any peer can broadcast operations, and every peer applies them. Cluster uses Merkle-CRDTs, where operations are added as roots of a global-state Merkle-DAG. This Merkle-DAG is conflict-free (see the Merkle-CRDTs paper draft). Conflicts can, however, appear when applying the operations to the key-value store (for example, when two parallel branches of the DAG modify the same key). The prevailing value for a key is decided by the DAG height of the node carrying that operation (the highest height wins) or, when heights are equal, by alphabetical order (https://github.com/ipfs/go-ds-crdt/blob/master/set.go#L29; the priority is set to the height). This happens at write time. You can think of the height as a kind of clock value, because the Merkle-DAG is itself a logical clock.
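The write-time rule can be sketched in Go as a register that keeps, per key, the value with the highest priority. This is an illustration, not go-ds-crdt's code: the `entry`/`store` types are hypothetical, and the tie-break direction (the byte-wise larger value wins) is an assumption, since the rule above only says alphabetical order decides.

```go
package main

import (
	"bytes"
	"fmt"
)

// entry is a hypothetical record of one write to a key, tagged with the
// DAG height of the node that carried the operation (its priority).
type entry struct {
	value  []byte
	height uint64
}

// wins reports whether candidate should replace current for the same key:
// the higher height wins; on equal heights the values are compared
// byte-wise and the larger one is kept (assumed tie-break direction).
func wins(candidate, current entry) bool {
	if candidate.height != current.height {
		return candidate.height > current.height
	}
	return bytes.Compare(candidate.value, current.value) > 0
}

// store applies writes as they arrive, in whatever order.
type store map[string]entry

func (s store) put(key string, e entry) {
	cur, ok := s[key]
	if !ok || wins(e, cur) {
		s[key] = e
	}
}

func main() {
	// Parallel DAG branches write the same key; apply them in two orders.
	a := entry{value: []byte("replication=2"), height: 7}
	b := entry{value: []byte("replication=3"), height: 7}
	c := entry{value: []byte("replication=1"), height: 8}

	s1, s2 := store{}, store{}
	for _, e := range []entry{a, b, c} {
		s1.put("bafyexample", e)
	}
	for _, e := range []entry{c, b, a} {
		s2.put("bafyexample", e)
	}
	fmt.Println(string(s1["bafyexample"].value), string(s2["bafyexample"].value))
}
```

Because (height, value) pairs are totally ordered, both replicas keep the same value whatever order the writes arrive in; the example prints `replication=1` twice.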
Are IPFS Cluster operations atomic?
Modifications to the pinset are convergent and can happen simultaneously. I am not sure how I would define “atomic” in this context. With Raft they are atomic, since log entries are applied one by one. But this only covers modifying the global pinset: actually getting the content pinned on IPFS happens asynchronously.
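A toy Go sketch of that split, using hypothetical types: a state machine applies committed log entries to the pinset one at a time and merely hands the CIDs to a background worker that stands in for the IPFS daemon. It is not ipfs-cluster's actual Raft integration.

```go
package main

import (
	"fmt"
	"sync"
)

// logEntry is a hypothetical, simplified Raft log entry: one pinset change.
type logEntry struct {
	op  string // "pin" or "unpin"
	cid string
}

// fsm applies committed entries to the pinset. The mutex makes each apply
// a single indivisible step on the shared state.
type fsm struct {
	mu     sync.Mutex
	pinset map[string]bool
	work   chan string // CIDs handed off for asynchronous pinning on IPFS
}

func (f *fsm) apply(e logEntry) {
	f.mu.Lock()
	defer f.mu.Unlock()
	switch e.op {
	case "pin":
		f.pinset[e.cid] = true
		f.work <- e.cid // only queued here; the actual pinning happens later
	case "unpin":
		delete(f.pinset, e.cid)
	}
}

func main() {
	f := &fsm{pinset: map[string]bool{}, work: make(chan string, 16)}

	// The worker stands in for the IPFS daemon doing the slow pin.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for cid := range f.work {
			fmt.Println("pinning on IPFS:", cid)
		}
	}()

	// Committed entries are applied strictly in log order.
	for _, e := range []logEntry{{"pin", "bafyA"}, {"pin", "bafyB"}, {"unpin", "bafyA"}} {
		f.apply(e)
	}
	fmt.Println("pinset size after log:", len(f.pinset))

	close(f.work)
	wg.Wait()
}
```

The order of the printed lines is not fixed, which is the point: the pinset change is complete when `apply` returns, while pinning on IPFS catches up on its own schedule.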
Another question came to mind: in Raft mode, does the leader perform absolutely all operations in the cluster? Or is there a (possibly) different leader for each file?
In the first case, how does it scale to large networks with a lot of operations?
There is no notion of separate “files”. We have a “shared state”, which is the cluster pinset.
In Raft mode there is a single leader in the Cluster which performs all operations modifying the pinset. Non-leader peers forward the operations to the leader.
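The forwarding behaviour can be sketched in Go with in-process peers. The `peer` type and its methods are hypothetical; a real cluster would forward over RPC and learn who the leader is from Raft rather than from a pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// peer is a hypothetical cluster peer; forwarding is a direct method call
// here instead of a network RPC.
type peer struct {
	id     string
	leader *peer    // current Raft leader, as known by this peer
	log    []string // only used on the leader
}

// commit is only valid on the leader: it appends the operation to its log.
func (p *peer) commit(op string) error {
	if p.leader != p {
		return errors.New("not the leader")
	}
	p.log = append(p.log, op)
	return nil
}

// submit is what any peer does when asked to modify the pinset:
// the leader commits directly, every other peer forwards to the leader.
func (p *peer) submit(op string) error {
	if p.leader == nil {
		return errors.New("no known leader")
	}
	if p.leader == p {
		return p.commit(op)
	}
	return p.leader.submit(op)
}

func main() {
	l := &peer{id: "A"}
	l.leader = l
	b := &peer{id: "B", leader: l}
	c := &peer{id: "C", leader: l}

	_ = b.submit("pin bafyX")
	_ = c.submit("pin bafyY")
	_ = l.submit("unpin bafyX")
	fmt.Println(l.log) // prints [pin bafyX pin bafyY unpin bafyX]
}
```

Whichever peer receives the request, the change ends up in the single leader's log.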
how does it scale to large networks with a lot of operations?
It does not scale well, but how far it goes depends on network latency, reliability, and how long processing an operation takes. On a good network, maintaining a few hundred connections and broadcasting data to them should not be a problem for a Raft peer. Processing a CRDT operation is usually slower, but the CRDT-based system has more relaxed reliability requirements and better scalability options.