[theory] Merkle-CRDTs for embedding non-commutative operation-based CRDTs

Hi everyone,

I have read https://arxiv.org/pdf/2004.00107.pdf which I found was extremely well written and inspiring.

In my current understanding, a Merkle-CRDT is a great tool to “embed” the operations of an arbitrary operation-based CRDT that uses commutative operations.

Now, in the prospect of developing CRUD-like dApps, not all operations are commutative. For instance “delete” operations are a canonical instance of non commutative operation.

Have the IPFS community and researchers proposed any solution to run those non commutative operations?

Would defining a total order on the embedded operation-based CRDT operations would be a suitable solution in your opinion?

Is there an opened research discussion somewhere to talk about those ideas?

This is all very exciting, thank you very much!

Delete operations in sets are commutative (see OR-Set CRDTs). The deleted items just form another growing set.

There is usually a way to make everything a CRDT (using composition etc.), but sometimes a CRDT may not solve the whole problem or may not be good enough.

I was thinking of a more complex data structure than a set.

For instance lets take a twitter like platform.

Let say that both Users A and B have merged their Merkle-CRDT in the past containing the operation “User A tweeted tweet #23239”.

Then concurrently, User A deletes tweet #23239 and User B retweets #23239

We end up with a merged Merkle-CRDT with two roots, but the two corresponding operations are non-commutative (because you cannot retweet a deleted tweet) so either you choose an order to run them either you end up in an invalid dApp state?

The two DAGs can be merged and compute a single state where the retweet simply has no effect (even though it happened) because an earlier deletion of the tweet has appeared.

I disagree with you (when you say “earlier”) because there is no causality is this example because the two actions happened concurrently, the Merkle-DAG can always be merged and it has two roots after mergre, which represents this lack of causality (i.e. the fact that the action happened concurrently).

However, when the dApp will run the actions described on the two roots, if there is no extra order that is imposed on the actions (in the paper 2004.00107 operations are commutative so the order of execution does not matter), then a node can totally end up in the invalid state where a deleted tweet is being re-tweeted.

In that particular example, a lot of workarrounds could be found. But I wondered if people have formalised this problem, for instance by imposing an extra order on the embeded CRDT actions.

EDIT: are you saying, that when such a case is spotted, a node will add a new root, parent of those two roots, discarding the effect of the retweet? In that case, it seems to me that you implicitely declared that the “delete” operation had higher priority than the retweet operation, hence creating an order on operations.

There are 2 roots but the computed state is not invalid.

“deletion and then retweet” or “retweet and then deletion”, the results computes to the same “no tweet, no retweet in the computed state” regardless of the order in which things were seen. So yes, there is a “priority” to operations that happen at the “same time”. How you compute the resulting state and set the priorities does not matter as long as everyone does it the same way.

Note that the CRDT is always additive (the deletion is an addition to it), but the computed state is derived from there and can actually delete items as it evolves.

Ok!

So for everyone to compute the same state a pre-established order on operations must have been agreed on.

In this example, the two DAG to merge were very similar so dealing with the problem was straightforward. However if they are widely distinct and let say a lot of operations of DAG A have higher priority than operations of DAG B, but also a lot of operations of DAG B have higher priority than some operations in DAG A, the reconciliation algorithm is not straightforward to me and I need to think about it. That is because the dApp is going to execute some operations and some times will need to backtrack when it sees higher priority operations later on.

Said differently, it is not straightforward to me how the order on operations generalises to the entire DAG they are part of.

Note that the CRDT is always additive (the deletion is an addition to it), but the computed state is derived from there and can actually delete items as it evolves.

I 100% agree