#crdt #distributed #conflict-free #ot

ditto

CRDTs for common data structures like maps, sets, vecs, strings, and JSON

2 unstable releases

Uses old Rust 2015

0.2.0 Mar 15, 2018
0.1.0 Mar 8, 2018

#1599 in Data structures

MIT/Apache

200KB
4K SLoC

Ditto

Build Status

Ditto is a library for using CRDTs, or conflict-free replicated data types. CRDTs are data structures which can be replicated across multiple sites, edited concurrently, and merged together without leading to conflicts. Ditto provides a number of commonly used data types:

  • Register<T>: A replaceable value
  • Counter: An i64 value that increments
  • Set<T>: A HashSet-like collection of unique values
  • Map<K, V>: A HashMap-like collection of key-value pairs
  • List<T>: A Vec-like ordered sequence of elements
  • Text: A String-like container for mutable text
  • Json: A JSON value

Ditto's goal is to be fast, correct, and easy to use. If you have any questions, suggestions, or other feedback, feel free to open an issue or a pull request or contact the Ditto developers directly.

Example

extern crate ditto;
extern crate serde_json;
use ditto::List;

fn main() {
    // Create a List CRDT. The site that creates the CRDT
    // is automatically assigned id 1.
    let mut list1 = List::from(vec![100,200,300]);

    // Send the list's state over a network to a second site with id 2.
    let encoded_state = serde_json::to_string(&list1.state()).unwrap();
    let decoded_state = serde_json::from_str(&encoded_state).unwrap();
    let mut list2 = List::from_state(decoded_state, Some(2)).unwrap();

    // Edit the list concurrently at both the first and second site.
    // Whenever you edit a CRDT, you receive an op that can be sent
    // to other sites.
    let op1 = list1.insert(0, 400).unwrap();
    let op2 = list2.remove(0).1.unwrap();

    // Each site sends its op to the other site for execution.
    // The encoding and decoding has been left out for brevity.
    list1.execute_op(op2);
    list2.execute_op(op1);

    // Now both sites have the same value:
    assert_eq!(list1.state(), list2.state());
    assert_eq!(list1.local_value(), vec![400, 200, 300]);
}

You can find more examples in the examples and tests directories in the crate repo.

Using CRDTs

Ditto CRDTs are designed to mimic standard data type APIs as much as possible. You can insert and remove list and map elements, replace text elements, etc. Each edit generates an op that can be sent to other sites for execution. When you execute an op sent from another site, you receive a LocalOp that shows exactly how the CRDT's value has changed.

The two complications of CRDTs that users have to worry about are:

  • How to send ops/state from one site to another
  • How to assign a site id to each site.

Ditto does not include a networking layer. However, you can find info on how to send changes and assign site IDs in the docs below.

Sending ops and state

CRDTs and ops are serializable with Serde. Serialization is tested against serde_json (JSON) and rmp-serde (MsgPack) but may work with other formats as well.

Ops must be sent in the order they were generated. That is, if a site performs edit A and then edit B, it must send op A before it sends op B. State can be sent in any order.

Similarly, ops must be sent over a network that guarantees in-order delivery. TCP fits this requirement, so any protocol sitting atop TCP (HTTP, WebSockets, SMTP, XMPP, etc.) will work as a transport layer for op-based replication. State can be sent via a protocol that does not guarantee in-order delivery.

In general, when replicating a CRDT state you should send its state struct, not the CRDT struct, because the CRDT struct includes the site ID. For example, to replicate a Json CRDT you should send the serialized JsonState, which can be created by calling json_crdt.state().

Assigning Site IDs

A CRDT may be distributed across multiple sites. A site is just a fancy distributed systems term for "client". Each site that wishes to edit the CRDT must have a unique u32 identifier.

The site that creates the CRDT is automatically assigned to ID 1. You are responsible for assigning all other sites; Ditto will not do it for you.

Here are some strategies for assigning site identifiers:

  • Reuse existing site identifiers (e.g. numeric client ids)
  • Use a central server to assign site ids on a per-CRDT basis
  • Use a consensus algorithm like Raft or Paxos to determine a new site's ID

Site IDs can be assigned lazily. If a site only needs read access to a CRDT, it doesn't need a site ID. If a site without an ID edits the CRDT, the CRDT will update locally but ops will be cached and unavailable to the user. When the site receives an ID, that ID will be retroactively applied to the site's edits, and the cached ops will be returned to be sent over the network.

Do I need a centralized server to maintain consistency?

CRDTs do not require a central server to ensure eventual consistency; you can use them in peer-to-peer protocols, client-server applications, federated services, or any other environment. However you do need a way to assign unique site identifiers, as explained in the section Assigning Site IDs. A centralized server is one way to do that, but not the only way.

A server may also be useful as an op cache for unavailable clients. If you are using CRDTs in an application where sites are often offline (for instance, a mobile phone app), you can use the server to store ops and state changes until they have been received by all sites.

Duplicate ops

Ditto CRDTs are idempotent — executing an op twice has no effect. As long as ops from a site are executed in the order they were generated, the CRDT will maintain consistency.

Sending ops vs. sending state

Usually, the most compact way to send a change between sites is to send an op. An op is just a fancy distributed systems term for "change". Each time you edit a CRDT locally, you generate an op that can be sent to other sites and executed.

However, there may be times when it is faster and more compact to send the whole CRDT state (e.g. if you're sending 100 or 1000 edits at once). You should replicate exclusively via state if you cannot guarantee in-order op delivery.

Other Notes

Collection CRDTs are inherently larger than their native equivalents because each element must have a unique id. Overhead is most significant when storing a collection of very small values — a List<u8> will be many times larger than a Vec<u8>. If the collection itself is immutable, you can significantly reduce overhead by switching from a List<T> or Map<K,V> to a Register<Vec<T>> or Register<Map<K,V>>.

License

Ditto is licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Ditto by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Dependencies

~1.5–2.7MB
~51K SLoC