
mergable

Easy, performant, pragmatic CRDTs.

WIP: Do not use.

Goals

In rough order of priority.

  • Easy to use.
  • Full-mesh P2P support (true CRDT).
  • Support arbitrary, strongly-typed data structures built from common components.
  • Support for cleaning up old data (as long as everyone has seen it, of course).
  • Undo support (generating "revert" operations).
  • Atomic updates. (Not transactional with reads, but well-ordered.)
  • Relatively efficient.

Design

The key aspect of this crate is the Mergable trait. The rest of the crate is data types that implement that trait, and in the future, tools to help you manage the structures and syncing.
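A plausible sketch of that interface, inferred from how merge, diff, and apply are used in the examples below, together with a toy grow-only set, might look as follows. The exact signatures and the GSet type are assumptions for illustration, not the crate's actual definitions.

```rust
use std::collections::HashSet;

// Hypothetical sketch of the trait, inferred from usage below;
// the real `mergable::Mergable` interface may differ.
trait Mergable: Sized {
	type Diff;

	/// Merge another replica's full state into this one.
	fn merge(&mut self, other: Self);

	/// Compute the changes present in `self` but not in `remote`.
	fn diff(&self, remote: &Self) -> Self::Diff;

	/// Apply a previously computed diff.
	fn apply(&mut self, diff: Self::Diff);
}

// Toy example: a grow-only set (G-Set), one of the simplest CRDTs.
#[derive(Clone, Debug, PartialEq, Default)]
struct GSet(HashSet<String>);

impl Mergable for GSet {
	type Diff = HashSet<String>;

	fn merge(&mut self, other: Self) {
		self.0.extend(other.0);
	}

	fn diff(&self, remote: &Self) -> Self::Diff {
		self.0.difference(&remote.0).cloned().collect()
	}

	fn apply(&mut self, diff: Self::Diff) {
		self.0.extend(diff);
	}
}

fn main() {
	let mut a = GSet::default();
	a.0.insert("x".to_string());
	let mut b = GSet::default();
	b.0.insert("y".to_string());

	// Merging full states in both directions converges.
	a.merge(b.clone());
	b.merge(a.clone());
	assert_eq!(a, b);
}
```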

Basic Syncing

Basic syncing is very simple. To send your changes to someone else, just send them your whole structure. They can then call local.merge(remote), and now they have your changes. If they send you their changes and you merge them the same way, you both end up with the same structure. You can even do this in parallel!

let mut alice_state = unimplemented!();
let mut bob_state = unimplemented!();

// Send states to the other party.
let for_bob = serialize(&alice_state);
let for_alice = serialize(&bob_state);

// Update local states with remote.
alice_state.merge(deserialize(for_alice));
bob_state.merge(deserialize(for_bob));

assert_eq!(alice_state, bob_state);
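With a concrete type substituted for the unimplemented!() placeholders, this round trip can be exercised end to end. The GCounter below is an illustrative stand-in (not a type from this crate), and clone stands in for serialization:

```rust
use std::collections::HashMap;

// Illustrative G-Counter: per-replica counts, merged by taking maxima.
// A stand-in for demonstration, not part of this crate.
#[derive(Clone, Debug, PartialEq, Default)]
struct GCounter(HashMap<&'static str, u64>);

impl GCounter {
	fn increment(&mut self, replica: &'static str) {
		*self.0.entry(replica).or_insert(0) += 1;
	}

	fn merge(&mut self, remote: GCounter) {
		for (replica, count) in remote.0 {
			let entry = self.0.entry(replica).or_insert(0);
			*entry = (*entry).max(count);
		}
	}

	fn value(&self) -> u64 {
		self.0.values().sum()
	}
}

fn main() {
	let mut alice_state = GCounter::default();
	let mut bob_state = GCounter::default();

	alice_state.increment("alice");
	alice_state.increment("alice");
	bob_state.increment("bob");

	// "Send" states to the other party (clone stands in for serialization).
	let for_bob = alice_state.clone();
	let for_alice = bob_state.clone();

	// Update local states with remote.
	alice_state.merge(for_alice);
	bob_state.merge(for_bob);

	assert_eq!(alice_state, bob_state);
	assert_eq!(alice_state.value(), 3);
}
```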

1:1 Delta Syncing

For 1:1 syncing you maintain two copies of the data structure: one represents the remote state and one represents the local state. Edits are performed on the local state as desired. Occasionally a delta is generated, pushed onto a sync queue, and applied to the local copy of the remote state.

let mut remote_state = unimplemented!();
let mut local_state = remote_state.clone(); // Or whatever you had lying around.
let mut sync_queue = Vec::new();

for _ in 0..10 {
	make_changes(&mut local_state);

	let delta = local_state.diff(&remote_state);
	sync_queue.push(serialize(&delta));
	remote_state.apply(delta);
}

Once you have established a network connection, the remote can apply your changes.

let mut remote_state = unimplemented!();

for delta in fetch_sync_queue() {
	remote_state.apply(delta);
}

At this point both nodes have an identical structure. (Assuming that there have been no concurrent changes on the remote.)
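A runnable version of the queue-then-replay flow, using an illustrative grow-only ItemSet where diff returns the elements missing from the remote and apply inserts them. The type and method names are stand-ins for this sketch, not this crate's API:

```rust
use std::collections::HashSet;

// Illustrative stand-in type; the crate's real data types would
// implement diff/apply through the Mergable trait.
#[derive(Clone, Debug, PartialEq, Default)]
struct ItemSet(HashSet<u32>);

impl ItemSet {
	fn diff(&self, remote: &ItemSet) -> HashSet<u32> {
		self.0.difference(&remote.0).copied().collect()
	}

	fn apply(&mut self, delta: HashSet<u32>) {
		self.0.extend(delta);
	}
}

fn main() {
	let mut remote_state = ItemSet::default();
	let mut local_state = remote_state.clone();
	let mut sync_queue: Vec<HashSet<u32>> = Vec::new();

	for i in 0..10 {
		local_state.0.insert(i); // stand-in for make_changes

		let delta = local_state.diff(&remote_state);
		sync_queue.push(delta.clone());
		remote_state.apply(delta);
	}

	// Later, on the actual remote node: replay the queue.
	let mut actual_remote = ItemSet::default();
	for delta in sync_queue {
		actual_remote.apply(delta);
	}
	assert_eq!(actual_remote, local_state);
}
```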

1:N Delta Sync

This is the common pattern of having a central server as the "source of truth". It can be used to allow offline editing (like Google Docs), or combined with a P2P backup for maximum reliability. This pattern provides the best efficiency for clients at the cost of some latency (for edits to traverse the server).

For clients this looks exactly like 1:1 sync; a simple server looks roughly as follows.

struct Server<T: Mergable> {
	state: T,
	deltas: Vec<T::Diff>,
}

impl<T: Mergable> Server<T> {
	/// Get the current state and version.
	fn get(&self) -> (T, usize) {
		(self.state.clone(), self.deltas.len())
	}

	fn update(&mut self, diff: T::Diff) {
		self.state.apply(diff.clone());
		self.deltas.push(diff);
	}
}

let mut state = unimplemented!();
let mut deltas = Vec::new();
let mut clients = Vec::new();
for event in unimplemented!("receive events from clients") {
	match event {
		NewClient{client} => {
			client.send(New{
				state: serialize(&state),
				version: deltas.len(),
			});
			clients.push(client);
		}
		ResumeClient{client, version} => {
			for (i, delta) in deltas.iter().enumerate().skip(version) {
				client.send(Delta{
					delta: &delta,
					version: i + 1,
				})
			}
			clients.push(client);
		}
		Delta{client_id, delta} => {
			state.apply(deserialize(&delta));
			deltas.push(delta);

			for client in &mut clients {
				client.send(Delta{
					delta: deltas.last().unwrap(),
					version: deltas.len(),
				})
			}
		}
	}
}

This is a simple implementation; some things could be improved.

  • Regenerate the diffs from the clients to make sure they are minimal. Clients that have been disconnected for a long time may push more data than required if they don't regenerate their diffs after receiving the latest data. For mostly-connected clients in a realtime scenario this likely isn't a major concern.
  • For reconnecting clients generate an optimized diff based on the latest version they have instead of just replaying them all.
  • Clean up old deltas once all clients have moved past them (or just unconditionally expire old deltas and deliver a New state to old clients).
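The last cleanup point can be sketched as a minimum-acked-version computation: drop every delta that all clients have already seen. The version bookkeeping below is an assumption for illustration, not code from this crate:

```rust
// Each client has acknowledged some version (an index into the delta log).
// Deltas below the minimum acked version can be dropped, shifting the
// log's base offset forward. With no connected clients, everything is
// dropped (reconnecting clients would receive a fresh New state).
fn prune_deltas<T>(deltas: &mut Vec<T>, base_version: &mut usize, acked: &[usize]) {
	let min_acked = acked
		.iter()
		.copied()
		.min()
		.unwrap_or(*base_version + deltas.len());
	let drop = min_acked.saturating_sub(*base_version).min(deltas.len());
	deltas.drain(..drop);
	*base_version += drop;
}

fn main() {
	let mut deltas = vec!["d0", "d1", "d2", "d3"];
	let mut base_version = 0;

	// Clients have acknowledged versions 2 and 3: d0 and d1 can go.
	prune_deltas(&mut deltas, &mut base_version, &[2, 3]);
	assert_eq!(deltas, vec!["d2", "d3"]);
	assert_eq!(base_version, 2);
}
```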

In the future I would like to provide a server core in this library to make a quality implementation easy.

N:M Delta Sync

This is currently tricky. The best option for now is likely to have every client act as a server, but care must be taken to avoid holding too much state in the clients.

Another option would be to keep a copy of the last-seen-state of all peers (for a period of time) and upon reconnection generate a delta based on that. This would be feasible if your data is not too large.

In the future we may add support for common-ancestor discovery which would allow for more efficient initial syncing. (This would be similar to how Git does pushes and pulls.)
