RLDB
A Rust implementation of the Amazon Dynamo paper
Introduction
RLDB (Rusty Learning Dynamo Database) is an educational project that provides a Rust implementation of the Amazon Dynamo paper. This project aims to help developers and students understand the principles behind distributed key-value data stores.
Features
Feature | Description | Status | Resources |
---|---|---|---|
InMemory Storage Engine | A simple in-memory storage engine | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 3 |
LSM Tree | An LSM-tree-backed storage engine | $${\textsf{\color{yellow}TODO}}$$ | Designing Data-Intensive Applications - chapter 3 |
Log-Structured HashTable | Similar to the Bitcask storage engine | $${\textsf{\color{yellow}TODO}}$$ | Bitcask intro paper |
TCP server | A tokio-backed TCP server for incoming requests | $${\textsf{\color{green}Implemented}}$$ | tokio |
PUT/GET/DEL client APIs | TCP APIs for PUT, GET, and DELETE | $${\textsf{\color{greenyellow}WIP}}$$ | N/A |
PartitioningScheme via Consistent Hashing | A functional consistent-hashing implementation (see the sketch after the table) | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 6, Consistent Hashing by David Karger |
Leaderless replication of partitions | Replicating partition data using the leaderless replication approach | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 5 |
Quorum | Quorum-based reads and writes for tunable consistency guarantees | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 5 |
Node discovery and failure detection | A gossip-based mechanism to discover cluster nodes and detect failures | $${\textsf{\color{green}Implemented}}$$ | Dynamo paper |
Re-sharding/rebalancing | Moving data between nodes after cluster state changes | $${\textsf{\color{yellow}TODO}}$$ | Designing Data-Intensive Applications - chapter 6 |
Data versioning | Versioning and conflict detection/resolution (via VersionVectors) | $${\textsf{\color{green}Implemented}}$$ | Vector clock wiki, Lamport clock paper (not that easy to parse) |
Reconciliation via read repair | GETs can trigger a repair when replicas are missing | $${\textsf{\color{yellow}TODO}}$$ | Dynamo paper |
Active anti-entropy | Use Merkle trees to detect missing replicas and trigger reconciliation | $${\textsf{\color{yellow}TODO}}$$ | Dynamo paper |
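The consistent-hashing partitioner maps both nodes and keys onto a hash ring, and a key is owned by the first node found when walking the ring clockwise from the key's position, so most keys keep their owner when nodes join or leave. The following is only a rough, minimal sketch of that idea, not rldb's actual implementation (a production ring would also use virtual nodes):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Illustrative hash ring: node positions live in an ordered map, and a key is
// owned by the first node at or after the key's hash (wrapping around).
struct HashRing {
    ring: BTreeMap<u64, String>,
}

fn position<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

impl HashRing {
    fn new(nodes: &[&str]) -> Self {
        let ring = nodes.iter().map(|n| (position(n), n.to_string())).collect();
        HashRing { ring }
    }

    // Walk clockwise from the key's position; wrap around to the first node.
    fn node_for(&self, key: &str) -> &str {
        let h = position(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
            .expect("ring must contain at least one node")
    }
}

fn main() {
    let ring = HashRing::new(&["127.0.0.1:3001", "127.0.0.1:3002", "127.0.0.1:3003"]);
    for key in ["foo", "foo2", "bar"] {
        println!("{key} -> {}", ring.node_for(key));
    }
}
```

Because only the keys between a departing/arriving node and its predecessor change owners, this scheme avoids reshuffling the whole keyspace when the cluster changes size.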
Running the server
- Start nodes using config files in different terminals
cargo run --bin rldb-server -- --config-path conf/node_1.json
cargo run --bin rldb-server -- --config-path conf/node_2.json
cargo run --bin rldb-server -- --config-path conf/node_3.json
- Add the new nodes to the cluster
In this example, we assume the node on port 3001 is the initial cluster node and add the other nodes to it.
cargo run --bin rldb-client join-cluster -p 3002 --known-cluster-node 127.0.0.1:3001
cargo run --bin rldb-client join-cluster -p 3003 --known-cluster-node 127.0.0.1:3001
PUT
cargo run --bin rldb-client put -p 3001 -k foo -v bar
{"message":"Ok"}%
GET
cargo run --bin rldb-client get -p 3001 -k foo
{"values":[{"value":"bar","crc32c":179770161}],"context":"00000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001"}
Handling conflicts
Example of a GET request encountering conflicts
cargo run --bin rldb-client get -p 3001 -k foo2
{"values":[{"value":"bar2","crc32c":1093081014},{"value":"bar1","crc32c":1383588930},{"value":"bar3","crc32c":3008140469}],"context":"00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001e975274170197c04e92166baff4d20c900000000000000000000000000000001"}
The context key of the response is what allows subsequent PUTs to resolve the given conflicts. For example:
cargo run --bin rldb-client put -p 3002 -k foo2 -v conflicts_resolved -c 00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001e975274170197c04e92166baff4d20c900000000000000000000000000000001
{"message":"Ok"}%
cargo run --bin rldb-client get -p 3001 -k foo2
{"values":[{"value":"conflicts_resolved","crc32c":3289643150}],"context":"00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000002e975274170197c04e92166baff4d20c900000000000000000000000000000001"}%
Extracting traces with jaeger
The all-in-one Jaeger Docker image can be used to export rldb traces locally. The steps are:
- Start the jaeger container
$ ./local_jaeger.sh
- Start nodes with the jaeger flag
cargo run --bin rldb-server -- --config-path conf/node_1.json --tracing-jaeger
- Use the jaeger UI at: http://localhost:16686
Documentation
See rldb docs
License
This project is licensed under the MIT license. See License for details.
Acknowledgments
This project was inspired by the original Dynamo paper, but also by many other authors and resources. When modules in this project are based on specific resources, those are cited as part of the module documentation.