RLDB
A Rust implementation of the Amazon Dynamo paper
Introduction
RLDB (Rusty Learning Dynamo Database) is an educational project that provides a Rust implementation of the Amazon Dynamo paper. This project aims to help developers and students understand the principles behind distributed key-value data stores.
Features
Feature | Description | Status | Resources |
---|---|---|---|
InMemory Storage Engine | A simple in-memory storage engine | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 3 |
LSM Tree | An LSM-tree-backed storage engine | $${\textsf{\color{yellow}TODO}}$$ | Designing Data-Intensive Applications - chapter 3 |
Log-Structured HashTable | Similar to the Bitcask storage engine | $${\textsf{\color{yellow}TODO}}$$ | Bitcask intro paper |
TCP server | A tokio-backed TCP server for incoming requests | $${\textsf{\color{green}Implemented}}$$ | tokio |
PUT/GET/DEL client APIs | TCP APIs for PUT, GET, and DELETE | $${\textsf{\color{greenyellow}WIP}}$$ | N/A |
PartitioningScheme via Consistent Hashing | A functional consistent-hashing implementation (see the sketch after the table) | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 6, Consistent Hashing by David Karger |
Leaderless replication of partitions | Replicating partition data using the leaderless replication approach | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 5 |
Quorum | Quorum-based reads and writes for tunable consistency guarantees | $${\textsf{\color{green}Implemented}}$$ | Designing Data-Intensive Applications - chapter 5 |
Node discovery and failure detection | A gossip-based mechanism to discover cluster nodes and detect failures | $${\textsf{\color{green}Implemented}}$$ | Dynamo paper |
Re-sharding/rebalancing | Moving data between nodes after cluster state changes | $${\textsf{\color{yellow}TODO}}$$ | Designing Data-Intensive Applications - chapter 6 |
Data versioning | Versioning and conflict detection/resolution (via VersionVectors) | $${\textsf{\color{green}Implemented}}$$ | Vector clock wiki, Lamport clock paper (not that easy to parse) |
Reconciliation via read repair | GETs can trigger a repair when replicas are missing | $${\textsf{\color{yellow}TODO}}$$ | Dynamo paper |
Active anti-entropy | Use Merkle trees to detect missing replicas and trigger reconciliation | $${\textsf{\color{yellow}TODO}}$$ | Dynamo paper |
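The consistent-hashing partitioner maps both nodes and keys onto a hash ring, and a key is owned by the first node found when walking the ring clockwise from the key's position, so most keys keep their owner when nodes join or leave. The following is only a rough, minimal sketch of that idea, not rldb's actual implementation (a production ring would also use virtual nodes):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Illustrative hash ring: node positions live in an ordered map, and a key is
// owned by the first node at or after the key's hash (wrapping around).
struct HashRing {
    ring: BTreeMap<u64, String>,
}

fn position<T: Hash>(t: &T) -> u64 {
    let mut h = DefaultHasher::new();
    t.hash(&mut h);
    h.finish()
}

impl HashRing {
    fn new(nodes: &[&str]) -> Self {
        let ring = nodes.iter().map(|n| (position(n), n.to_string())).collect();
        HashRing { ring }
    }

    // Walk clockwise from the key's position; wrap around to the first node.
    fn node_for(&self, key: &str) -> &str {
        let h = position(&key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
            .expect("ring must contain at least one node")
    }
}

fn main() {
    let ring = HashRing::new(&["127.0.0.1:3001", "127.0.0.1:3002", "127.0.0.1:3003"]);
    for key in ["foo", "foo2", "bar"] {
        println!("{key} -> {}", ring.node_for(key));
    }
}
```

Because only the keys between a departing/arriving node and its predecessor change owners, this scheme avoids reshuffling the whole keyspace when the cluster changes size.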
Running the server
- Start nodes using config files in different terminals
cargo run --bin rldb-server -- --config-path conf/node_1.json
cargo run --bin rldb-server -- --config-path conf/node_2.json
cargo run --bin rldb-server -- --config-path conf/node_3.json
- Add the new nodes to the cluster
In this example, we assume the node on port 3001 is the initial cluster node and add the other nodes to it.
cargo run --bin rldb-client join-cluster -p 3002 --known-cluster-node 127.0.0.1:3001
cargo run --bin rldb-client join-cluster -p 3003 --known-cluster-node 127.0.0.1:3001
PUT
cargo run --bin rldb-client put -p 3001 -k foo -v bar
{"message":"Ok"}%
GET
cargo run --bin rldb-client get -p 3001 -k foo
{"values":[{"value":"bar","crc32c":179770161}],"context":"00000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001"}
Handling conflicts
Example of a GET request encountering conflicts
cargo run --bin rldb-client get -p 3001 -k foo2
{"values":[{"value":"bar2","crc32c":1093081014},{"value":"bar1","crc32c":1383588930},{"value":"bar3","crc32c":3008140469}],"context":"00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001e975274170197c04e92166baff4d20c900000000000000000000000000000001"}
The context key of the response is what allows subsequent PUTs to resolve the given conflicts. For example:
cargo run --bin rldb-client put -p 3002 -k foo2 -v conflicts_resolved -c 00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000001e975274170197c04e92166baff4d20c900000000000000000000000000000001
{"message":"Ok"}%
cargo run --bin rldb-client get -p 3001 -k foo2
{"values":[{"value":"conflicts_resolved","crc32c":3289643150}],"context":"00000003296f248aff807cf05f4bcd0d05a45cc500000000000000000000000000000001527bd0d79bdb065196e93b951879b64300000000000000000000000000000002e975274170197c04e92166baff4d20c900000000000000000000000000000001"}%
Extracting traces with jaeger
The all-in-one Jaeger Docker image can be used to export rldb traces locally. The steps are:
- Start the jaeger container
$ ./local_jaeger.sh
- Start nodes with the jaeger flag
cargo run --bin rldb-server -- --config-path conf/node_1.json --tracing-jaeger
- Use the jaeger UI at: http://localhost:16686
Documentation
See rldb docs
License
This project is licensed under the MIT license. See License for details.
Acknowledgments
This project was inspired by the original Dynamo paper, but also by many other authors and resources. When modules in this project are based on specific resources, those are cited as part of the module documentation.