#lsm-tree #key-value-store #apache-arrow #embedded-database #storage #parquet #persistent

tonbo

An embedded persistent KV database in Rust

3 unstable releases

0.1.0	Aug 14, 2024
0.0.1	Aug 12, 2024
0.0.0	Aug 1, 2024

#1613 in Database interfaces

291 downloads per month

Apache-2.0

270KB
7.5K SLoC

Tonbo (WIP)

Introduction

Tonbo is an embedded KV database built on Apache Arrow & Parquet, designed to store, filter, and project structured data using an LSM tree.

It integrates naturally with Arrow data processing components such as DataFusion; refer to this example.

Our goal is to provide a lean, modern solution for storing data in tiered storage spanning RAM, flash, SSD, S3, and other backends.
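For readers new to LSM trees, the following is a minimal, standard-library-only sketch of the general idea: writes land in a sorted in-memory table, full tables are frozen into immutable sorted runs, and reads check the newest data first. `ToyLsm` and all of its methods are hypothetical names chosen for illustration; this is not Tonbo's implementation, which stores its runs as Parquet files.

```rust
use std::collections::BTreeMap;

/// A toy LSM tree: one mutable in-memory table plus immutable sorted runs.
/// `None` values are tombstones that mark deleted keys.
pub struct ToyLsm {
    memtable: BTreeMap<String, Option<String>>,
    /// Frozen runs, oldest first. A real engine would keep these on disk.
    runs: Vec<BTreeMap<String, Option<String>>>,
    /// Freeze the memtable once it holds this many entries.
    memtable_limit: usize,
}

impl ToyLsm {
    pub fn new(memtable_limit: usize) -> Self {
        Self {
            memtable: BTreeMap::new(),
            runs: Vec::new(),
            memtable_limit,
        }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.write(key, Some(value.to_string()));
    }

    pub fn delete(&mut self, key: &str) {
        // A delete is just a write of a tombstone.
        self.write(key, None);
    }

    fn write(&mut self, key: &str, value: Option<String>) {
        self.memtable.insert(key.to_string(), value);
        if self.memtable.len() >= self.memtable_limit {
            // Freeze the full memtable into an immutable run.
            let frozen = std::mem::take(&mut self.memtable);
            self.runs.push(frozen);
        }
    }

    /// Check the memtable first, then the runs from newest to oldest.
    /// The first entry found wins, so newer writes shadow older ones.
    pub fn get(&self, key: &str) -> Option<&str> {
        if let Some(v) = self.memtable.get(key) {
            return v.as_deref();
        }
        for run in self.runs.iter().rev() {
            if let Some(v) = run.get(key) {
                return v.as_deref();
            }
        }
        None
    }

    pub fn run_count(&self) -> usize {
        self.runs.len()
    }
}
```

A caller would construct the tree with `ToyLsm::new(2)`, write with `put` and `delete`, and read with `get`. Compaction, which merges runs and discards shadowed entries and tombstones, is left out to keep the sketch short.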

Features

Fully asynchronous API.
Zero-copy rusty API ensuring safety with compile-time type and lifetime checks.
Vendor-agnostic:
- Various usage methods, async runtimes, and file systems:
  - Rust library:
  - Python library (via PyO3 & pydantic):
    - asyncio (via pyo3-asyncio).
  - JavaScript library:
    - WASM and OPFS.
  - Dynamic library with a C interface.
- Most lightweight implementation of Arrow / Parquet LSM trees:
  - Define schema using just Arrow schema and store data in Parquet files.
  - (Optimistic) Transactions.
  - Leveled compaction strategy.
  - Push down filter, limit and projection.
Runtime schema definition (in next release).
SQL (via Apache DataFusion).
Fusion storage across RAM, flash, SSD, and remote Object Storage Service (OSS) for each column-family, balancing performance and cost efficiency per data block:
- Remote storage (via Arrow object_store or Apache OpenDAL).
- Distributed query and compaction.
Blob storage (like BlobDB in RocksDB).
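
To illustrate what "optimistic" means in the transactions feature listed above, here is a small conceptual sketch using only the standard library. Transactions buffer writes without taking locks, and conflicts are detected only at commit time. `Store`, `Txn`, and `CommitError` are hypothetical names, and the first-committer-wins rule shown here is a generic textbook scheme, assumed for illustration rather than taken from Tonbo's source.

```rust
use std::collections::{BTreeMap, HashMap};

/// Committed state: each key maps to (value, version of its last write).
#[derive(Default)]
pub struct Store {
    data: BTreeMap<String, (String, u64)>,
    version: u64,
}

/// A transaction buffers its writes locally and remembers the
/// version of the store at the moment it started.
pub struct Txn {
    start_version: u64,
    writes: HashMap<String, String>,
}

#[derive(Debug, PartialEq)]
pub enum CommitError {
    /// Another transaction committed a write to this key after we started.
    WriteConflict(String),
}

impl Txn {
    pub fn set(&mut self, key: &str, value: &str) {
        self.writes.insert(key.to_string(), value.to_string());
    }
}

impl Store {
    /// Starting a transaction takes no locks.
    pub fn begin(&self) -> Txn {
        Txn {
            start_version: self.version,
            writes: HashMap::new(),
        }
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.data.get(key).map(|(v, _)| v.as_str())
    }

    /// Validate at commit time: fail if any key we wrote was changed
    /// by a transaction that committed after our start version.
    pub fn commit(&mut self, txn: Txn) -> Result<u64, CommitError> {
        for key in txn.writes.keys() {
            if let Some((_, written_at)) = self.data.get(key) {
                if *written_at > txn.start_version {
                    return Err(CommitError::WriteConflict(key.clone()));
                }
            }
        }
        // No conflicts: apply all buffered writes under a new version.
        self.version += 1;
        for (key, value) in txn.writes {
            self.data.insert(key, (value, self.version));
        }
        Ok(self.version)
    }
}
```

If two transactions begin from the same version and both write the same key, the first `commit` succeeds and the second returns `CommitError::WriteConflict`, so the caller can retry against fresh state.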

Example

use std::ops::Bound;

use futures_util::stream::StreamExt;
use tonbo::{executor::tokio::TokioExecutor, tonbo_record, Projection, DB};

// Use the macro to define the schema of a column family, much like an ORM.
// It provides a type-safe read and write API.
#[tonbo_record]
pub struct User {
    #[primary_key]
    name: String,
    email: Option<String>,
    age: u8,
}

#[tokio::main]
async fn main() {
    // pluggable async runtime and I/O
    let db = DB::new("./db_path/users".into(), TokioExecutor::default())
        .await
        .unwrap();

    // insert with owned value
    db.insert(User {
        name: "Alice".into(),
        email: Some("alice@gmail.com".into()),
        age: 22,
    })
    .await
    .unwrap();

    {
        // tonbo supports transactions
        let txn = db.transaction().await;

        // get from primary key
        let name = "Alice".into();

        // get the zero-copy reference of record without any allocations.
        let user = txn
            .get(
                &name,
                // tonbo supports pushing down projection
                Projection::All,
            )
            .await
            .unwrap();
        assert!(user.is_some());
        assert_eq!(user.unwrap().get().age, Some(22));

        {
            let upper = "Blob".into();
            // range scan over the primary keys in [name, upper)
            let mut scan = txn
                .scan((Bound::Included(&name), Bound::Excluded(&upper)))
                .await
                // tonbo supports pushing down projection
                .projection(vec![1])
                .take()
                .await
                .unwrap();
            while let Some(entry) = scan.next().await.transpose().unwrap() {
                assert_eq!(
                    entry.value(),
                    Some(UserRef {
                        name: "Alice",
                        email: Some("alice@gmail.com"),
                        age: Some(22),
                    })
                );
            }
        }

        // commit transaction
        txn.commit().await.unwrap();
    }
}
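
The scan in the example passes a pair of `std::ops::Bound` values, which describes the half-open key interval from "Alice" (included) up to "Blob" (excluded). The sketch below shows the same bound semantics on a plain `BTreeMap` from the standard library, as a way to see which keys such a range selects. `scan_names` is a hypothetical helper written for this illustration and is not part of Tonbo's API.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Return the keys in the half-open interval [lower, upper), in sorted order.
/// Note: `BTreeMap::range` panics if `lower` is greater than `upper`.
pub fn scan_names<'a, V>(
    users: &'a BTreeMap<String, V>,
    lower: &str,
    upper: &str,
) -> Vec<&'a str> {
    users
        // Same bound pair as in the example: lower included, upper excluded.
        .range::<str, _>((Bound::Included(lower), Bound::Excluded(upper)))
        .map(|(name, _)| name.as_str())
        .collect()
}
```

Keys compare byte-wise as strings, so with the keys "Alice", "Blob", "Bob", and "Carol", calling `scan_names(&users, "Alice", "Blob")` selects only "Alice": "Blob" itself is excluded, and "Bob" sorts after "Blob".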

Contributing to Tonbo

Please feel free to ask questions or contact us on GitHub Discussions.

Dependencies

~31–48MB
~1M SLoC