#content-addressable-storage #content-addressable-cas #blob #content-addressable #blob-storage

cassadilia

A content-addressable storage (CAS) system optimized for large blobs with read-mostly access patterns

3 releases

Uses new Rust 2024

0.4.3 Oct 26, 2025
0.4.2 Sep 29, 2025
0.3.0 Sep 14, 2025
0.2.1 Aug 13, 2025
0.0.1 Jun 23, 2025

#563 in Database interfaces

421 downloads per month
Used in 6 crates (via tycho-core)

MIT/Apache

195KB
4.5K SLoC

When to use this?

  • You want to store blobs: large files, roughly 10 MB and up.
  • Your access pattern is read-mostly.
  • You want get_range(from, to) functionality.
  • You want zero write amplification: exactly one write per blob, with no compactions or similar background rewriting.
  • You tried RocksDB for this, and it killed your disk and your brain :)
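
To illustrate why a file-per-blob layout makes range reads cheap, here is a minimal std-only sketch of reading a byte range from a blob file. The helper name read_range and the half-open range convention are assumptions made for this example; they are not cassadilia's actual get_range signature or semantics.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical helper: read bytes `from..to` (half-open, an assumption)
/// from the blob file at `path`.
fn read_range(path: &Path, from: u64, to: u64) -> io::Result<Vec<u8>> {
    if from > to {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "from > to"));
    }
    let mut file = File::open(path)?;
    // Jump straight to the start of the range; nothing before it is read.
    file.seek(SeekFrom::Start(from))?;
    let mut buf = Vec::new();
    // `take` caps the read at the requested length. A range that runs past
    // the end of the file simply yields fewer bytes.
    file.take(to - from).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Because each blob is one plain file, a range read is a seek plus a bounded read, with no block decoding or merging of multiple on-disk levels.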

When not to use this?

  • You have lots of small files. An LSM tree is still king here.
  • You run on something that is not Unix-like.

Architecture principles

  • Store data in the CAS first, then index it. With this ordering you don't need to handle consistency errors in user code: the worst case is an orphaned blob with no index record, and you can get the list of orphaned blobs on startup and do something with them. The opposite failure, an index record for a blob with no actual blob in the CAS, leaves you as a user with nothing you can do. Hi, Mr. RocksDB :)

  • Blobs are stored in the classic CAS layout, /h/a/s/h, where the path is derived from the blob's hash.

  • The hash function is BLAKE3.

  • The index is stored in a single file, which is periodically rewritten on WAL roll-over.

  • Internally, the index is a BTreeMap.
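
The principles above can be sketched with the standard library alone. Everything named here is hypothetical: blob_path and find_orphans are illustrative helpers, the two-level, two-character sharding is an assumed layout (the README only says /h/a/s/h), and hashes are passed in as hex strings rather than computed with BLAKE3 so the example has no dependencies.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::path::{Path, PathBuf};

/// Hypothetical layout: <root>/<hex 0..2>/<hex 2..4>/<rest of hex>.
/// The real shard depth and width used by cassadilia may differ.
fn blob_path(root: &Path, hash_hex: &str) -> Option<PathBuf> {
    // Accept only plain hex, so the byte slicing below is always on ASCII.
    if hash_hex.len() < 5 || !hash_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    Some(
        root.join(&hash_hex[0..2])
            .join(&hash_hex[2..4])
            .join(&hash_hex[4..]),
    )
}

/// Because blobs are written before they are indexed, the only possible
/// inconsistency is a blob on disk that no index entry references.
/// `index` maps key -> blob hash; `on_disk` lists the hashes found in the CAS.
fn find_orphans(index: &BTreeMap<String, String>, on_disk: &[String]) -> Vec<String> {
    let referenced: BTreeSet<&str> = index.values().map(String::as_str).collect();
    on_disk
        .iter()
        .filter(|hash| !referenced.contains(hash.as_str()))
        .cloned()
        .collect()
}
```

With this ordering, recovery only ever has to decide what to do with extra blobs; it never has to repair an index entry that points at nothing.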

Concurrency and Locking

Cassadilia relies on deterministic blob paths and scoped intents to stay consistent.

1. Atomic CAS commits

  • Parent directories are created opportunistically; repeated calls are safe.
  • Staged files are atomically renamed into place. If the CAS file already exists, the staging file is dropped without touching the existing blob.
  • Concurrent commits of the same hash therefore converge on a single CAS file: the first writer's rename wins, and later writers take the already-exists path.
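
A minimal sketch of such a commit step, using only std::fs. The function name commit_staged and its boolean return value are assumptions for illustration, not the crate's internal API. The atomicity of rename relies on POSIX semantics with the staging and destination paths on the same file system.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical commit step. Returns Ok(true) if this call placed the blob,
/// Ok(false) if a blob with this hash was already present.
fn commit_staged(staging: &Path, dest: &Path) -> io::Result<bool> {
    if let Some(parent) = dest.parent() {
        // `create_dir_all` succeeds if the directory already exists,
        // so concurrent or repeated calls are safe.
        fs::create_dir_all(parent)?;
    }
    if dest.exists() {
        // Same hash means same content: drop our staged copy and leave
        // the existing blob untouched.
        fs::remove_file(staging)?;
        return Ok(false);
    }
    // Readers see either no file or the complete file, never a partial one.
    // If another writer placed the blob between the check and this call,
    // the rename replaces it with identical bytes, which is harmless here.
    fs::rename(staging, dest)?;
    Ok(true)
}
```

The existence check is an optimization rather than a lock: correctness comes from the destination path being a pure function of the content.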

2. Intent Tracking System

We use scoped intents to prevent races between concurrent puts and deletes.

  • Register intent: record key -> blob_hash in pending_intents map; no refcount changes.
  • Commit: append WAL Put, apply it to the index updating refcounts, remove the intent, compute unreferenced blobs, exclude hashes still referenced by active intents, delete the rest.
  • Drop without commit: remove the current intent and restore any previously replaced intent; no refcount changes.
  • Remove ops: append WAL Remove, apply it to the state, filter unreferenced blobs against active intents, and delete the rest before checkpointing.

Intents are in-memory only; if a process exits before commit, any staged files become orphans, handled by startup orphan scanning.
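
The bookkeeping described above can be modelled in memory as follows. This is a simplified, single-threaded sketch under stated assumptions: the State struct and its method names are hypothetical, WAL appends and file deletion are left out, and each mutating call returns the blob hashes that would be safe to delete.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

#[derive(Default)]
struct State {
    index: BTreeMap<String, String>,          // key -> blob hash
    refcounts: HashMap<String, usize>,        // blob hash -> number of keys
    pending_intents: HashMap<String, String>, // key -> blob hash, memory only
}

impl State {
    /// Register intent: no refcount changes. Returns any replaced intent.
    fn register_intent(&mut self, key: &str, hash: &str) -> Option<String> {
        self.pending_intents.insert(key.to_string(), hash.to_string())
    }

    /// Drop without commit: restore the previously replaced intent, if any.
    fn drop_intent(&mut self, key: &str, previous: Option<String>) {
        match previous {
            Some(prev) => {
                self.pending_intents.insert(key.to_string(), prev);
            }
            None => {
                self.pending_intents.remove(key);
            }
        }
    }

    /// Commit: apply the put, then report blobs that are safe to delete.
    fn commit(&mut self, key: &str) -> Vec<String> {
        let Some(hash) = self.pending_intents.remove(key) else {
            return Vec::new();
        };
        // A real implementation would append a WAL Put record here first.
        *self.refcounts.entry(hash.clone()).or_insert(0) += 1;
        let mut unreferenced = Vec::new();
        if let Some(old) = self.index.insert(key.to_string(), hash) {
            if self.release(&old) {
                unreferenced.push(old);
            }
        }
        self.filter_against_intents(unreferenced)
    }

    /// Remove: apply the removal, then report blobs that are safe to delete.
    fn remove(&mut self, key: &str) -> Vec<String> {
        // A real implementation would append a WAL Remove record here first.
        let mut unreferenced = Vec::new();
        if let Some(old) = self.index.remove(key) {
            if self.release(&old) {
                unreferenced.push(old);
            }
        }
        self.filter_against_intents(unreferenced)
    }

    /// Decrement a refcount; true means the blob has no index references left.
    fn release(&mut self, hash: &str) -> bool {
        let remaining = match self.refcounts.get_mut(hash) {
            Some(n) => {
                *n -= 1;
                *n
            }
            None => return false,
        };
        if remaining == 0 {
            self.refcounts.remove(hash);
        }
        remaining == 0
    }

    /// Keep only hashes that no active intent still points at.
    fn filter_against_intents(&self, candidates: Vec<String>) -> Vec<String> {
        let active: HashSet<&str> =
            self.pending_intents.values().map(String::as_str).collect();
        candidates
            .into_iter()
            .filter(|hash| !active.contains(hash.as_str()))
            .collect()
    }
}
```

The filter step is what protects a concurrent put: if one caller removes the last key for a hash while another caller holds an intent for the same hash, the blob is not deleted, and the later commit brings its refcount back up.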

Todo

  • Gather orphaned blobs on startup.
  • Allow checking blob integrity.
  • Cache file descriptors.
  • Use mmap, or allow the read mode to be configured.
  • Lock the index file to prevent the database from being opened concurrently.
  • Save settings in a separate file.

Dependencies

~6–12MB
~253K SLoC