#chunk #operations #append #concatenation #structure #sharing #structural

tailcall-chunk

A Rust implementation of a persistent data structure for efficient append and concatenation operations

9 releases

0.3.1 Nov 26, 2024
0.3.0 Nov 19, 2024
0.2.5 Nov 16, 2024
0.1.0 Nov 16, 2024

#172 in Data structures

Download history 44/week @ 2024-11-11 795/week @ 2024-11-18 1422/week @ 2024-11-25 766/week @ 2024-12-02 376/week @ 2024-12-09

3,376 downloads per month

Apache-2.0

37KB
287 lines

Chunk

Crates.io Version Documentation Build Status License

A Rust implementation of a persistent data structure that provides O(1) append and concatenation operations through structural sharing.

Features

  • O(1) Append/Prepend Operations: Add elements to your chunk in constant time
  • O(1) Concatenation: Combine two chunks efficiently
  • Immutable/Persistent: All operations create new versions while preserving the original
  • Memory Efficient: Uses structural sharing via reference counting
  • Safe Rust: Implemented using 100% safe Rust

Theoretical Background

This implementation is inspired by the concepts presented in Hinze and Paterson's work on Finger Trees[^1], though simplified for our specific use case. While our implementation differs in structure, it shares similar performance goals and theoretical foundations.

Relationship to Finger Trees

Finger Trees are a functional data structure that supports:

  • Access to both ends in amortized constant time
  • Concatenation in logarithmic time
  • Persistence through structural sharing

Our Chunk implementation achieves similar goals through a simplified approach:

  • We use Append nodes for constant-time additions
  • The Concat variant enables efficient concatenation
  • Rc (Reference Counting) provides persistence and structural sharing

Like Finger Trees, our structure can be viewed as an extension of Okasaki's implicit deques[^2], but optimized for our specific use cases. While Finger Trees offer a more general-purpose solution with additional capabilities, our implementation focuses on providing:

  • Simpler implementation
  • More straightforward mental model
  • Specialized performance characteristics for append/concat operations

Performance Trade-offs

While Finger Trees achieve logarithmic time for concatenation, our implementation optimizes for constant-time operations through lazy evaluation. This means:

  • Append, Prepend and concatenation are always O(1)
  • The cost is deferred to when we need to materialize the sequence (via as_vec())
  • Memory usage grows with the number of operations until materialization

This trade-off is particularly beneficial in scenarios where:

  • Write operations need to be performed extensively
  • Multiple transformations are chained
  • Not all elements need to be materialized
  • Structural sharing can be leveraged across operations

Installation

Add this to your Cargo.toml:

[dependencies]
tailcall-chunk = "0.1.0"

Quick Start

use chunk::Chunk;

// Create a new chunk and append some elements
let chunk1 = Chunk::default()
    .append(1)
    .append(2);

// Create another chunk
let chunk2 = Chunk::default()
    .append(3)
    .append(4);

// Concatenate chunks in O(1) time
let combined = chunk1.concat(chunk2);

// Convert to vector when needed
assert_eq!(combined.as_vec(), vec![1, 2, 3, 4]);

Detailed Usage

Working with Custom Types

use chunk::Chunk;

#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    age: u32,
}

let people = Chunk::default()
    .append(Person {
        name: "Alice".to_string(),
        age: 30
    })
    .append(Person {
        name: "Bob".to_string(),
        age: 25
    });

// Access elements
let people_vec = people.as_vec();
assert_eq!(people_vec[0].name, "Alice");
assert_eq!(people_vec[1].name, "Bob");

Memory Efficiency

The Chunk type uses structural sharing through reference counting (Rc), which means:

  • Appending or concatenating chunks doesn't copy the existing elements
  • Memory is automatically freed when no references remain
  • Multiple versions of the data structure can coexist efficiently
use chunk::Chunk;

let original = Chunk::default().append(1).append(2);
let version1 = original.clone().append(3);  // Efficient cloning
let version2 = original.clone().append(4);  // Both versions share data

Performance Characteristics

Operation Time Complexity Space Complexity
new() O(1) O(1)
append() O(1) O(1)
concat() O(1) O(1)
transform() O(1) O(1)
transform_flatten() O(1) O(1)
as_vec() O(n) O(n)
clone() O(1) O(1)

Benchmark Comparison

The following table compares the actual performance of Chunk vs Vector operations based on our benchmarks (lower is better):

Operation Chunk Performance Vector Performance Faster
Append 604.02 µs 560.58 µs Vec is ~1.08 times faster
Prepend 1.63 ms 21.95 ms Chunk is ~13.47 times faster
Concat 71.17 ns 494.45 µs Chunk is ~6,947 times faster
Clone 4.16 ns 1.10 µs Chunk is ~264 times faster

Note: These benchmarks represent specific test scenarios and actual performance may vary based on usage patterns. Chunk operations are optimized for bulk operations and scenarios where structural sharing provides benefits. View the complete benchmark code and results in our operations.rs benchmark file.

Implementation Details

The Chunk<A> type is implemented as an enum with four variants:

  • Empty: Represents an empty chunk
  • Single: Represents a chunk with a single element
  • Concat: Represents the concatenation of two chunks
  • TransformFlatten: Represents a lazy transformation and flattening of elements

The data structure achieves its performance characteristics through:

  • Structural sharing using Rc
  • Lazy evaluation of concatenation and transformations
  • Immutable operations that preserve previous versions

Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -am 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please make sure to:

  • Update documentation
  • Add tests for new features
  • Follow the existing code style
  • Update the README.md if needed

License

This project is licensed under either of

at your option.

References

[^1]: Ralf Hinze and Ross Paterson. "Finger Trees: A Simple General-purpose Data Structure", Journal of Functional Programming 16(2):197-217, 2006. [^2]: Chris Okasaki. "Purely Functional Data Structures", Cambridge University Press, 1998.

No runtime deps