#chunk #append #operations #concatenation #structure #sharing #structural

tailcall-chunk

A Rust implementation of a persistent data structure for efficient append and concatenation operations

7 releases

new 0.2.5 Nov 16, 2024
0.2.4 Nov 16, 2024
0.1.0 Nov 16, 2024

#307 in Data structures

Download history 372/week @ 2024-11-10

388 downloads per month

Apache-2.0

31KB
227 lines

Chunk

Crates.io Version Documentation Build Status License

A Rust implementation of a persistent data structure that provides O(1) append and concatenation operations through structural sharing.

Features

  • O(1) Append Operations: Add elements to your chunk in constant time
  • O(1) Concatenation: Combine two chunks efficiently
  • Immutable/Persistent: All operations create new versions while preserving the original
  • Memory Efficient: Uses structural sharing via reference counting
  • Safe Rust: Implemented using 100% safe Rust

Theoretical Background

This implementation is inspired by the concepts presented in Hinze and Paterson's work on Finger Trees[^1], though simplified for our specific use case. While our implementation differs in structure, it shares similar performance goals and theoretical foundations.

Relationship to Finger Trees

Finger Trees are a functional data structure that supports:

  • Access to both ends in amortized constant time
  • Concatenation in logarithmic time
  • Persistence through structural sharing

Our Chunk implementation achieves similar goals through a simplified approach:

  • We use Append nodes for constant-time additions
  • The Concat variant enables efficient concatenation
  • Rc (Reference Counting) provides persistence and structural sharing

Like Finger Trees, our structure can be viewed as an extension of Okasaki's implicit deques[^2], but optimized for our specific use cases. While Finger Trees offer a more general-purpose solution with additional capabilities, our implementation focuses on providing:

  • Simpler implementation
  • More straightforward mental model
  • Specialized performance characteristics for append/concat operations

Performance Trade-offs

While Finger Trees achieve logarithmic time for concatenation, our implementation optimizes for constant-time operations through lazy evaluation. This means:

  • Append and concatenation are always O(1)
  • The cost is deferred to when we need to materialize the sequence (via as_vec())
  • Memory usage grows with the number of operations until materialization

This trade-off is particularly beneficial in scenarios where:

  • Multiple transformations are chained
  • Not all elements need to be materialized
  • Structural sharing can be leveraged across operations

Installation

Add this to your Cargo.toml:

[dependencies]
tailcall-chunk = "0.1.0"

Quick Start

use chunk::Chunk;

// Create a new chunk and append some elements
let chunk1 = Chunk::default()
    .append(1)
    .append(2);

// Create another chunk
let chunk2 = Chunk::default()
    .append(3)
    .append(4);

// Concatenate chunks in O(1) time
let combined = chunk1.concat(chunk2);

// Convert to vector when needed
assert_eq!(combined.as_vec(), vec![1, 2, 3, 4]);

Detailed Usage

Working with Custom Types

use chunk::Chunk;

#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    age: u32,
}

let people = Chunk::default()
    .append(Person {
        name: "Alice".to_string(),
        age: 30
    })
    .append(Person {
        name: "Bob".to_string(),
        age: 25
    });

// Access elements
let people_vec = people.as_vec();
assert_eq!(people_vec[0].name, "Alice");
assert_eq!(people_vec[1].name, "Bob");

Memory Efficiency

The Chunk type uses structural sharing through reference counting (Rc), which means:

  • Appending or concatenating chunks doesn't copy the existing elements
  • Memory is automatically freed when no references remain
  • Multiple versions of the data structure can coexist efficiently
use chunk::Chunk;

let original = Chunk::default().append(1).append(2);
let version1 = original.clone().append(3);  // Efficient cloning
let version2 = original.clone().append(4);  // Both versions share data

Performance Characteristics

Operation Time Complexity Space Complexity
new() O(1) O(1)
append() O(1) O(1)
concat() O(1) O(1)
transform() O(1) O(1)
transform_flatten() O(1) O(1)
as_vec() O(n) O(n)
clone() O(1) O(1)

Implementation Details

The Chunk<A> type is implemented as an enum with four variants:

  • Empty: Represents an empty chunk
  • Append: Represents a single element appended to another chunk
  • Concat: Represents the concatenation of two chunks
  • TransformFlatten: Represents a lazy transformation and flattening of elements

The data structure achieves its performance characteristics through:

  • Structural sharing using Rc
  • Lazy evaluation of concatenation and transformations
  • Immutable operations that preserve previous versions

Contributing

We welcome contributions! Here's how you can help:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -am 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Please make sure to:

  • Update documentation
  • Add tests for new features
  • Follow the existing code style
  • Update the README.md if needed

License

This project is licensed under either of

at your option.

References

[^1]: Ralf Hinze and Ross Paterson. "Finger Trees: A Simple General-purpose Data Structure", Journal of Functional Programming 16(2):197-217, 2006. [^2]: Chris Okasaki. "Purely Functional Data Structures", Cambridge University Press, 1998.

No runtime deps