# CausalTensor - A Flexible Tensor for Dynamic Data
The CausalTensor provides a flexible, multi-dimensional array (tensor) backed by a single, contiguous Vec<T>. It is
designed for efficient numerical computations, featuring a stride-based memory layout that supports broadcasting for
element-wise binary operations. It offers a comprehensive API for shape manipulation, element access, and common
reduction operations like sum and mean, making it a versatile tool for causal modeling and other data-intensive
tasks.
📚 Docs
## Examples

To run the examples, use `cargo run --example <example_name>`.

- Applicative Causal Tensor: `cargo run --example applicative_causal_tensor`
- Basic Causal Tensor: `cargo run --example causal_tensor`
- Effect System Causal Tensor: `cargo run --example effect_system_causal_tensor`
- Einstein Summation Causal Tensor: `cargo run --example ein_sum_causal_tensor`
- Functor Causal Tensor: `cargo run --example functor_causal_tensor`
## Usage

CausalTensor is straightforward to use. You create it from a flat vector of data and a vector defining its shape.

```rust
use deep_causality_tensor::CausalTensor;

fn main() {
    // 1. Create a 2x3 tensor.
    let data = vec![1, 2, 3, 4, 5, 6];
    let shape = vec![2, 3];
    let tensor = CausalTensor::new(data, shape).unwrap();
    println!("Original Tensor: {}", tensor);

    // 2. Get an element
    let element = tensor.get(&[1, 2]).unwrap();
    assert_eq!(*element, 6);
    println!("Element at [1, 2]: {}", element);

    // 3. Reshape the tensor
    let reshaped = tensor.reshape(&[3, 2]).unwrap();
    assert_eq!(reshaped.shape(), &[3, 2]);
    println!("Reshaped to 3x2: {}", reshaped);

    // 4. Perform tensor-scalar addition
    let added = &tensor + 10;
    assert_eq!(added.as_slice(), &[11, 12, 13, 14, 15, 16]);
    println!("Tensor + 10: {}", added);

    // 5. Perform tensor-tensor addition with broadcasting
    let t1 = CausalTensor::new(vec![1, 2, 3, 4, 5, 6], vec![2, 3]).unwrap();
    // A [1, 3] tensor...
    let t2 = CausalTensor::new(vec![10, 20, 30], vec![1, 3]).unwrap();
    // ...is broadcast across the rows of the [2, 3] tensor.
    let result = (&t1 + &t2).unwrap();
    assert_eq!(result.as_slice(), &[11, 22, 33, 14, 25, 36]);
    println!("Tensor-Tensor Add with Broadcast: {}", result);

    // 6. Sum all elements in the tensor (full reduction)
    let sum = tensor.sum_axes(&[]).unwrap();
    assert_eq!(sum.as_slice(), &[21]);
    println!("Sum of all elements: {}", sum);
}
```
## Einstein Sum (ein_sum)

The ein_sum function provides a powerful and flexible way to perform various tensor operations, including matrix multiplication, dot products, and more, by constructing an Abstract Syntax Tree (AST) of operations.

```rust
use deep_causality_tensor::CausalTensor;
use deep_causality_tensor::types::causal_tensor::op_tensor_ein_sum::EinSumOp;

fn main() {
    // Example: Matrix Multiplication using ein_sum
    let lhs_data = vec![1.0, 2.0, 3.0, 4.0];
    let lhs_tensor = CausalTensor::new(lhs_data, vec![2, 2]).unwrap();
    let rhs_data = vec![5.0, 6.0, 7.0, 8.0];
    let rhs_tensor = CausalTensor::new(rhs_data, vec![2, 2]).unwrap();

    // Construct the AST for matrix multiplication
    let mat_mul_ast = EinSumOp::mat_mul(lhs_tensor, rhs_tensor);

    // Execute the Einstein summation
    let result = CausalTensor::ein_sum(&mat_mul_ast).unwrap();
    println!("Result of Matrix Multiplication:\n{:?}", result);
    // Expected: CausalTensor { data: [19.0, 22.0, 43.0, 50.0], shape: [2, 2], strides: [2, 1] }

    // Example: Dot Product
    let vec1_data = vec![1.0, 2.0, 3.0];
    let vec1_shape = vec![3];
    let vec1_tensor = CausalTensor::new(vec1_data, vec1_shape).unwrap();
    let vec2_data = vec![4.0, 5.0, 6.0];
    let vec2_shape = vec![3];
    let vec2_tensor = CausalTensor::new(vec2_data, vec2_shape).unwrap();

    // Execute the Einstein summation for dot product
    let result_dot_prod = CausalTensor::ein_sum(&EinSumOp::dot_prod(vec1_tensor, vec2_tensor)).unwrap();
    println!("Result of Dot Product:\n{:?}", result_dot_prod);
}
```
## Functional Composition

CausalTensor implements a higher-kinded type (HKT) interface via the deep_causality_haft crate using a witness type. When imported, the CausalTensorWitness type enables monadic composition and type-level abstraction. For example, one can write generic functions that uniformly process tensors and other container types:

```rust
use deep_causality_haft::{Functor, HKT, OptionWitness, ResultWitness};
use deep_causality_tensor::{CausalTensor, CausalTensorWitness};

fn triple_value<F>(m_a: F::Type<i32>) -> F::Type<i32>
where
    F: Functor<F> + HKT,
{
    F::fmap(m_a, |x| x * 3)
}

fn main() {
    println!("--- Functor Example: Tripling values in different containers ---");

    // Using triple_value with Option
    let opt = Some(5);
    println!("Original Option: {:?}", opt);
    let proc_opt = triple_value::<OptionWitness>(opt);
    println!("Tripled Option: {:?}", proc_opt);
    assert_eq!(proc_opt, Some(15));

    // Using triple_value with Result
    let res = Ok(5);
    println!("Original Result: {:?}", res);
    let proc_res = triple_value::<ResultWitness<i32>>(res);
    println!("Tripled Result: {:?}", proc_res);
    assert_eq!(proc_res, Ok(15));

    // Using triple_value with CausalTensor
    let tensor = CausalTensor::new(vec![1, 2, 3], vec![3]).unwrap();
    println!("Original CausalTensor: {:?}", tensor);
    let proc_tensor = triple_value::<CausalTensorWitness>(tensor);
    println!("Tripled CausalTensor: {:?}", proc_tensor);
    assert_eq!(proc_tensor.data(), &[3, 6, 9]);
}
```
Functional composition of HKT tensors works best via an effect system that captures side effects and provides detailed errors and logs for each processing step. In the example below, tensors are composed, and the container MyMonadEffect3 captures the final tensor value, an optional error, and detailed logs from each processing step.
```rust
// ... Truncated

// 4. Chain Operations using Monad::bind
println!("Processing steps...");
let final_effect = MyMonadEffect3::bind(initial_effect, step1);
let final_effect = MyMonadEffect3::bind(final_effect, step2);
let final_effect = MyMonadEffect3::bind(final_effect, step3);

println!();
println!("--- Final Result ---");
println!("Final CausalTensor: {:?}", final_effect.value);
println!("Error: {:?}", final_effect.error);
println!("Logs: {:?}", final_effect.logs);
```
For complex data processing pipelines, this information is invaluable for debugging and optimization. If more detailed information is required, e.g., processing time for each step, an Effect Monad of arity 4 or 5 can be used to capture additional fields at each step.
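The three-field effect pattern can be sketched in a self-contained form. Everything below (the `Effect3` struct, `pure`, `bind`) is a hypothetical stand-in written for illustration only; it is not the crate's actual MyMonadEffect3 API:

```rust
// Hypothetical three-field effect container: a value, an optional error,
// and logs accumulated across steps. Illustrative sketch only, not the
// deep_causality_haft implementation.
struct Effect3<T> {
    value: T,
    error: Option<String>,
    logs: Vec<String>,
}

impl<T> Effect3<T> {
    // Lift a plain value into the effect with no error and no logs.
    fn pure(value: T) -> Self {
        Effect3 { value, error: None, logs: Vec::new() }
    }

    // Chain a processing step: run it, concatenate logs, keep the first error.
    fn bind<U>(self, step: impl FnOnce(T) -> Effect3<U>) -> Effect3<U> {
        let mut next = step(self.value);
        let mut logs = self.logs;
        logs.append(&mut next.logs);
        Effect3 { value: next.value, error: self.error.or(next.error), logs }
    }
}

fn main() {
    let double = |x: i32| Effect3 {
        value: x * 2,
        error: None,
        logs: vec![format!("doubled to {}", x * 2)],
    };
    let result = Effect3::pure(3).bind(double).bind(double);
    assert_eq!(result.value, 12);
    assert_eq!(result.logs.len(), 2);
    println!("value = {}, logs = {:?}", result.value, result.logs);
}
```

The point of the pattern is that each `bind` threads the value forward while the logs and error travel alongside it, so the final container carries the whole pipeline's history.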
## GPU Acceleration (Apple Silicon)

CausalTensor supports optional GPU acceleration via MLX on Apple Silicon (M1/M2/M3). Enable it with the `mlx` feature flag.

### Prerequisites

MLX requires Xcode and the Metal Toolchain. Run the following setup steps:

```shell
# 1. Run Xcode first-launch setup (installs command-line tools)
xcodebuild -runFirstLaunch

# 2. Download the Metal Toolchain for GPU shader compilation
xcodebuild -downloadComponent MetalToolchain

# 3. Build with the MLX feature enabled
RUSTFLAGS='-C target-cpu=native' cargo build --release -p deep_causality_tensor --features mlx

# 4. Run MLX tests (must use a single thread due to Metal command buffer serialization)
cargo test -p deep_causality_tensor --features mlx mlx -- --test-threads=1
```
Note: MLX tests must run with `--test-threads=1` due to Metal command buffer serialization requirements. Parallel test execution causes Metal assertion failures.
### Enabling MLX

```toml
# Cargo.toml
[dependencies]
deep_causality_tensor = { version = "0.2", features = ["mlx"] }
```
### Precision vs Bulk Compute: f32 vs f64
Apple's Metal GPU does not support f64. All GPU operations run in f32. This creates a natural separation:
| Workload Type | Precision | Use |
|---|---|---|
| Precision workloads | f64 | Accumulation over large N, small differences of large numbers, clock drift (10⁻¹⁵ scale) |
| Bulk compute | f32 | Matrix multiplication, eigendecomposition, neural network inference |
Rule of thumb: If your smallest meaningful quantity ε and largest M satisfy log₁₀(M/ε) > 7, use f64.
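The rule of thumb can be encoded directly. The following is a hedged sketch; `needs_f64` is a hypothetical helper written for illustration, not part of the crate:

```rust
// Hypothetical helper applying the rule of thumb from the text:
// if the dynamic range spans more than ~7 decimal digits, f32's
// mantissa cannot represent both scales and f64 is required.
fn needs_f64(largest: f64, smallest_meaningful: f64) -> bool {
    (largest / smallest_meaningful).log10() > 7.0
}

fn main() {
    // Seconds-scale clock values with femtosecond (1e-15) resolution:
    // log10(1.0 / 1e-15) = 15 > 7, so f64 is required.
    assert!(needs_f64(1.0, 1e-15));

    // Neural-net weights around 1.0 where only 1e-4 differences matter:
    // log10(1.0 / 1e-4) = 4 <= 7, so f32 (and the GPU) is fine.
    assert!(!needs_f64(1.0, 1e-4));

    println!("ok");
}
```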
### MlxCausalTensor

For GPU-accelerated operations, use MlxCausalTensor, which stores data directly in MLX's unified memory:

```rust
use deep_causality_tensor::{CausalTensor, MlxCausalTensor};

// Scenario 1: Direct GPU construction (no conversion overhead)
let mlx_a = MlxCausalTensor::new_f32(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2])?;
let mlx_b = MlxCausalTensor::new_f32(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2])?;
let result = mlx_a.matmul(&mlx_b)?;
let output = result.to_causal_tensor()?; // Back to CausalTensor<f32>

// Scenario 2: Bridge from an f64 physics simulation (with downcast)
let physics_data: CausalTensor<f64> = /* precision-critical simulation */;
let mlx_tensor = MlxCausalTensor::from_causal_tensor_f64(&physics_data)?;
// GPU-accelerated matmul runs in f32
let accelerated = mlx_tensor.matmul(&other)?;
```
### Native Operations & EinSum

The MLX backend provides fully native GPU execution for:

- `ein_sum`: native GPU execution via recursive AST interpretation, with no CPU roundtrips.
- Linear algebra: `matmul`, `svd`, `qr`, `cholesky_decomposition`, `solve_least_squares_cholsky`, `inverse`.
- Tensor ops: `slice`, `permute`, `reshape`, `broadcast`, etc.
### Recommended Pattern for Physics

Separate precision-critical storage from bulk compute:

```rust
// 1. Store raw data in f64 for precision
let clock_drifts: CausalTensor<f64> = load_satellite_data(); // femtosecond precision

// 2. Downcast for GPU-accelerated matrix ops
let covariance = MlxCausalTensor::from_causal_tensor_f64(&clock_drifts)?;
let eigenvalues = covariance.eigendecomposition()?;

// 3. Upcast results if precision is needed for the next stage
let eigenvalues_f64: Vec<f64> = eigenvalues.to_causal_tensor()?
    .data().iter().map(|&x| x as f64).collect();
```
Note: The copy overhead (Rust → MLX → Rust) means MLX is most beneficial for large tensors (N > 10,000) or complex O(N³) operations where compute time dominates data transfer time. The native `ein_sum` implementation ensures that complex contraction chains remain entirely on the GPU.
## Performance

### CPU Benchmarks

The following benchmarks were run on a small 100x100 CausalTensor (10,000 f64 elements).

| Operation | Time | Notes |
|---|---|---|
| `tensor_get` | ~2.31 ns | Accessing a single element. |
| `tensor_reshape` | ~2.46 µs | Metadata only, but clones data in the test. |
| `tensor_scalar_add` | ~4.95 µs | Element-wise addition with a scalar. |
| `tensor_tensor_add_broadcast` | ~46.67 µs | Element-wise addition with broadcasting. |
| `tensor_sum_full_reduction` | ~10.56 µs | Summing all 10,000 elements of the tensor. |
### CPU / GPU (MLX) Benchmarks
| Operation | Size | CPU Time | GPU Time | Speedup |
|---|---|---|---|---|
| MatMul | 128×128 | 1.50 ms | 0.17 ms | 8.8x |
| MatMul | 512×512 | 134.6 ms | 0.22 ms | 612x |
| MatMul | 1024×1024 | 1,087 ms | 0.41 ms | 2,651x |
Hardware:
- Architecture: ARM64 (Apple Silicon, M3 Max)
- OS: macOS 26.2 | Kernel Version 25.2.0
For detailed benchmarks and a comparison to MLX / GPU, see the BENCHMARK file.
## Technical Implementation

### Strides
The core of CausalTensor is its stride-based memory layout. For a given shape (e.g., [d1, d2, d3]), the strides
represent the number of elements to skip in the flat data vector to move one step along a particular dimension. For a
row-major layout, the strides would be [d2*d3, d3, 1]. This allows the tensor to calculate the flat index for any
multi-dimensional index [i, j, k] with a simple dot product: i*strides[0] + j*strides[1] + k*strides[2].
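The stride arithmetic above can be sketched as a pair of standalone functions. This is an illustrative sketch under the row-major assumption, not the crate's implementation:

```rust
// Compute row-major strides for a shape: each stride is the product of all
// dimensions to its right, so the last dimension always has stride 1.
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

// Map a multi-dimensional index to a flat offset via a dot product with the strides.
fn flat_index(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

fn main() {
    let shape = [2, 3, 4];
    let strides = row_major_strides(&shape);
    // For shape [d1, d2, d3], strides are [d2*d3, d3, 1].
    assert_eq!(strides, vec![12, 4, 1]);
    // Element [1, 2, 3] lives at 1*12 + 2*4 + 3*1 = 23 in the flat Vec.
    assert_eq!(flat_index(&[1, 2, 3], &strides), 23);
    println!("strides = {:?}", strides);
}
```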
### Broadcasting
Binary operations support broadcasting, which follows rules similar to those in libraries like NumPy. When operating on
two tensors, CausalTensor compares their shapes dimension by dimension (from right to left). Two dimensions are
compatible if:
- They are equal.
- One of them is 1.
The smaller tensor's data is conceptually "stretched" or repeated along the dimensions where its size is 1 to match the
larger tensor's shape, without actually copying the data. The optimized binary_op implementation achieves this by
manipulating how it calculates the flat index for each tensor inside the computation loop.
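The compatibility rules above can be sketched as a standalone shape check. This is an illustrative assumption-level sketch, not the crate's actual `binary_op` code:

```rust
// NumPy-style broadcast check: compare shapes right-to-left; two dimensions
// are compatible if they are equal or one of them is 1. Returns the broadcast
// result shape, or None if the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let ndim = a.len().max(b.len());
    let mut out = vec![0; ndim];
    for i in 0..ndim {
        // Missing leading dimensions are treated as size 1.
        let da = if i < ndim - a.len() { 1 } else { a[i - (ndim - a.len())] };
        let db = if i < ndim - b.len() { 1 } else { b[i - (ndim - b.len())] };
        out[i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}

fn main() {
    // [2, 3] + [1, 3] broadcasts to [2, 3], as in the Usage example above.
    assert_eq!(broadcast_shape(&[2, 3], &[1, 3]), Some(vec![2, 3]));
    // A trailing dimension also aligns: [2, 3] + [3] -> [2, 3].
    assert_eq!(broadcast_shape(&[2, 3], &[3]), Some(vec![2, 3]));
    // [2, 3] and [4] are incompatible (3 != 4 and neither is 1).
    assert_eq!(broadcast_shape(&[2, 3], &[4]), None);
    println!("ok");
}
```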
## API Overview

The CausalTensor API is designed to be comprehensive and intuitive:

- Constructor: `CausalTensor::new(data: Vec<T>, shape: Vec<usize>)`
- Inspectors: `shape()`, `num_dim()`, `len()`, `is_empty()`, `as_slice()`
- Indexing: `get()`, `get_mut()`
- Shape manipulation: `reshape()`, `ravel()`
- Reduction operations: `sum_axes()`, `mean_axes()`, `arg_sort()`
- Arithmetic: overloaded `+`, `-`, `*`, `/` operators for both tensor-scalar and tensor-tensor operations.
## 👨💻👩💻 Contribution

Contributions are welcome, especially those related to documentation, example code, and fixes. If unsure where to start, just open an issue and ask.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you shall be licensed under the MIT licence, without any additional terms or conditions.
## 📜 Licence
This project is licensed under the MIT license.
## 👮️ Security
For details about security, please read the security policy.
## 💻 Author
- Marvin Hansen.
- Github GPG key ID: 369D5A0B210D39BC
- GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC