5 releases
0.2.1 | Aug 22, 2024 |
---|---|
0.2.0 | Jul 13, 2024 |
0.1.2 | May 27, 2024 |
0.1.1 | Apr 29, 2024 |
0.1.0 | Apr 24, 2024 |
#87 in Embedded development
370KB
7.5K
SLoC
µWheel
µWheel is an Embeddable Aggregate Management System for Streams and Queries.
See more about its design here and try it out directly on the web.
Features
- Streaming window aggregation
- Built-in warehousing capabilities
- Wheel-based query optimizer + vectorized execution.
- Out-of-order support using
low watermarking
. - High-throughput stream ingestion.
- User-defined aggregation.
- Low space footprint.
- Incremental checkpointing support.
- Compatible with
#[no_std]
(requiresalloc
).
When should I use µWheel?
µWheel unifies the aggregate management for online streaming and offline analytical queries in a single system. µWheel is not a general purpose solution but a specialized system tailored for a pre-defined aggregation function.
µWheel is an execellent choice when:
- You know the aggregation function apriori.
- You need high-throughput ingestion of out-of-order streams.
- You need support for streaming window queries (e.g., Sliding/Tumbling).
- You need support for exploratory analysis of historical data.
- You need a lightweight and highly embeddable solution.
Example use cases:
- A mini stream processor (see example)
- A real-time OLAP index (e.g., Top-N) (see example)
- A compact and mergeable system for analytics at the edge (see example).
Pre-defined Aggregators
Function | Description | Types | SIMD |
---|---|---|---|
SUM | Sum of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✓ |
MIN | Minimum value of all inputs | u16, u32, u64, i32, i16, i64, f32, f64 | ✓ |
MAX | Maximum value of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✓ |
MINMAX | Minimum and Maximum value of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✗ |
AVG | Arithmetic mean of all inputs | u16, u32, u64, i16, i32, i64, f32, f64 | ✗ |
ALL | Pre-computed SUM, AVG, MIN, MAX, COUNT | f64 | ✗ |
TOP N | Top N of all inputs | Aggregator with aggregate data that implements Ord |
✗ |
See a user-defined aggregator example here.
Feature Flags
std
(enabled by default)- Enables features that rely on the standard library
sum
(enabled by default)- Enables sum aggregation
avg
(enabled by default)- Enables avg aggregation
min
(enabled by default)- Enables min aggregation
max
(enabled by default)- Enables max aggregation
min_max
(enabled by default)- Enables min-max aggregation
all
(enabled by default)- Enables all aggregation
top_n
- Enables Top-N aggregation
simd
(requiresnightly
)- Enables support to speed up aggregation functions with SIMD operations
sync
(implicitly enablesstd
)- Enables a sync version of
ReaderWheel
that can be shared and queried across threads
- Enables a sync version of
profiler
(implicitly enablesstd
)- Enables recording of latencies for various operations
serde
- Enables serde support
timer
- Enables scheduling user-defined functions
Usage
For std
support and compilation of built-in aggregators:
uwheel = "0.2.1"
For no_std
support and minimal compile time:
uwheel = { version = "0.2.1", default-features = false }
Examples
The following code is from the hello world example.
use uwheel::{aggregator::sum::U32SumAggregator, WheelRange, NumericalDuration, Entry, RwWheel};
// Initial start watermark 2023-11-09 00:00:00 (represented as milliseconds)
let mut watermark = 1699488000000;
// Create a Reader-Writer Wheel with U32 Sum Aggregation using the default configuration
let mut wheel: RwWheel<U32SumAggregator> = RwWheel::new(watermark);
// Install a Sliding Window Aggregation Query (results are produced when we advance the wheel).
wheel.window(Window::sliding(30.minutes(), 10.minutes()));
// Simulate ingestion and fill the wheel with 1 hour of aggregates (3600 seconds).
for _ in 0..3600 {
// Insert entry with data 1 to the wheel
wheel.insert(Entry::new(1u32, watermark));
// bump the watermark by 1 second and also advanced the wheel
watermark += 1000;
// Print the result if any window is triggered
for window in wheel.advance_to(watermark) {
println!("Window fired {:#?}", window);
}
}
// Explore historical data - The low watermark is now 2023-11-09 01:00:00
// query the wheel using different intervals
assert_eq!(wheel.read().interval(15.seconds()), Some(15));
assert_eq!(wheel.read().interval(1.minutes()), Some(60));
// combine range of 2023-11-09 00:00:00 and 2023-11-09 01:00:00
let range = WheelRange::new_unchecked(1699488000000, 1699491600000);
assert_eq!(wheel.read().combine_range(range), Some(3600));
// The following runs the the same combine range query as above.
assert_eq!(wheel.read().interval(1.hours()), Some(3600));
See more examples here.
Acknowledgements
- µWheel borrows scripts from the egui crate.
- µWheel uses a modified Duration from the time crate.
- µWheel soft forks a Hierarchical Timing Wheel made by @Bathtor.
Contributing
See Contributing.
Community
If you find µWheel interesting and want to learn more, then join the Discord community!
Publications
- Max Meldrum, Paris Carbone (2024). µWheel: Aggregate Management for Streams and Queries (Best Paper Award). In DEBS '24. [PDF].
Blog Posts
- Introducing datafusion-uwheel, A Native DataFusion Optimizer for Time-based Analytics - August 2024
- Best Paper Award + 0.2.0 Release - July 2024
- Speeding up Temporal Aggregation in DataFusion by 60-60000x using µWheel - May 2024
Citing µWheel
@inproceedings{meldrum2024uwheel,
author = {Meldrum, Max and Carbone, Paris},
title = {μWheel: Aggregate Management for Streams and Queries},
booktitle = {Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems},
year = {2024},
pages = {54--65},
doi = {10.1145/3629104.3666031}
}
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Dependencies
~0.7–9MB
~74K SLoC