3 releases

| Version | Date |
|---|---|
| 0.1.2 | Feb 6, 2025 |
| 0.1.1 | Feb 2, 2025 |
| 0.1.0 | Feb 2, 2025 |
Used in chunkfs
55KB, 1.5K SLoC
rust-chunking
Implementations of content-based chunking algorithms:
- RabinCDC (taken from zbox)
- Leap-based CDC
  - Matrix generation code can be found in ef_matrix.rs
- UltraCDC
- SuperCDC
- SeqCDC
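Most content-defined chunkers follow the same outline: scan the input, derive a value from the most recent bytes at each position, and declare a chunk boundary where that value satisfies a condition, with chunk lengths clamped between a minimum and a maximum. The sketch below illustrates that outline with a Gear-style rolling hash (shift left by one bit, add a per-byte random value). It is a generic example, not the code of any chunker listed above (RabinCDC uses a Rabin fingerprint, and SeqCDC does not use a rolling hash at all). The function names, the SplitMix64-generated gear table and the `min`/`avg`/`max` parameters are assumptions made for this example.

```rust
/// Builds a reproducible table of 256 pseudo-random u64 values.
/// Hypothetical: published Gear/FastCDC implementations ship a fixed table instead.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    // SplitMix64 with a fixed seed, so every run produces the same table.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Returns the end offset of every chunk (hypothetical helper).
/// Every chunk is at most `max` bytes; every chunk except the last is at least `min` bytes.
fn gear_cut_points(data: &[u8], min: usize, avg: usize, max: usize) -> Vec<usize> {
    assert!(avg >= 2 && avg.is_power_of_two(), "avg must be a power of two >= 2");
    assert!(min <= avg && avg <= max, "expected min <= avg <= max");
    let table = gear_table();
    // Test the top log2(avg) bits of the hash: once `min` bytes have been seen,
    // a boundary fires with probability of about 1/avg at each position.
    let mask: u64 = (avg as u64 - 1) << (64 - avg.trailing_zeros());

    let mut cuts = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let remaining = data.len() - start;
        if remaining <= min {
            // The tail is too short to split further.
            cuts.push(data.len());
            break;
        }
        let end_limit = start + remaining.min(max);
        let mut hash: u64 = 0;
        let mut cut = end_limit; // forced cut if no boundary is found before `max`
        for (offset, &byte) in data[start..end_limit].iter().enumerate() {
            // Gear update: older bytes are shifted out after 64 steps.
            hash = (hash << 1).wrapping_add(table[byte as usize]);
            if offset + 1 >= min && hash & mask == 0 {
                cut = start + offset + 1;
                break;
            }
        }
        cuts.push(cut);
        start = cut;
    }
    cuts
}
```

Because each boundary depends only on bytes near it, an insertion early in a file tends to move only nearby boundaries, which is what makes this family of algorithms useful for deduplication.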
A simple program for testing an algorithm is provided in filetest.rs.
Features
- Chunkers implement the `std::iter::Iterator` trait, yielding the chunks of the source data one at a time.
- Chunk sizes can be customized on creation; default size values are provided.
- Other parameters from the corresponding papers can also be set when creating a chunker.
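As an illustration of the iterator-based design, here is a minimal sketch of a chunker that implements `Iterator` and yields values with `pos` and `len` fields, like the ones printed in the usage example. The `Chunk` and `FixedChunker` types and the default size are hypothetical stand-ins, not this crate's types, and the boundary rule is plain fixed-size splitting rather than content-defined chunking, to keep the focus on the iterator protocol.

```rust
/// A chunk described by its offset into the source and its length
/// (hypothetical type mirroring the `pos`/`len` fields in the usage example).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Chunk {
    pub pos: usize,
    pub len: usize,
}

/// Hypothetical fixed-size chunker over a borrowed byte slice.
pub struct FixedChunker<'a> {
    data: &'a [u8],
    chunk_size: usize,
    pos: usize, // offset where the next chunk starts
}

impl<'a> FixedChunker<'a> {
    /// Hypothetical default size, chosen only for illustration.
    pub const DEFAULT_CHUNK_SIZE: usize = 8192;

    pub fn new(data: &'a [u8], chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk size must be positive");
        FixedChunker { data, chunk_size, pos: 0 }
    }

    pub fn with_default_size(data: &'a [u8]) -> Self {
        Self::new(data, Self::DEFAULT_CHUNK_SIZE)
    }
}

impl Iterator for FixedChunker<'_> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        if self.pos >= self.data.len() {
            return None; // all input consumed
        }
        // The last chunk may be shorter than `chunk_size`.
        let len = self.chunk_size.min(self.data.len() - self.pos);
        let chunk = Chunk { pos: self.pos, len };
        self.pos += len;
        Some(chunk)
    }
}
```

Because such a chunker is an ordinary iterator, standard adapters like `map`, `filter`, `count` and `collect` work on it directly.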
Usage
To use the algorithms in your own code, access them through the corresponding modules, for example:
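```rust
// Requires `SizeParams` and the `ultra` and `leap_based` chunker modules
// from this crate to be in scope.
fn main() {
    // 1 MiB of sample input.
    let data = vec![1u8; 1024 * 1024];

    // UltraCDC with custom chunk sizes.
    let sizes = SizeParams::new(4096, 8192, 16384);
    let chunker = ultra::Chunker::new(&data, sizes);
    for chunk in chunker {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }

    // Leap-based CDC with its default size parameters.
    let default_leap = leap_based::Chunker::new(&data, SizeParams::leap_default());
    for chunk in default_leap {
        println!("start: {}, length: {}", chunk.pos, chunk.len);
    }
}
```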
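When trying out a chunker, a useful sanity check is that the yielded chunks cover the input exactly: each chunk starts where the previous one ended, and the lengths add up to the input size. The helper below is a hypothetical sketch, not part of the crate; it works on plain `(pos, len)` pairs, so it could be fed with something like `chunker.map(|c| (c.pos, c.len))`, assuming the `pos` and `len` fields are `usize`.

```rust
/// Basic length statistics for a sequence of chunks (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ChunkStats {
    pub count: usize,
    pub min_len: usize,
    pub max_len: usize,
    pub avg_len: f64,
}

/// Checks that `(pos, len)` pairs tile `0..total_len` with no gaps, overlaps
/// or empty chunks, and returns length statistics (hypothetical helper).
pub fn check_chunks<I>(chunks: I, total_len: usize) -> Result<ChunkStats, String>
where
    I: IntoIterator<Item = (usize, usize)>,
{
    let mut expected_pos = 0;
    let mut count = 0;
    let mut min_len = usize::MAX;
    let mut max_len = 0;
    for (pos, len) in chunks {
        if pos != expected_pos {
            return Err(format!("chunk {count} starts at {pos}, expected {expected_pos}"));
        }
        if len == 0 {
            return Err(format!("chunk {count} at {pos} is empty"));
        }
        expected_pos += len;
        count += 1;
        min_len = min_len.min(len);
        max_len = max_len.max(len);
    }
    if expected_pos != total_len {
        return Err(format!("chunks cover {expected_pos} bytes, input has {total_len}"));
    }
    if count == 0 {
        // Empty input: no chunks to summarize.
        return Ok(ChunkStats { count: 0, min_len: 0, max_len: 0, avg_len: 0.0 });
    }
    Ok(ChunkStats {
        count,
        min_len,
        max_len,
        avg_len: total_len as f64 / count as f64,
    })
}
```

Comparing `avg_len` against the configured average size is a quick way to see how a given dataset and parameter choice behave.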
Dependencies: ~2.5MB, ~35K SLoC