2 unstable releases
0.3.0 | Oct 11, 2024 |
---|---|
0.0.0 | Oct 4, 2024 |
#880 in Algorithms
105KB
2K
SLoC
dach
dach
is a difference algorithm. It works on general data, both binary and text.
dach
is heavily inspired by rsync
and it's rolling hash concept.
See the documentation for details on how to use dach
.
lib.rs
:
A difference library similar to rsync
.
Performance
dach
is performant enough for large files and files, exceeding 10MB (when using rolling hashes).
Please compile with the release preset for 10X the performance.
Allocating data and keeping it in memory is very fast compared to hashing. In the future, Dach will support reading data bit by bit, greatly reducing the memory usage.
How-to & examples
Keep in mind this isn't guaranteed to give the exact same data. Please check the data with a secure hashing algorithm (e.g. SHA-3) to ensure consistency.
Sending the data is possible due to serde
providing serialization and deserialization.
This requires the cargo feature serde
to be enabled.
You serialize all the structs in this library to any format.
These examples should cover what rsync does.
Get a remote's data
To get someone else's data, we construct a Signature
and send it.
The remote calculates a Difference
using Signature::diff
.
The remote sends back the Difference
which we Difference::apply
.
Push my data to remote
Request the remote's Signature
.
They calculate it and respond.
We calculate a Signature::diff
and send it to them.
They Difference::apply
it. Their data should now be equal to mine.
Get the difference of a local file
Gets a small diff to send to others, almost like how git
works.
base_data
is considered prior knowledge. target_data
is the modified data.
The data segments can be any size. Performance should still be good.
let base_data = b"This is a document everyone has. It's about some new difference library.";
let target_data = b"This is a document only I have. It's about some new difference library.";
let mut signature = Signature::new(128);
signature.write(base_data);
let signature = signature.finish();
let diff = signature.diff(target_data);
// This is the small diff you could serialize with Serde and send.
let minified = diff.minify(8, base_data)
.expect("This won't panic, as the data hasn't changed from calling the other functions.");
Future improvements
- Rolling hash
-
Multi-threadedThere is no feasible way to implement this, as we look ahead and change which window we're looking at after each iteration. Now with rolling hash, the performance is great.Signature::diff
. - Support read/write
- Support to diff a reader
- Support to apply to a writer
- Fetch API for apply to get data on demand.
- This could slow things down dramatically.
- Implement Write for
HashBuilder
.
-
Use SHA(1|256?) to verify integrity of data. Bundled with theThe implementer should provide this.Signature
.
Dependencies
~350–570KB