2 releases
0.0.2 (new) | Apr 22, 2025
0.0.1 | Apr 19, 2025
#150 in Compression
102 downloads per month
59KB, 1K SLoC
remozipsy - Remote Zip Sync
This crate syncs the contents of a remote zip file to a local filesystem, WITHOUT fetching the whole zip.
It works by fetching the central directory first. You can then filter it against the local filesystem (e.g. if the CRC matches, skip the download), and only the data that actually changed is downloaded from the remote source.
The goal is to reduce the total download time of a sync for average consumers (e.g. < 1000 MBit/s network) with relatively fast local storage (e.g. HDD or SSD).
Crates like zip or rc-zip require the zip file to be randomly seekable (impl Read + Seek, or the async variants), which is often not ideal for remote sources. Additionally, it is not possible to hook into the state machine of their readers to skip reads when a file is already available locally.
remozipsy allows parallel reads at different offsets of the remote file to compensate for the high latency of network operations.
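Because the central directory records where each entry's local header starts, independent byte ranges can be derived up front and requested concurrently. The following is a minimal sketch, assuming entries are stored back to back so each one ends where the next local header begins (or where the central directory starts). `CdEntry` and `plan_ranges` are hypothetical names for illustration, not remozipsy's API:

```rust
/// Hypothetical, simplified central-directory entry: only what is needed
/// to plan ranged downloads (not remozipsy's real type).
#[derive(Debug, Clone)]
struct CdEntry {
    name: String,
    local_header_offset: u64,
}

/// Returns one HTTP `Range` header value per entry, in archive order.
/// Assumes entries are stored back to back: each entry ends right before
/// the next local header, and the last one ends where the central
/// directory (`cd_offset`) begins.
fn plan_ranges(mut entries: Vec<CdEntry>, cd_offset: u64) -> Vec<(String, String)> {
    entries.sort_by_key(|e| e.local_header_offset);
    let mut ranges = Vec::with_capacity(entries.len());
    for (i, entry) in entries.iter().enumerate() {
        let end_exclusive = entries
            .get(i + 1)
            .map(|next| next.local_header_offset)
            .unwrap_or(cd_offset);
        // HTTP byte ranges are inclusive on both ends.
        let value = format!("bytes={}-{}", entry.local_header_offset, end_exclusive - 1);
        ranges.push((entry.name.clone(), value));
    }
    ranges
}
```

The ranges are computed over all entries and can then be filtered down to only the entries that need downloading. Each remaining request is independent of the others, so they can be issued in parallel.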
This crate was designed with Airshipper, a game launcher, in mind. Different game versions are stored online, and this way updates can be applied without redownloading the whole 350 MB every time.
Usage
use remozipsy::{Config, Statemachine, reqwest::ReqwestRemoteZip, tokio::TokioLocalStorage};

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
pub async fn main() {
    // Remote side: the zip is fetched on demand over HTTP.
    let remote = ReqwestRemoteZip::new("remozipsy_demo".to_string(), "https://getsamplefiles.com/download/zip/sample-1.zip").unwrap();
    // Local side: files are synced into ./extract.
    let local = TokioLocalStorage::new("./extract", Vec::new());
    // `state` is reassigned on every step, so it must be mutable.
    let mut state = Statemachine::new(remote, local, Config::default());
    // Each call advances the state machine by one step and reports progress;
    // `None` ends the loop.
    while let Some((progress, next_state)) = state.progress().await {
        state = next_state;
        println!("Progress: {progress:?}");
        tokio::task::yield_now().await;
    }
}
The main entry point of this crate is Statemachine::progress(), which advances the internal state machine.
Both the remote zip and the local storage can be abstracted via the traits RemoteZip and FileSystem.
After each iteration a progress value is returned, which can be forwarded to the user.
Algorithm
First, we extract the central directory entries of the zip, which tell us which files exist within it.
In the meantime, the crate reads all files from the FileSystem and computes a CRC-32 hash for each of them.
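Comparing local files against the central directory requires computing the same CRC-32 that zip stores for each entry. The following is a minimal, dependency-free sketch, assuming the standard IEEE polynomial used by zip. `crc32` and `crc32_of_file` are hypothetical helpers, not part of remozipsy's API:

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320),
/// the checksum zip records for every entry.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            // Shift right and XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hashes a local file so it can be compared with the central directory.
fn crc32_of_file(path: &std::path::Path) -> std::io::Result<u32> {
    Ok(crc32(&std::fs::read(path)?))
}
```

A production implementation would stream the file in chunks and use a table-driven or hardware-accelerated CRC crate rather than reading the whole file into memory and processing it bit by bit.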
The central directory contains the file names as well as a CRC-32 hash for each entry. With that information we determine which files need to be downloaded or deleted. First, we download all files that need updating and start unzipping each file as soon as it has been downloaded.
After all new files have arrived, we delete the files that are no longer needed.
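The decision step can be sketched as a comparison of two maps from relative path to CRC-32: one built from the central directory, one from the local filesystem. `SyncPlan` and `plan_sync` are hypothetical names for illustration. The crate's actual filtering logic may differ, for example in how it treats directories:

```rust
use std::collections::HashMap;

/// Hypothetical sync plan: entries to fetch and local files to remove.
#[derive(Debug, PartialEq)]
struct SyncPlan {
    download: Vec<String>,
    delete: Vec<String>,
}

/// Compares remote CRCs (from the central directory) with local CRCs
/// (computed from the filesystem). Both maps are keyed by relative path.
fn plan_sync(remote: &HashMap<String, u32>, local: &HashMap<String, u32>) -> SyncPlan {
    // Missing locally, or present with a different checksum: download it.
    let mut download: Vec<String> = remote
        .iter()
        .filter(|(name, crc)| local.get(*name) != Some(*crc))
        .map(|(name, _)| name.clone())
        .collect();
    // Present locally but no longer in the archive: delete it.
    let mut delete: Vec<String> = local
        .keys()
        .filter(|name| !remote.contains_key(*name))
        .cloned()
        .collect();
    // HashMap iteration order is unspecified; sort for deterministic output.
    download.sort();
    delete.sort();
    SyncPlan { download, delete }
}
```

Deleting only after all downloads have finished, as described above, means a failed sync never leaves the local copy with files removed but their replacements missing.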
The crate uses tokio::spawn extensively internally so that it never blocks itself. It fetches as little data as necessary from the remote side, at the cost of reading a lot of data from the local filesystem.
Why does this work
Several things come into play for this to work:
- The zip format was built with floppy disks in mind: all the necessary metadata resides together at the end of the file (the central directory).
- HTTP provides the Range header to fetch only part of a resource, and luckily many web servers (and GitHub releases) support this feature.
- All files within an archive are compressed individually, so each one can be downloaded and decompressed on its own.
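Since the metadata sits at the end of the archive, a client can first request only the tail of the file, locate the End Of Central Directory (EOCD) record, and then request exactly the central directory. A suffix range such as bytes=-65557 covers the 22-byte EOCD record plus the maximum 65535-byte comment. The following is a minimal sketch of that step, assuming a non-Zip64 archive. `find_central_dir` and `central_dir_range` are hypothetical helpers, not remozipsy's API:

```rust
/// Central directory location as recorded in the EOCD record.
/// Zip64 archives are not handled in this sketch.
#[derive(Debug, PartialEq)]
struct CentralDirLocation {
    total_entries: u16,
    size: u32,
    offset: u32,
}

const EOCD_SIGNATURE: [u8; 4] = [0x50, 0x4B, 0x05, 0x06]; // "PK\x05\x06"
const EOCD_MIN_LEN: usize = 22;

/// `tail` must be the last bytes of the archive (ending at end of file).
fn find_central_dir(tail: &[u8]) -> Option<CentralDirLocation> {
    if tail.len() < EOCD_MIN_LEN {
        return None;
    }
    // Scan backwards: only the variable-length comment follows the record.
    (0..=tail.len() - EOCD_MIN_LEN).rev().find_map(|i| {
        let rec = &tail[i..];
        if !rec.starts_with(&EOCD_SIGNATURE) {
            return None;
        }
        let u16_at = |o: usize| u16::from_le_bytes([rec[o], rec[o + 1]]);
        let u32_at = |o: usize| u32::from_le_bytes([rec[o], rec[o + 1], rec[o + 2], rec[o + 3]]);
        // Reject false matches: record + comment must end exactly at end of file.
        if i + EOCD_MIN_LEN + u16_at(20) as usize != tail.len() {
            return None;
        }
        Some(CentralDirLocation { total_entries: u16_at(10), size: u32_at(12), offset: u32_at(16) })
    })
}

/// Inclusive HTTP Range value covering the whole central directory.
fn central_dir_range(loc: &CentralDirLocation) -> Option<String> {
    let start = u64::from(loc.offset);
    let len = u64::from(loc.size);
    // An empty archive has no central directory bytes to request.
    (len > 0).then(|| format!("bytes={}-{}", start, start + len - 1))
}
```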
See the documentation and the examples for more information on how to use this crate.
Dependencies: ~7–19MB, ~242K SLoC