2 releases

0.0.2 Apr 22, 2025
0.0.1 Apr 19, 2025

#150 in Compression


102 downloads per month

Apache-2.0 OR MIT

59KB
1K SLoC


remozipsy - Remote Zip Sync

This crate syncs the contents of a remote zip file to a local filesystem WITHOUT fetching the whole zip.

It works by first fetching the central directory, which lets you filter it against the local filesystem (e.g. skip the download if the CRC matches), and then downloading only the data that actually changed from the remote source. The goal is to reduce the total sync time for average consumers, who have a comparatively slow network (e.g. <1000 Mbit/s) but relatively fast local storage (e.g. HDD or SSD).
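To fetch the central directory first, a client has to locate the End Of Central Directory (EOCD) record at the very end of the archive, which stores the directory's size and offset. The sketch below assumes the tail of the archive has already been fetched (the EOCD sits within the last 22 bytes plus a comment of at most 65,535 bytes) and turns the EOCD into an HTTP Range header for the directory itself. All names are hypothetical and not remozipsy's API; ZIP64 archives and comments that happen to contain the signature bytes are ignored.

```rust
/// Signature of the zip End Of Central Directory (EOCD) record ("PK\x05\x06").
const EOCD_SIGNATURE: u32 = 0x0605_4b50;
/// Size of an EOCD record without its trailing comment.
const EOCD_MIN_LEN: usize = 22;

/// Where the central directory lives inside the remote archive (hypothetical type).
#[derive(Debug, PartialEq)]
struct CentralDirectory {
    entries: u16,
    size: u32,
    offset: u32,
}

fn le_u16(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn le_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Scan the archive tail backwards for the EOCD signature and read the
/// total entry count, central directory size and central directory offset.
fn find_central_directory(tail: &[u8]) -> Option<CentralDirectory> {
    if tail.len() < EOCD_MIN_LEN {
        return None;
    }
    (0..=tail.len() - EOCD_MIN_LEN)
        .rev()
        .find(|&i| le_u32(tail, i) == EOCD_SIGNATURE)
        .map(|i| CentralDirectory {
            entries: le_u16(tail, i + 10),
            size: le_u32(tail, i + 12),
            offset: le_u32(tail, i + 16),
        })
}

/// HTTP Range header value for the whole central directory
/// (the end position of an HTTP byte range is inclusive).
fn range_header(cd: &CentralDirectory) -> Option<String> {
    if cd.size == 0 {
        return None;
    }
    let start = cd.offset as u64;
    let end = start + cd.size as u64 - 1;
    Some(format!("bytes={start}-{end}"))
}

/// Build a fake archive tail: a few unrelated bytes followed by an EOCD record.
fn synthetic_tail(entries: u16, size: u32, offset: u32) -> Vec<u8> {
    let mut tail = vec![0u8; 8]; // end of the last file's data
    tail.extend_from_slice(&EOCD_SIGNATURE.to_le_bytes());
    tail.extend_from_slice(&[0, 0, 0, 0]); // this disk, disk holding the central directory
    tail.extend_from_slice(&entries.to_le_bytes()); // entries on this disk
    tail.extend_from_slice(&entries.to_le_bytes()); // total entries
    tail.extend_from_slice(&size.to_le_bytes()); // central directory size
    tail.extend_from_slice(&offset.to_le_bytes()); // central directory offset
    tail.extend_from_slice(&0u16.to_le_bytes()); // comment length
    tail
}

fn main() {
    let tail = synthetic_tail(3, 120, 1000);
    let cd = find_central_directory(&tail).expect("EOCD not found");
    println!("{cd:?} -> Range: {}", range_header(&cd).unwrap());
}
```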

Crates like zip or rc-zip require the zip file to be randomly seekable (impl Read + Seek, or async variants), which is often not ideal with remote sources. Additionally, it's not possible to hook into the state machine of the respective reader to skip reads when a file is already available locally. remozipsy can read from different offsets of the remote in parallel to compensate for the high latency of network operations.

This crate was designed with Airshipper, a game launcher, in mind. Different game versions are stored online, and this way updates can be applied without re-downloading the whole 350 MB every time.

Usage

See Example: sync_remote_zip

use remozipsy::{Config, Statemachine, reqwest::ReqwestRemoteZip, tokio::TokioLocalStorage};

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
pub async fn main() {
    let remote = ReqwestRemoteZip::new("remozipsy_demo".to_string(), "https://getsamplefiles.com/download/zip/sample-1.zip").unwrap();
    let local = TokioLocalStorage::new("./extract", Vec::new());
    // `mut` is required because the loop below replaces `state` with `next_state`.
    let mut state = Statemachine::new(remote, local, Config::default());

    while let Some((progress, next_state)) = state.progress().await {
        state = next_state;
        println!("Progress: {progress:?}");
        tokio::task::yield_now().await;
    }
}

The main entry point of this crate is Statemachine::progress(), which advances the internal state machine. Both the remote zip and the local storage can be abstracted via the traits RemoteZip and FileSystem. After each iteration a progress value is returned, which can be shown to the user.
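The `while let` loop in the usage example works because progress() appears to hand back the next state together with a progress report. A minimal, synchronous sketch of that consume-and-return pattern follows; `Sync`, `Progress`, `run` and their variants are hypothetical stand-ins, not remozipsy's actual types, and the real progress() is async.

```rust
/// Hypothetical progress report sent to the user after each step.
#[derive(Debug, PartialEq)]
enum Progress {
    Scanning { files: usize },
    Downloading { done: usize, total: usize },
    Finished,
}

/// Hypothetical sync state machine.
enum Sync {
    Scan { to_download: usize },
    Download { done: usize, total: usize },
    Done,
}

impl Sync {
    /// Consume the current state; return the progress made and the next
    /// state, or `None` once there is nothing left to do.
    fn progress(self) -> Option<(Progress, Sync)> {
        match self {
            Sync::Scan { to_download } => Some((
                Progress::Scanning { files: to_download },
                Sync::Download { done: 0, total: to_download },
            )),
            Sync::Download { done, total } if done < total => Some((
                Progress::Downloading { done: done + 1, total },
                Sync::Download { done: done + 1, total },
            )),
            Sync::Download { .. } => Some((Progress::Finished, Sync::Done)),
            Sync::Done => None,
        }
    }
}

/// Drive the state machine to completion, collecting every progress report.
fn run(to_download: usize) -> Vec<Progress> {
    let mut reports = Vec::new();
    let mut state = Sync::Scan { to_download };
    while let Some((progress, next_state)) = state.progress() {
        state = next_state;
        reports.push(progress);
    }
    reports
}

fn main() {
    for progress in run(2) {
        println!("Progress: {progress:?}");
    }
}
```

Taking `self` by value means a finished or half-advanced state can never be reused by accident; the caller always holds exactly one valid state.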

Algorithm

First, we extract the central directory entries of the zip; they tell us which files exist within the archive, along with each file's name and CRC-32 checksum. In the meantime, the crate fetches all files from the FileSystem, reads them and computes their CRC-32.
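The CRC-32 stored in the central directory uses the standard IEEE polynomial (reflected form 0xEDB88320), so a local file can be compared by checksumming its bytes the same way. The bitwise version below is written out only to stay dependency-free; which CRC implementation remozipsy uses is not stated here, and a table-driven crate such as crc32fast is the usual choice in practice. `crc32` and `local_crc` are illustrative names.

```rust
use std::path::Path;

/// CRC-32 (IEEE, reflected polynomial 0xEDB88320) as stored in zip headers.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the lowest bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            // Shift right and xor in the polynomial when a 1 bit fell off.
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Read a local file and return its CRC-32.
fn local_crc(path: &Path) -> std::io::Result<u32> {
    Ok(crc32(&std::fs::read(path)?))
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("remozipsy_crc_demo.txt");
    std::fs::write(&path, b"123456789")?;
    println!("{:08x}", local_crc(&path)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The check value for the ASCII string `123456789` is 0xCBF43926, which makes a convenient self-test.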

With that information we determine which files need to be downloaded or deleted. First, we download all required files and start unzipping each one as soon as it has been downloaded.
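Deciding what to download and what to delete boils down to comparing two name-to-CRC maps. A minimal sketch, assuming both sides have already been reduced to relative paths and CRC-32 values; `Plan`, `plan` and `crc_map` are hypothetical helpers, not part of the crate.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical result of comparing the remote central directory with local files.
#[derive(Debug, PartialEq)]
struct Plan {
    download: BTreeSet<String>,
    delete: BTreeSet<String>,
}

/// `remote` and `local` map a relative path to its CRC-32.
fn plan(remote: &BTreeMap<String, u32>, local: &BTreeMap<String, u32>) -> Plan {
    // Download every remote file that is missing locally or whose CRC differs.
    let download = remote
        .iter()
        .filter(|(name, crc)| local.get(*name) != Some(*crc))
        .map(|(name, _)| name.clone())
        .collect();
    // Delete every local file that no longer exists in the archive.
    let delete = local
        .keys()
        .filter(|name| !remote.contains_key(*name))
        .cloned()
        .collect();
    Plan { download, delete }
}

fn crc_map(entries: &[(&str, u32)]) -> BTreeMap<String, u32> {
    entries.iter().map(|(n, c)| (n.to_string(), *c)).collect()
}

fn main() {
    let remote = crc_map(&[("a.txt", 1), ("b.txt", 2), ("c.txt", 3)]);
    let local = crc_map(&[("a.txt", 1), ("b.txt", 99), ("old.txt", 5)]);
    println!("{:?}", plan(&remote, &local));
}
```

Here `a.txt` is skipped because its CRC matches, `b.txt` is re-downloaded because it changed, `c.txt` is new, and `old.txt` is deleted.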

After all new files have arrived, we delete the files that are no longer needed.

The crate makes heavy use of tokio::spawn internally so that it does not block itself: it fetches as little data as necessary from the remote side while still reading a lot of data from the local filesystem.
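Running many small range requests concurrently is what hides network latency: their round trips overlap instead of adding up. remozipsy does this with tokio::spawn; the sketch below swaps in std::thread::scope to stay dependency-free, and `fetch_range` only simulates an HTTP range request against an in-memory buffer with a fixed 50 ms delay. Both function names are hypothetical.

```rust
use std::ops::Range;
use std::thread;
use std::time::{Duration, Instant};

/// Stand-in for one HTTP range request: sleep to mimic network latency,
/// then return the requested bytes of the in-memory "remote" archive.
fn fetch_range(remote: &[u8], range: Range<usize>) -> Vec<u8> {
    thread::sleep(Duration::from_millis(50));
    remote[range].to_vec()
}

/// Issue all range requests at once and return the results in request order.
fn fetch_all(remote: &[u8], ranges: &[Range<usize>]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = ranges
            .iter()
            .map(|r| s.spawn(move || fetch_range(remote, r.clone())))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let remote: Vec<u8> = (0u8..100).collect();
    let start = Instant::now();
    let parts = fetch_all(&remote, &[0..4, 50..52, 98..100]);
    // Three requests, but the elapsed time is close to one 50 ms round trip.
    println!("{parts:?} in {:?}", start.elapsed());
}
```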

Why does this work

Several things come into play to make this work.

  1. The zip format was built with floppy disks in mind: all the necessary metadata (the central directory) resides together at the end of the file.
  2. HTTP allows the Range header to fetch only part of a resource; luckily, many web servers (including GitHub releases) support this feature.
  3. All files within an archive are compressed individually.
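Because every file is compressed on its own and the central directory records where each file's local header starts, one range request per changed file is enough. The sketch below derives those ranges under a simplifying assumption: entries are stored back to back, so an entry ends where the next one (or the central directory) begins. A stricter client would compute the end from the local header and the compressed size instead. `Entry`, `entry_ranges` and `range_header` are hypothetical names.

```rust
use std::ops::Range;

/// Minimal view of a central directory entry (hypothetical type).
struct Entry {
    name: &'static str,
    local_header_offset: u64,
}

/// Byte range of each entry in the remote archive, ASSUMING entries are
/// stored back to back and end where the next entry (or the central
/// directory at `cd_offset`) begins.
fn entry_ranges(entries: &[Entry], cd_offset: u64) -> Vec<(&'static str, Range<u64>)> {
    let mut sorted: Vec<&Entry> = entries.iter().collect();
    sorted.sort_by_key(|e| e.local_header_offset);
    sorted
        .iter()
        .enumerate()
        .map(|(i, e)| {
            let end = sorted.get(i + 1).map_or(cd_offset, |next| next.local_header_offset);
            (e.name, e.local_header_offset..end)
        })
        .collect()
}

/// HTTP Range header for a half-open byte range (HTTP end positions are
/// inclusive). Ranges are never empty, since a local header alone is 30 bytes.
fn range_header(r: &Range<u64>) -> String {
    format!("bytes={}-{}", r.start, r.end - 1)
}

fn main() {
    let entries = [
        Entry { name: "b.txt", local_header_offset: 40 },
        Entry { name: "a.txt", local_header_offset: 0 },
    ];
    for (name, range) in entry_ranges(&entries, 100) {
        println!("{name}: Range: {}", range_header(&range));
    }
}
```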

See the documentation and the examples for more information on how to use this crate.

Dependencies

~7–19MB
~242K SLoC