#gcs #rsync #google #cloud #storage

gcs-rsync

rsync support for gcs with higher perf than gsutil rsync

15 releases

0.2.3 Jul 25, 2023
0.2.2 Jul 25, 2023
0.2.1 Sep 8, 2022
0.1.11 Dec 9, 2021
0.1.9 Oct 29, 2021

#146 in Web programming

Download history 13/week @ 2023-06-04 14/week @ 2023-06-18 13/week @ 2023-06-25 13/week @ 2023-07-02 58/week @ 2023-07-23 106/week @ 2023-07-30 106/week @ 2023-08-06 152/week @ 2023-08-13 136/week @ 2023-08-20 197/week @ 2023-08-27 124/week @ 2023-09-03 146/week @ 2023-09-10 72/week @ 2023-09-17

540 downloads per month

MIT license

85KB
2K SLoC

gcs-rsync

build codecov License:MIT docs.rs crates.io crates.io (recent) docker

Lightweight and efficient Rust gcs rsync for Google Cloud Storage.

gcs-rsync is faster than gsutil rsync according to the following benchmarks.

no hard limit to 32K objects or specific conf to compute state.

How to install as crate

Cargo.toml

[dependencies]
gcs-rsync = "0.2"

How to install as cli tool

cargo install --example gcs-rsync gcs-rsync

~/.cargo/bin/gcs-rsync

How to run with docker

Mirror local folder to gcs

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToUpload>:/source:ro superbeeeeeee/gcs-rsync -r -m /source gs://<YourBucket>/<YourPrefix>

Mirror gcs to folder

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourPrefix> /dest

Benchmark

Important note about gsutil: The gsutil ls command does not list all object items by default but instead list all prefixes while adding the -r flag slowdown gsutil performance. The ls performance command is very different to the rsync implementation.

new files only (first time sync)

  • gcs-rsync: 2.2s/7MB
  • gsutil: 9.93s/47MB

winner: gcs-rsync

gcs-rsync sync bench

rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         2.20
user         0.13
sys          0.21
             7606272  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1915  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 394  messages sent
                1255  messages received
                   0  signals received
                  54  voluntary context switches
                5814  involuntary context switches
           636241324  instructions retired
           989595729  cycles elapsed
             3895296  peak memory footprint

gsutil sync bench

rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
Operation completed over 215 objects/50.3 KiB.
real         9.93
user         8.12
sys          2.35
            47108096  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              196391  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
               36089  messages sent
               87309  messages received
                   5  signals received
               38401  voluntary context switches
               51924  involuntary context switches
            12986389  instructions retired
            12032672  cycles elapsed
              593920  peak memory footprint

no change (second time sync)

  • gcs-rsync: 0.78s/8MB
  • gsutil: 2.18s/47MB

winner: gcs-rsync (due to size and mtime check before crc32c like gsutil does)

gcs-rsync sync bench

cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         1.79
user         0.13
sys          0.12
             7864320  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1980  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 397  messages sent
                1247  messages received
                   0  signals received
                  42  voluntary context switches
                4948  involuntary context switches
           435013936  instructions retired
           704782682  cycles elapsed
             4141056  peak memory footprint

gsutil sync bench

/usr/bin/time -lp --  gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/
real         2.18
user         1.37
sys          0.66
            46899200  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              100108  page reclaims
                1732  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                6311  messages sent
               12752  messages received
                   4  signals received
                6145  voluntary context switches
               14219  involuntary context switches
            13133297  instructions retired
            13313536  cycles elapsed
              602112  peak memory footprint

gsutil rsync config

gsutil -m -q rsync -r -d ./your-dir gs://your-bucket
/usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

About authentication

All default functions related to authentication use GOOGLE_APPLICATION_CREDENTIALS env var as default conf like official Google libraries do on other languages (golang, dotnet)

Other functions (from and from_file) provide the custom integration mode.

For more info about OAuth2, see the related README in the oauth2 mod.

How to run tests

Unit tests

cargo test --lib

Integration tests + Unit tests

TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast

Examples

Upload object

cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"

Download object

cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

Delete object

cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"

List objects

cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"

List objects with default service account

GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"

List objects

list a bucket having more than 60K objects

time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l

Profiling

Humans are terrible at guessing-about-performance

export CARGO_PROFILE_RELEASE_DEBUG=true
sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

Native bin build (static shared lib)

docker rust rust:alpine3.14
apk add --no-cache musl-dev pkgconfig openssl-dev

LDFLAGS="-static -L/usr/local/musl/lib" LD_LIBRARY_PATH=/usr/local/musl/lib:$LD_LIBRARY_PATH CFLAGS="-I/usr/local/musl/include" PKG_CONFIG_PATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x86_64-unknown-linux-musl --example bucket_to_folder_sync

Dependencies

~12–26MB
~475K SLoC