
Condow

Condow is a CONcurrent DOWnloader which downloads BLOBs by splitting the download into parts and downloading them concurrently.

Some services/technologies/backends can deliver BLOBs faster if the BLOBs are downloaded concurrently by "opening multiple connections". AWS S3 is an example of this.

This crate provides the core functionality only. To actually use it, use one of the implementation crates:

  • condow_rusoto: AWS S3 via the rusoto-s3 crate
  • condow_fs: Using async file access via tokio

All that is required to add more "services" is to implement the CondowClient trait.

License

condow is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.



lib.rs:

ConDow

Overview

ConDow is a CONcurrent DOWnloader which downloads BLOBs by splitting the download into parts and downloading them concurrently.

Some services/technologies/backends can deliver BLOBs faster if the BLOBs are downloaded concurrently by "opening multiple connections". AWS S3 is an example of this.

This crate provides the core functionality only. To actually use it, use one of the implementation crates:

  • condow_rusoto: AWS S3 via the rusoto-s3 crate
  • condow_fs: Using async file access via tokio

All that is required to add more "services" is to implement the [CondowClient] trait.
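The exact trait definition lives in the condow_client module. As a rough, self-contained illustration of the idea only, the stand-in trait below (hypothetical names, not the real [CondowClient] definition, which is async and stream based) shows the two capabilities a backend has to provide: reporting a BLOB's size and delivering a byte range of it.

use std::ops::Range;

// Illustrative stand-in only: a hypothetical `BlobBackend` trait mirroring the
// idea behind `CondowClient`. The real trait is async and returns byte streams.
trait BlobBackend {
    /// Report the total size of the BLOB at `location`.
    fn get_size(&self, location: &str) -> Result<u64, String>;
    /// Deliver the bytes of `location` within `range`.
    fn download_range(&self, location: &str, range: Range<u64>) -> Result<Vec<u8>, String>;
}

/// A toy backend serving a single in-memory BLOB.
struct SingleBlob {
    bytes: Vec<u8>,
}

impl BlobBackend for SingleBlob {
    fn get_size(&self, _location: &str) -> Result<u64, String> {
        Ok(self.bytes.len() as u64)
    }

    fn download_range(&self, _location: &str, range: Range<u64>) -> Result<Vec<u8>, String> {
        let start = range.start as usize;
        let end = (range.end as usize).min(self.bytes.len());
        Ok(self.bytes[start..end].to_vec())
    }
}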

Usage

To use condow, a client for accessing remote data is required. The examples below use InMemoryClient; usually this would be a client that actually accesses remote BLOBs. Note that the examples use .await and therefore have to be run within an async context.

The [Condow] struct itself can be used to download BLOBs. Since it has two type parameters, it might not be convenient to pass around. Consider the traits [Downloads] (which has only an associated type) or [DownloadUntyped] (which is even object safe) for passing instances of [Condow] around.

use condow_core::condow_client::InMemoryClient;
use condow_core::{Condow, config::Config};

// First we need a client...
let client = InMemoryClient::<String>::new_static(b"a remote BLOB");

// ... and a configuration for Condow
let config = Config::default();

let condow = Condow::new(client, config).unwrap();

assert_eq!(condow.get_size("a location").await.unwrap(), 13);

// Download the complete BLOB
let blob = condow.blob().at("a location").download_into_vec().await.unwrap();
assert_eq!(blob, b"a remote BLOB");

// Download part of a BLOB. Any Rust range syntax will work.
let blob = condow.blob().at("a location").range(2..=7).download_into_vec().await.unwrap();
assert_eq!(blob, b"remote");

let blob = condow.blob().at("a location").range(2..).download_into_vec().await.unwrap();
assert_eq!(blob, b"remote BLOB");

// get an `AsyncRead` implementation

use futures::AsyncReadExt;
let mut reader = condow.blob().at("a location").reader().await.unwrap();
let mut buf = Vec::new();
reader.read_to_end(&mut buf).await.unwrap();
assert_eq!(buf, b"a remote BLOB");

// get an `AsyncRead`+`AsyncSeek` implementation
use futures::AsyncSeekExt;
let mut reader = condow.blob()
    .at("a location")
    .trusted_blob_size(13)
    .random_access_reader()
    .finish().await.unwrap();
let mut buf = Vec::new();
reader.seek(std::io::SeekFrom::Start(2)).await.unwrap();
reader.read_to_end(&mut buf).await.unwrap();
assert_eq!(buf, b"remote BLOB");

Retries

ConDow supports retries. These can be applied to the downloads themselves as well as to the byte streams returned from a client. If an error occurs while streaming bytes, ConDow will try to reconnect with retries and resume streaming where the previous stream failed.

Retries can also be attempted on size requests.

Be aware that some clients might also do retries themselves, depending on their underlying implementation. In this case you should disable retries for either the client or ConDow itself.
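The resume behaviour is independent of any particular client. As a self-contained sketch of the idea only (not ConDow's actual implementation): on a stream error, reconnect and continue from the first byte that has not been received yet.

/// Illustrative only: download `total` bytes via `fetch_from`, which streams
/// bytes starting at a given offset and may fail mid-stream. On an error the
/// download is resumed from the first byte not yet received, at most
/// `max_retries` times.
fn download_with_resume<F>(
    total: u64,
    max_retries: usize,
    mut fetch_from: F,
) -> Result<Vec<u8>, String>
where
    // `fetch_from(offset)` returns the bytes it managed to stream plus an
    // error if the stream broke before reaching the end of the BLOB.
    F: FnMut(u64) -> (Vec<u8>, Option<String>),
{
    let mut collected = Vec::new();
    let mut retries_left = max_retries;

    while (collected.len() as u64) < total {
        let (bytes, error) = fetch_from(collected.len() as u64);
        collected.extend_from_slice(&bytes);

        if let Some(err) = error {
            if retries_left == 0 {
                return Err(err);
            }
            // Reconnect and resume where the previous stream failed.
            retries_left -= 1;
        }
    }

    Ok(collected)
}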

Behaviour

Downloads with a concurrency of at most 3 are streamed on the same task the download was initiated on. This means that the returned stream needs to be polled to drive pulling chunks from the network. It also means that panics in underlying libraries will surface on the polling task.

Downloads with a concurrency greater than or equal to 4 are executed on dedicated tasks. Panics will be detected and the stream will abort with an error.

With the EnsureActivePull config setting all downloads will be executed on dedicated tasks and panics will be detected.

All downloads executed on dedicated tasks will pull bytes from the network eagerly and fill a queue.
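How a download is executed therefore follows from the configured concurrency. Below is a hedged sketch of adjusting it via [Config]; the builder-method name is an assumption derived from this section, and the EnsureActivePull setting also lives in the config module, so consult config for the exact API.

use condow_core::config::Config;

// Sketch only: the builder method below is an assumption based on the
// settings described above; see `condow_core::config` for the real API.
let config = Config::default()
    .max_concurrency(4); // 4 or more parts => parts are downloaded on dedicated tasks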

Instrumentation

Instrumentation can be done for each individual download or centralized for global monitoring. For further information see the [probe] module.
