#byte-range #read #seek #async-api #low-level #convert #parts

range-reader

Converts low-level APIs that read ranges of bytes into Read + Seek

2 unstable releases

0.2.0 May 17, 2022
0.1.0 Nov 20, 2021

#1303 in Algorithms

97 downloads per month

Apache-2.0

12KB
175 lines

Ranged reader

Converts low-level APIs that read byte ranges of files into structs that implement Read + Seek and AsyncRead + AsyncSeek. See parquet_s3_async.rs for an example that uses this API to read parts of a large Parquet file from S3 asynchronously.

Rationale

Blob storage HTTPS APIs offer the ability to read ranges of bytes from a single blob, i.e. they expose functions of the form

fn read_range_blocking(path: &str, start: usize, length: usize) -> Vec<u8>;
async fn read_range(path: &str, start: usize, length: usize) -> Vec<u8>;

together with functions that return the blob's total size,

async fn length(path: &str) -> usize;
fn length(path: &str) -> usize;

These APIs are usually IO-bound: they spend most of their time waiting on the network.
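To make the shape of these functions concrete, here is a minimal in-memory stand-in for a blob store. `MemoryStore` and its `put` method are hypothetical names used only for illustration; they are not part of this crate, and a real store would issue a network request instead of slicing a vector.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a blob store (not part of this crate).
pub struct MemoryStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    pub fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    /// Stores `data` under `path`, replacing any previous blob.
    pub fn put(&mut self, path: &str, data: Vec<u8>) {
        self.blobs.insert(path.to_string(), data);
    }

    /// Total size of the blob in bytes (0 if the path is unknown).
    pub fn length(&self, path: &str) -> usize {
        self.blobs.get(path).map_or(0, |blob| blob.len())
    }

    /// Returns up to `length` bytes starting at `start`,
    /// clamped to the end of the blob.
    pub fn read_range_blocking(&self, path: &str, start: usize, length: usize) -> Vec<u8> {
        match self.blobs.get(path) {
            Some(blob) => {
                let begin = start.min(blob.len());
                let end = start.saturating_add(length).min(blob.len());
                blob[begin..end].to_vec()
            }
            None => Vec::new(),
        }
    }
}
```

Clamping out-of-range requests is a choice made for this sketch; a real service might return an error instead.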

Some file formats (e.g. Apache Parquet, Apache Avro, Apache Arrow IPC) allow reading only parts of a file, which enables filter and projection pushdown.
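As an illustration of a partial read, Parquet keeps its metadata in a footer at the end of the file, so a reader typically starts by fetching only the last few bytes. The sketch below shows that access pattern against any Read + Seek source; `read_tail` is a hypothetical helper, not a function of this crate or of any Parquet library.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads only the last `n` bytes of `reader` (or the whole source if it is
/// shorter than `n`), without touching the bytes before them.
pub fn read_tail<R: Read + Seek>(reader: &mut R, n: u64) -> io::Result<Vec<u8>> {
    // Seeking to the end returns the total length of the source.
    let len = reader.seek(SeekFrom::End(0))?;
    let start = len.saturating_sub(n);
    reader.seek(SeekFrom::Start(start))?;
    let mut tail = vec![0u8; (len - start) as usize];
    reader.read_exact(&mut tail)?;
    Ok(tail)
}
```

When the source is backed by a range API, the call translates into one size lookup and one small range request rather than a download of the whole blob.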

This crate offers two structs, RangedReader and RangedStreamer, which implement Read + Seek and AsyncRead + AsyncSeek respectively. They bridge the blob storage APIs described above to the traits that most Rust APIs use to read bytes.
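The idea behind the blocking bridge can be sketched with the standard library alone. `SketchReader` below is a hypothetical adapter written for this explanation; it is not the crate's `RangedReader`, whose constructor and internals may differ. It assumes a closure `read_range(start, length) -> Vec<u8>` and a known total length.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical adapter (not this crate's `RangedReader`): turns a
/// `read_range(start, length)` closure into a `Read + Seek` source.
pub struct SketchReader<F: Fn(usize, usize) -> Vec<u8>> {
    pos: u64,
    length: u64,
    read_range: F,
}

impl<F: Fn(usize, usize) -> Vec<u8>> SketchReader<F> {
    pub fn new(length: usize, read_range: F) -> Self {
        Self { pos: 0, length: length as u64, read_range }
    }
}

impl<F: Fn(usize, usize) -> Vec<u8>> Read for SketchReader<F> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never request bytes past the end of the blob.
        let remaining = self.length.saturating_sub(self.pos) as usize;
        let want = buf.len().min(remaining);
        if want == 0 {
            return Ok(0);
        }
        let bytes = (self.read_range)(self.pos as usize, want);
        let n = bytes.len().min(want);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.pos += n as u64;
        Ok(n)
    }
}

impl<F: Fn(usize, usize) -> Vec<u8>> Seek for SketchReader<F> {
    fn seek(&mut self, target: SeekFrom) -> io::Result<u64> {
        // Seeking only moves the cursor; no bytes are fetched.
        let new_pos: i128 = match target {
            SeekFrom::Start(offset) => offset as i128,
            SeekFrom::End(delta) => self.length as i128 + delta as i128,
            SeekFrom::Current(delta) => self.pos as i128 + delta as i128,
        };
        if new_pos < 0 || new_pos > u64::MAX as i128 {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "seek out of range"));
        }
        self.pos = new_pos as u64;
        Ok(self.pos)
    }
}
```

This sketch issues one range request per `read` call and does no buffering, which keeps it short but would be slow over a network; a practical adapter would likely fetch larger chunks and serve small reads from a buffer.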

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
