8 releases (4 stable)
Version | Released |
---|---|
1.2.1 | May 1, 2024 |
1.2.0 | Apr 11, 2024 |
1.1.0 | Dec 24, 2023 |
1.0.0 | Oct 14, 2022 |
0.1.0 | Aug 14, 2022 |
#323 in Filesystem
51 downloads per month
Used in atg
25KB
393 lines
S3Reader
A Rust library to read from S3 objects as if they were files on a local filesystem (almost). The S3Reader implements both the Read and Seek traits, so you can place the cursor anywhere within the S3 object and read from any byte offset. This allows random access to bytes within S3 objects.
Usage
Add this to your Cargo.toml:
[dependencies]
s3reader = "1.0.0"
Use BufRead to read line by line
use std::io::{BufRead, BufReader};
use s3reader::S3Reader;
use s3reader::S3ObjectUri;

fn read_lines_manually() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let s3obj = S3Reader::open(uri).unwrap();
    // Wrap the S3 reader in a BufReader to get line-oriented reads.
    let mut reader = BufReader::new(s3obj);

    // read_line appends to the string and keeps the trailing newline.
    let mut line = String::new();
    let len = reader.read_line(&mut line).unwrap();
    println!("The first line >>{line}<< is {len} bytes long");

    let mut line2 = String::new();
    let len = reader.read_line(&mut line2).unwrap();
    println!("The next line >>{line2}<< is {len} bytes long");

    Ok(())
}

fn use_line_iterator() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let s3obj = S3Reader::open(uri).unwrap();
    let reader = BufReader::new(s3obj);

    // lines() yields each line without its trailing newline.
    let mut count = 0;
    for line in reader.lines() {
        println!("{}", line.unwrap());
        count += 1;
    }
    println!("Read {count} lines");

    Ok(())
}
Use Seek to jump to positions
use std::io::{Read, Seek, SeekFrom};
use s3reader::S3Reader;
use s3reader::S3ObjectUri;

fn jump_within_file() -> std::io::Result<()> {
    let uri = S3ObjectUri::new("s3://my-bucket/path/to/huge/file").unwrap();
    let mut reader = S3Reader::open(uri).unwrap();

    let len = reader.len();

    // Seeking to `len` from the start lands on the same position
    // as seeking to offset 0 from the end.
    let cursor_1 = reader.seek(SeekFrom::Start(len as u64)).unwrap();
    let cursor_2 = reader.seek(SeekFrom::End(0)).unwrap();
    assert_eq!(cursor_1, cursor_2);

    // Jump to byte 10 and read up to 100 bytes from there.
    // This assumes the object holds at least 110 bytes.
    reader.seek(SeekFrom::Start(10)).unwrap();
    let mut buf = [0; 100];
    let bytes = reader.read(&mut buf).unwrap();
    assert_eq!(buf.len(), 100);
    assert_eq!(bytes, 100);

    Ok(())
}
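The seek-then-read pattern above can be wrapped in a small helper. The sketch below is generic over any type that implements both Read and Seek, so it should apply to S3Reader as well, although it is only exercised here with an in-memory std::io::Cursor. The name `read_range` is hypothetical and not part of s3reader. It uses `read_exact` because the general contract of `Read::read` allows a single call to return fewer bytes than requested.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Reads exactly `len` bytes starting at byte `offset`.
/// `read_range` is a hypothetical helper, not part of s3reader.
fn read_range<R: Read + Seek>(reader: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    // Move the cursor to the absolute byte offset.
    reader.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    // read_exact keeps reading until the buffer is full and returns
    // ErrorKind::UnexpectedEof if the source ends before that.
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With an opened reader, a call such as `read_range(&mut reader, 10, 100)` would return the 100 bytes starting at offset 10, or an error if the object is too short.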
Q/A
Does this library really provide random access to S3 objects?
According to this StackOverflow answer, yes.
Are the reads sync or async?
The S3-SDK uses mostly async operations, but the Read and Seek traits require sync methods. Because of this, I'm using a blocking tokio runtime to wrap the async calls. This might not be the best solution, but it works well for me. Any suggestions for improvement are very welcome.
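To illustrate the general idea of exposing an async fetch through a synchronous Read, here is a minimal sketch that uses only the standard library. It is not the crate's implementation: `block_on` is a tiny stand-in for what a tokio runtime's `block_on` provides, and `AsyncSource`, `fetch_range`, and `BlockingReader` are hypothetical names, with an in-memory buffer standing in for a ranged S3 request.

```rust
use std::future::Future;
use std::io::{self, Read};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal std-only executor: polls the future on the current thread
/// and parks while it is pending. A stand-in for a real runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async source; the Vec stands in for a remote object.
struct AsyncSource {
    data: Vec<u8>,
}

impl AsyncSource {
    /// Returns up to `len` bytes starting at `start` (empty at end of data).
    async fn fetch_range(&self, start: usize, len: usize) -> io::Result<Vec<u8>> {
        let start = start.min(self.data.len());
        let end = start.saturating_add(len).min(self.data.len());
        Ok(self.data[start..end].to_vec())
    }
}

/// Synchronous reader that tracks a cursor and blocks on each fetch.
struct BlockingReader {
    source: AsyncSource,
    pos: usize,
}

impl Read for BlockingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Block the calling thread until the async fetch completes.
        let chunk = block_on(self.source.fetch_range(self.pos, buf.len()))?;
        buf[..chunk.len()].copy_from_slice(&chunk);
        self.pos += chunk.len();
        Ok(chunk.len())
    }
}
```

The trade-off of this pattern is that every read blocks the calling thread for the duration of the underlying request, which is what the Read trait requires anyway.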
Why is this useful?
It depends on your use case. If you need to access arbitrary bytes in the middle of large files or S3 objects, this library is useful. For example, you can use it to stream mp4 files. It's also quite useful for some bioinformatics applications, where you might have a huge reference genome of several GB but only need the data for a few genes, amounting to only a few MB.
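As a rough sketch of the reference-genome case, the helper below jumps to a known byte offset in a text source and reads a single line from there. It assumes the offset comes from some external index, which is not shown. The name `line_at` is hypothetical, and the function is written against the generic Read and Seek traits rather than S3Reader specifically, so it is exercised here only with in-memory data.

```rust
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom};

/// Returns the line that starts at byte `offset`, without its line ending.
/// `line_at` is a hypothetical helper, not part of s3reader.
fn line_at<R: Read + Seek>(source: R, offset: u64) -> io::Result<String> {
    let mut reader = BufReader::new(source);
    // BufReader implements Seek when the inner reader does;
    // seeking discards any buffered data.
    reader.seek(SeekFrom::Start(offset))?;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    // Strip a trailing "\n" or "\r\n".
    let trimmed = line.trim_end_matches(|c| c == '\n' || c == '\r');
    Ok(trimmed.to_string())
}
```

Only the bytes around the requested offset are read, so the cost should depend on the buffer size rather than on the total size of the object.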
Dependencies
~32MB
~416K SLoC