45 releases (16 breaking)
0.17.0 | Aug 4, 2024 |
---|---|
0.17.0-beta.2 | Jul 23, 2024 |
0.17.0-beta.1 | Jun 4, 2024 |
0.16.7 | Mar 26, 2024 |
0.5.0 | Nov 8, 2022 |
OneIO - all-in-one convenient IO library for Rust
OneIO is a Rust library that provides a unified simple IO interface for reading and writing to and from data files from different sources and compressions.
Usage and Feature Flags
Enable all compression algorithms and handle remote files (default)
oneio = "0.17"
Select from supported feature flags
oneio = { version = "0.17", default-features = false, features = ["remote", "gz"] }
Default flags include lib-core and rustls.
Core features: lib-core
lib-core core features include:
remote: allow reading from remote files, including http(s) and ftp
compressions: support all compression algorithms
  gz: support gzip files using the flate2 crate
  bz: support bzip2 files using the bzip2 crate
  lz: support lz4 files using the lz4 crate
  xz: support xz files using the xz2 crate (requires the xz library to be installed)
  zstd: support zst files using the zstd crate
json: allow reading JSON content into structs with serde and serde_json
TLS choice: rustls or native-tls
Users can choose between rustls and native-tls as their TLS library. rustls is the default.
Users can also choose to accept invalid certificates (not recommended) by setting the ONEIO_ACCEPT_INVALID_CERTS=true environment variable.
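If your own application wants to surface a warning when this insecure setting is active, a check along the following lines may help. This is only a sketch for application code: the helper names flag_enabled and accept_invalid_certs are hypothetical, and treating the value case-insensitively is an assumption, since the text above only documents the literal value true. It does not describe how OneIO itself parses the variable.

```rust
/// Hypothetical helper: returns true only when the value is "true".
/// Trimming and case-insensitive matching are assumptions of this sketch.
fn flag_enabled(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some(v) if v.eq_ignore_ascii_case("true"))
}

/// Hypothetical helper: reads ONEIO_ACCEPT_INVALID_CERTS from the process
/// environment. An unset or non-UTF-8 variable counts as disabled.
fn accept_invalid_certs() -> bool {
    flag_enabled(std::env::var("ONEIO_ACCEPT_INVALID_CERTS").ok().as_deref())
}
```

An application could call accept_invalid_certs() once at startup and log a warning if it returns true.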
Optional features: cli, s3, digest
s3: allow reading from AWS S3 compatible buckets
cli: build the commandline program oneio; uses the following features:
  lib-core, rustls, s3 for core functionalities
  clap, tracing for CLI basics
digest: for generating SHA256 digest strings
Selecting some compression algorithms
Users can also manually opt in to specific compression algorithms. For example, to work with only local gzip and bzip2 files:
oneio = { version = "0.17", default-features = false, features = ["gz", "bz"] }
Use oneio commandline tool
OneIO comes with a commandline tool, oneio, that opens and reads local or remote files to the terminal and handles decompression automatically. This can be useful if you want to read compressed plain-text files from a local or remote source.
oneio reads files from local or remote locations with any compression
Usage: oneio [OPTIONS] [FILE] [COMMAND]
Commands:
  s3      S3-related subcommands
  digest  Generate SHA256 digest
  help    Print this message or the help of the given subcommand(s)
Arguments:
  [FILE]  file to open, remote or local
Options:
  -d, --download                 download the file to current directory, similar to run `wget`
  -o, --outfile <OUTFILE>        output file path
      --cache-dir <CACHE_DIR>    cache reading to specified directory
      --cache-force              force re-caching if local cache already exists
      --cache-file <CACHE_FILE>  specify cache file name
  -s, --stats                    read through the file and only print out stats
  -h, --help                     Print help
  -V, --version                  Print version
You can specify a data file location after oneio. The following command prints out the raw HTML file from https://bgpkit.com.
oneio https://bgpkit.com
Here is another example of using oneio to read a remote compressed JSON file, pipe it to jq, and count the number of JSON objects in the array.
$ oneio https://data.bgpkit.com/peer-stats/as2rel-latest.json.bz2 | jq '.|length'
802861
You can also directly download a file with the --download (or -d) flag.
$ oneio -d https://archive.routeviews.org/route-views.amsix/bgpdata/2022.11/RIBS/rib.20221107.0400.bz2
file successfully downloaded to rib.20221107.0400.bz2
$ ls -lh rib.20221107.0400.bz2
-rw-r--r-- 1 mingwei staff 122M Nov 7 16:17 rib.20221107.0400.bz2
$ monocle parse rib.20221107.0400.bz2 |head -n5
A|1667793600|185.1.167.24|3214|0.0.0.0/0|3214 1299|IGP|185.1.167.24|0|0|3214:3001|NAG||
A|1667793600|80.249.211.155|61955|0.0.0.0/0|61955 50629|IGP|80.249.211.155|0|0||NAG||
A|1667793600|80.249.213.223|267613|0.0.0.0/0|267613 1299|IGP|80.249.213.223|0|0|5469:6000|NAG||
A|1667793600|185.1.167.62|212483|1.0.0.0/24|212483 13335|IGP|152.89.170.244|0|0|13335:10028 13335:19000 13335:20050 13335:20500 13335:20530 lg:212483:1:104|NAG|13335|108.162.243.9
A|1667793600|80.249.210.28|39120|1.0.0.0/24|39120 13335|IGP|80.249.210.28|0|0|13335:10020 13335:19020 13335:20050 13335:20500 13335:20530|AG|13335|141.101.65.254
Use OneIO Reader as a Library
The reader returned by oneio::get_reader implements BufRead and handles decompression for the following types:
gzip: files ending with gz or gzip
bzip2: files ending with bz or bz2
lz4: files ending with lz4 or lz
xz: files ending with xz or xz2
zstd: files ending with zst or zstd
It also handles reading from remote or local files transparently.
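The suffix table above can be read as a simple lookup from file ending to compression type. The following is a minimal sketch of that idea using only the standard library. The Compression enum and detect_compression function are hypothetical names introduced for illustration, and the case-sensitive match on the text after the last dot is an assumption; this is not OneIO's actual implementation.

```rust
/// Hypothetical enum naming the compression types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Bzip2,
    Lz4,
    Xz,
    Zstd,
}

/// Hypothetical helper: maps the text after the last '.' in a path or URL
/// to a compression type, following the suffix table in the README.
/// Returns None for paths without a dot or with an unlisted suffix.
fn detect_compression(path: &str) -> Option<Compression> {
    let (_, suffix) = path.rsplit_once('.')?;
    match suffix {
        "gz" | "gzip" => Some(Compression::Gzip),
        "bz" | "bz2" => Some(Compression::Bzip2),
        "lz4" | "lz" => Some(Compression::Lz4),
        "xz" | "xz2" => Some(Compression::Xz),
        "zst" | "zstd" => Some(Compression::Zstd),
        _ => None,
    }
}
```

Under these assumptions, a path such as /tmp/test_write.txt.bz2 maps to Compression::Bzip2, while a plain data.txt maps to None and would be read as-is.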
Examples
Read all into string:
use std::io::Read;
const TEST_TEXT: &str = "OneIO test file.
This is a test.";
let mut reader = oneio::get_reader("https://spaces.bgpkit.org/oneio/test_data.txt.gz").unwrap();
let mut text = "".to_string();
reader.read_to_string(&mut text).unwrap();
assert_eq!(text.as_str(), TEST_TEXT);
Read into lines:
use std::io::BufRead;
const TEST_TEXT: &str = "OneIO test file.
This is a test.";
let lines = oneio::read_lines("https://spaces.bgpkit.org/oneio/test_data.txt.gz").unwrap()
    .map(|line| line.unwrap()).collect::<Vec<String>>();
assert_eq!(lines.len(), 2);
assert_eq!(lines[0].as_str(), "OneIO test file.");
assert_eq!(lines[1].as_str(), "This is a test.");
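The line-reading pattern above works on any BufRead, so the same logic can be exercised offline against an in-memory reader. The sketch below uses only the standard library; collect_lines and sample_reader are hypothetical helpers, and the in-memory text simply mirrors the test file content quoted above rather than fetching it.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: collects all lines from any buffered reader,
/// stopping at the first I/O error instead of panicking.
fn collect_lines<R: BufRead>(reader: R) -> std::io::Result<Vec<String>> {
    reader.lines().collect()
}

/// Hypothetical stand-in for a remote file: an in-memory reader holding
/// the same two lines as the test file used in the README examples.
fn sample_reader() -> Cursor<&'static str> {
    Cursor::new("OneIO test file.\nThis is a test.")
}
```

Returning a Result instead of unwrapping each line lets the caller decide how to handle a read failure partway through a file.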
Use OneIO Writer as a Library
get_writer returns a generic writer that implements Write, and handles compression to the following types:
gzip: files ending with gz or gzip
bzip2: files ending with bz or bz2
Note: lz4 writer is not currently supported.
Example
Common IO operations
use std::io::{Read, Write};
let to_read_file = "https://spaces.bgpkit.org/oneio/test_data.txt.gz";
let to_write_file = "/tmp/test_write.txt.bz2";
// read text from remote gzip file
let mut text = "".to_string();
oneio::get_reader(to_read_file).unwrap().read_to_string(&mut text).unwrap();
// write the same text to a local bz2 file
let mut writer = oneio::get_writer(to_write_file).unwrap();
writer.write_all(text.as_ref()).unwrap();
drop(writer);
// read from the newly generated bz2 file
let mut new_text = "".to_string();
oneio::get_reader(to_write_file).unwrap().read_to_string(&mut new_text).unwrap();
// compare the decompressed content of the remote and local files
assert_eq!(text.as_str(), new_text.as_str());
std::fs::remove_file(to_write_file).unwrap();
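The example above drops the writer before reading the file back. With buffered or compressing writers in general, some bytes may not reach the destination until the writer is flushed or dropped. The sketch below illustrates that ordering with a standard-library BufWriter over an in-memory buffer; write_then_read is a hypothetical helper, and the sketch does not claim anything about how OneIO's writer is implemented internally.

```rust
use std::io::{BufWriter, Read, Write};

/// Hypothetical helper: writes text through a buffered writer into an
/// in-memory sink, finishes the writer, then reads the bytes back.
fn write_then_read(text: &str) -> std::io::Result<String> {
    let mut sink: Vec<u8> = Vec::new();

    let mut writer = BufWriter::new(&mut sink);
    writer.write_all(text.as_bytes())?;
    // Flush explicitly so that write errors are reported; dropping a
    // BufWriter also flushes, but ignores any error that occurs.
    writer.flush()?;
    // Release the writer (and its borrow of the sink) before reading.
    drop(writer);

    let mut reader: &[u8] = &sink;
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}
```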
Read remote content with custom headers
use std::collections::HashMap;
use std::io::Read;
use reqwest::header::HeaderMap;
// build the default headers sent with every request
let headers: HeaderMap = (&HashMap::from([("X-Custom-Auth-Key".to_string(), "TOKEN".to_string())]))
    .try_into().expect("invalid headers");
let client = reqwest::blocking::Client::builder()
    .default_headers(headers)
    // accepts invalid TLS certificates; not recommended outside of testing
    .danger_accept_invalid_certs(true)
    .build().unwrap();
// pass the custom client to the HTTP reader
let mut reader = oneio::get_http_reader(
    "https://SOME_REMOTE_RESOURCE_PROTECTED_BY_ACCESS_TOKEN",
    Some(client),
).unwrap();
let mut text = "".to_string();
reader.read_to_string(&mut text).unwrap();
println!("{}", text);
Download remote file to local directory
oneio::download(
    "https://data.ris.ripe.net/rrc18/2022.11/updates.20221107.2325.gz",
    "updates.gz",
    None
).unwrap();
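The download call above takes an explicit local path. If you would rather name the local file after the last segment of the URL, as the output of the commandline -d example suggests, a small helper can derive it. The sketch below is standard-library only; file_name_from_url is a hypothetical function, and its handling of query strings and fragments is an assumption rather than OneIO's documented behavior.

```rust
/// Hypothetical helper: returns the last non-empty path segment of a URL
/// or local path, ignoring any query string or fragment.
/// Returns None when there is no file name (for example a bare host).
fn file_name_from_url(url: &str) -> Option<&str> {
    // Drop the fragment and the query string, if present.
    let without_fragment = url.split('#').next().unwrap_or(url);
    let without_query = without_fragment
        .split('?')
        .next()
        .unwrap_or(without_fragment);

    // For URLs with a scheme, skip "scheme://host" and keep only the path.
    let path = match without_query.find("://") {
        Some(idx) => {
            let rest = &without_query[idx + 3..];
            match rest.find('/') {
                Some(slash) => &rest[slash + 1..],
                None => "",
            }
        }
        None => without_query,
    };

    let name = path.rsplit('/').next().unwrap_or("");
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}
```

The returned name could then be passed as the second argument to the download call shown above.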
S3-related operations (needs s3 feature flag)
use std::io::Read;
use oneio::s3::*;
// upload to S3
s3_upload("oneio-test", "test/README.md", "README.md").unwrap();
// read directly from S3
let mut content = String::new();
s3_reader("oneio-test", "test/README.md")
    .unwrap()
    .read_to_string(&mut content)
    .unwrap();
println!("{}", content);
// download from S3
s3_download("oneio-test", "test/README.md", "test/README-2.md").unwrap();
// get S3 file stats
let res = s3_stats("oneio-test", "test/README.md").unwrap();
dbg!(res);
// error if file does not exist
let res = s3_stats("oneio-test", "test/README___NON_EXISTS.md");
assert!(res.is_err());
// copy S3 file to a different location
let res = s3_copy("oneio-test", "test/README.md", "test/README-temporary.md");
assert!(res.is_ok());
assert_eq!(
    true,
    s3_exists("oneio-test", "test/README-temporary.md").unwrap()
);
// delete temporary copied S3 file
let res = s3_delete("oneio-test", "test/README-temporary.md");
assert!(res.is_ok());
assert_eq!(
    false,
    s3_exists("oneio-test", "test/README-temporary.md").unwrap()
);
// list S3 files
let res = s3_list("oneio-test", "test/", Some("/".to_string()), false).unwrap();
assert_eq!(
    false,
    s3_exists("oneio-test", "test/README___NON_EXISTS.md").unwrap()
);
assert_eq!(true, s3_exists("oneio-test", "test/README.md").unwrap());
Built with ❤️ by BGPKIT Team
License
MIT