
compress_io

Convenience library for reading and writing compressed files/streams

4 releases (2 breaking)

0.5.0 Apr 11, 2022
0.4.0 Apr 6, 2022
0.2.1 Dec 6, 2021
0.2.0 Dec 6, 2021

#616 in Compression

33 downloads per month

Custom license

62KB
1K SLoC

The aim of compress_io is to make it simple for an application to support multiple compression formats with minimal effort from both the developer and the user (i.e., an application can accept uncompressed or compressed input in a range of different formats, and neither the developer nor the user has to specify which formats have been used). compress_io does not provide the compression/decompression itself but uses external utilities such as gzip, bzip2 or zstd as read or write filters.


lib.rs:

Convenience library for reading and writing compressed files / streams

compress_io does not provide the compression/decompression itself but uses external utilities such as [gzip], [bzip2] or [zstd] as read or write filters. The aim of compress_io is to make it simple for an application to support multiple compression formats with minimal effort from both the developer and the user (i.e., an application can accept uncompressed or compressed input in a range of different formats, and neither the developer nor the user has to specify which formats have been used).
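To make the "external utility as a filter" idea concrete, here is a minimal sketch using only the standard library. It is not the code compress_io uses internally: the helper name read_through_filter is hypothetical, and the sketch simply connects a file to the standard input of a child process and collects what the process writes to standard output.

```rust
use std::fs::File;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Run `program` with `args`, feeding it the file at `path` on stdin,
/// and return everything it writes to stdout.
/// Hypothetical helper for illustration; not part of the compress_io API.
fn read_through_filter(program: &str, args: &[&str], path: &Path) -> io::Result<Vec<u8>> {
    let input = File::open(path)?;
    // `output()` captures stdout and stderr and waits for the child to exit.
    let output = Command::new(program)
        .args(args)
        .stdin(Stdio::from(input))
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, output.status),
        ));
    }
    Ok(output.stdout)
}
```

Assuming gzip is installed, read_through_filter("gzip", &["-dc"], Path::new("foo.gz")) would return the decompressed bytes. A streaming implementation would hand back the child's stdout pipe instead of buffering the whole output in memory.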

Overview

The main way to work with compress_io is via CompressIo (or AsyncCompressIo in the case of async code). A reader (implementing Read), buffered reader (implementing BufRead), writer or buffered writer (both implementing Write) can be generated from CompressIo (or AsyncCompressIo). By default readers and writers use stdin and stdout, but a file path can also be specified with path. By default, compress_io detects the compression format of input automatically from the initial contents of the file/stream and selects an appropriate utility if one is available in the user's $PATH; the format of output files is chosen from the file extension. These automatic methods can be overridden with ctype. compress_io will make use of parallel versions of compression utilities if available. By default the compression utilities are run with their default threading options, but this behaviour can be changed using cthreads.

Examples

use std::io::{self, BufRead, Write};
use compress_io::compress::CompressIo;

fn main() -> io::Result<()> {
  // Read from a (presumably) gzipped file `foo.gz` and write out to file `foo.xz`, which will be
  // compressed using [xz] (assuming both [gzip] and [xz] are in the user's $PATH).
  // In this example both read and write streams are buffered
  let mut reader = CompressIo::new().path("foo.gz").bufreader()?;
  let mut writer = CompressIo::new().path("foo.xz").bufwriter()?;
  for s in reader.lines().map(|l| l.expect("Read error")) {
    writeln!(writer, "{}", s)?
  }
  Ok(())
}

Decompression utilities can be specified by the user, or can be selected automatically based on an examination of the first few bytes of the input.

use compress_io::{
  compress::CompressIo,
  compress_type::CompressType,
};

// Open a reader from `stdin`, using the first bytes from the file to determine whether the
// file is compressed or not
let mut rd1 = CompressIo::new().reader()?;
// Open a buffered reader from `foo.bz2` using [bzip2] to decompress
let mut rd2 = CompressIo::new().path("foo.bz2").ctype(CompressType::Bzip2).bufreader()?;
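As an illustration of what detection from the first few bytes can look like, the sketch below matches the well-known magic numbers of gzip, bzip2, xz and zstd. The enum SniffedFormat and the functions sniff_format and sniff_reader are hypothetical names introduced here; the crate's own detection logic and its CompressType enum may differ.

```rust
use std::io::{self, BufRead};

/// Formats recognised by this sketch (hypothetical; not the crate's CompressType).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SniffedFormat {
    Gzip,
    Bzip2,
    Xz,
    Zstd,
    Unknown,
}

/// Classify a stream from its leading bytes using standard magic numbers.
fn sniff_format(head: &[u8]) -> SniffedFormat {
    if head.starts_with(&[0x1f, 0x8b]) {
        SniffedFormat::Gzip
    } else if head.starts_with(b"BZh") {
        SniffedFormat::Bzip2
    } else if head.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        SniffedFormat::Xz
    } else if head.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        SniffedFormat::Zstd
    } else {
        SniffedFormat::Unknown
    }
}

/// Peek at a buffered reader without consuming any input, so the same
/// bytes can still be passed on to a decompression utility afterwards.
/// Note: `fill_buf` may return fewer bytes than the longest magic number
/// on a slow stream; a robust version would keep reading until it has enough.
fn sniff_reader<R: BufRead>(reader: &mut R) -> io::Result<SniffedFormat> {
    Ok(sniff_format(reader.fill_buf()?))
}
```

Peeking rather than reading matters for stdin, which cannot be rewound once bytes have been consumed.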

Compression utilities can likewise be selected explicitly, or they can be chosen automatically based on the file name (so a file called test.zst would be compressed using the zstd utility). If the compression format is selected explicitly, the matching extension is added to the filename unless the extension is already present or the fix_path option has been selected.

use compress_io::{
  compress::CompressIo,
  compress_type::CompressType,
};

// Open a compressed writer to `stdout`, using [zstd] to compress the stream
let mut wrt1 = CompressIo::new().ctype(CompressType::Zstd).writer()?;
// Open a compressed buffered writer to the file `foo.lzma` using lzma to compress
let mut wrt2 = CompressIo::new().path("foo").ctype(CompressType::Lzma).bufwriter()?;
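The extension rule described above (append the extension unless it is already there or the path is fixed) can be sketched with the standard library alone. The function output_path is a hypothetical helper, and appending to an existing, different extension (foo.txt becoming foo.txt.zst) is an assumption of this sketch rather than documented behaviour of the crate.

```rust
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Return the path that would be written to for a given extension.
/// Hypothetical helper for illustration; not part of the compress_io API.
fn output_path(path: &Path, ext: &str, fix_path: bool) -> PathBuf {
    // Leave the name alone if asked to, or if the extension is already present.
    if fix_path || path.extension() == Some(OsStr::new(ext)) {
        return path.to_path_buf();
    }
    // Append rather than use `Path::with_extension`, which would
    // replace an existing extension such as `.txt`.
    let mut name: OsString = path.as_os_str().to_os_string();
    name.push(".");
    name.push(ext);
    PathBuf::from(name)
}
```

With this sketch, output_path(Path::new("foo"), "lzma", false) yields foo.lzma, matching the file name in the example above, while passing true for fix_path returns foo unchanged.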

Several of the possible compression formats can be generated by multiple utilities, and this allows alternate utilities to be used if the standard utility is not available.

For example, the standard utility for xz compression is the xz tool; however, zstd can also perform xz compression and will be substituted by the library if xz is not available. Note that if bgzip compression is requested then only the bgzip utility will be used: even though bgzip output is compatible with the gzip format and can be decoded by any utility that handles gzip, bgzip adds extra information during compression that other utilities do not generate.

Some of the compression utilities are multi-threaded. If multiple utilities are available to perform a given compression type, preference will be given to multi-threaded versions. For example, if gzip compression is requested and the pigz utility is available in the current $PATH then this will be used in favour of gzip. For compression the user can specify a preference for threading (where available) using cthreads.

use compress_io::{
  compress::CompressIo,
  compress_type::{CompressType, CompressThreads},
};

// Open a compressed buffered writer to `stdout`, using [zstd] to compress the stream
// using 4 threads
let mut wrt = CompressIo::new().ctype(CompressType::Zstd)
  .cthreads(CompressThreads::Set(4)).bufwriter()?;
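The preference for multi-threaded utilities and the fallback to alternates can be pictured as a search over $PATH with an ordered candidate list. The sketch below is an assumption about the general approach, not the crate's implementation; first_available and path_dirs are hypothetical names, and the check is simplified to "a regular file with that name exists" without testing the executable bit.

```rust
use std::env;
use std::path::PathBuf;

/// Directories listed in the current `$PATH` (empty if the variable is unset).
fn path_dirs() -> Vec<PathBuf> {
    env::var_os("PATH")
        .map(|p| env::split_paths(&p).collect::<Vec<_>>())
        .unwrap_or_default()
}

/// Return the first candidate that exists in any of `dirs`.
/// Candidates are given in order of preference, e.g. `["pigz", "gzip"]`
/// to prefer the multi-threaded tool when it is installed.
/// Hypothetical helper for illustration; not part of the compress_io API.
fn first_available(candidates: &[&str], dirs: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find_map(|name| {
        dirs.iter()
            .map(|dir| dir.join(name))
            .find(|candidate| candidate.is_file())
    })
}
```

Calling first_available(&["pigz", "gzip"], &path_dirs()) would return the path to pigz when it is installed and fall back to gzip otherwise. Taking the directory list as a parameter keeps the search easy to test against a temporary directory.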

Usage

For usage with synchronous code only, add compress_io as a dependency in your Cargo.toml to use from crates.io:

[dependencies]
compress_io = "0.2"

For use with asynchronous code, the async feature should be enabled:

[dependencies]
compress_io = { version = "0.2", features = ["async"] }

Dependencies

~2–13MB
~138K SLoC