8 releases (4 breaking)
Version | Released |
---|---|
0.5.1 | Sep 6, 2022 |
0.5.0 | Sep 6, 2022 |
0.4.0 | Jun 4, 2021 |
0.3.1 | Oct 26, 2020 |
0.1.0 | Jun 19, 2020 |
piz: A Parallel Implementation of Zip (in Rust)
piz is a Zip archive reader designed to decompress any number of files concurrently using a simple API:
// For smaller files,
//
// let bytes = fs::read("foo.zip")?;
// let archive = ZipArchive::new(&bytes)?;
//
// works just fine. Memory map larger files!
let zip_file = File::open("foo.zip")?;
let mapping = unsafe { Mmap::map(&zip_file)? };
let archive = ZipArchive::new(&mapping)?;
// We can iterate through the entries in the archive directly...
//
// for entry in archive.entries() {
//     let mut reader = archive.read(entry)?;
//     // Read away!
// }
//
// ...but ZIP doesn't guarantee that entries are in any particular order,
// that there aren't duplicates, that an entry has a valid file path, etc.
// Let's do some validation and organize them into a tree of files and folders.
let tree = as_tree(archive.entries())?;
// With that done, we can get a file (or directory)'s metadata from its path.
let metadata = tree.lookup("some/specific/file")?;
// And read the file out, if we'd like:
let mut reader = archive.read(metadata)?;
let mut save_to = File::create(&metadata.file_name)?;
io::copy(&mut reader, &mut save_to)?;
// Readers are `Send`, so we can read out as many as we'd like in parallel.
// Here we'll use Rayon to read out the whole archive with all cores:
tree.files()
    .par_bridge()
    .try_for_each(|entry| {
        if let Some(parent) = entry.file_name.parent() {
            // Create parent directories as needed.
            fs::create_dir_all(parent)?;
        }
        let mut reader = archive.read(entry)?;
        let mut save_to = File::create(&entry.file_name)?;
        io::copy(&mut reader, &mut save_to)?;
        Ok(())
    })?;
Zip is an interesting archive format: unlike the compressed tarballs often seen in Linux land (*.tar.gz, *.tar.zst, ...), each file in a Zip archive is compressed independently, with a central directory telling us where to find it. This allows us to extract multiple files simultaneously so long as we can read from multiple places at once.
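To make the "central directory" idea concrete, here is a minimal sketch of how a reader might locate it. This is not piz's actual code: the `Eocd` struct and `find_eocd` function are hypothetical names for illustration. It relies only on the Zip format's end-of-central-directory (EOCD) record, a block of at least 22 bytes near the end of the archive that starts with the signature `PK\x05\x06` and stores the directory's size and offset as little-endian integers.

```rust
/// Fields from the end-of-central-directory (EOCD) record that say
/// where the central directory lives. (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
struct Eocd {
    entries: u16,
    directory_size: u32,
    directory_offset: u32,
}

const EOCD_SIGNATURE: [u8; 4] = [b'P', b'K', 0x05, 0x06];
/// Length of an EOCD record that has no trailing comment.
const EOCD_MIN_LEN: usize = 22;

/// Scan backwards from the end of the archive for the EOCD signature.
fn find_eocd(archive: &[u8]) -> Option<Eocd> {
    if archive.len() < EOCD_MIN_LEN {
        return None;
    }
    // Only an optional comment may follow the record, so start at the
    // last position where a full record could fit and walk towards the front.
    let last_start = archive.len() - EOCD_MIN_LEN;
    (0..=last_start).rev().find_map(|start| {
        let record = &archive[start..start + EOCD_MIN_LEN];
        if record[..4] != EOCD_SIGNATURE {
            return None;
        }
        let u16_at = |i: usize| u16::from_le_bytes([record[i], record[i + 1]]);
        let u32_at = |i: usize| {
            u32::from_le_bytes([record[i], record[i + 1], record[i + 2], record[i + 3]])
        };
        Some(Eocd {
            entries: u16_at(10),          // total entries in the directory
            directory_size: u32_at(12),   // size of the directory in bytes
            directory_offset: u32_at(16), // offset of the directory's first byte
        })
    })
}

fn main() {
    // An empty archive is just an EOCD record with every count set to zero.
    let mut empty = Vec::from(EOCD_SIGNATURE);
    empty.extend_from_slice(&[0u8; 18]);
    println!("{:?}", find_eocd(&empty));
}
```

A robust reader would go further than this sketch: it would check that the record's comment length matches the bytes remaining, and it would handle Zip64 archives, where these 16- and 32-bit fields are too small.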
Users can either read the entire archive into memory, or, for larger archives, memory-map the file. (On 64-bit systems, this allows us to treat archives as a contiguous byte range even if the file is much larger than physical RAM. 32-bit systems are limited by address space to archives under 4 GB, but piz should be well-behaved if the archive is small enough.)
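Whether the bytes come from `fs::read` or a memory map, the archive ends up as a plain `&[u8]`, and a shared slice can be read from many threads at once. The sketch below illustrates that property using only the standard library. It is not how piz or Rayon schedule work, and `sum_ranges` is a hypothetical helper: summing bytes stands in for decompressing one entry.

```rust
use std::ops::Range;
use std::thread;

/// Sum the bytes of each range on its own thread.
/// Summing is a stand-in for per-entry decompression.
/// Panics if a range falls outside the archive.
fn sum_ranges(archive: &[u8], ranges: &[Range<usize>]) -> Vec<u64> {
    thread::scope(|scope| {
        // Spawn every worker first so they all run concurrently...
        let workers: Vec<_> = ranges
            .iter()
            .map(|range| {
                // Each worker borrows its own region of the same slice.
                let region = &archive[range.clone()];
                scope.spawn(move || region.iter().map(|&b| u64::from(b)).sum::<u64>())
            })
            .collect();
        // ...then collect the results in the original order.
        workers
            .into_iter()
            .map(|worker| worker.join().expect("worker panicked"))
            .collect()
    })
}

fn main() {
    // Stand-in for archive bytes: the values 0 through 255.
    let archive: Vec<u8> = (0..=255).collect();
    let sums = sum_ranges(&archive, &[0..4, 4..8, 250..256]);
    println!("{sums:?}");
}
```

No locking is needed because every thread only reads. This is the same reason a single mapping can serve any number of readers.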
Examples
See unzip/ for a simple CLI example that unzips a provided file into the current directory.
Tests
test_harness/ contains some smoke tests against a few inputs, e.g.:
- A basic, "Hello, Zip!" archive of a few text files
- The same, but with some junk prepended to it
- A Zip64 archive with files > 2^32 bytes
If it doesn't find these files, it creates them with a shell script (which assumes a Unix-y environment).
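The Zip64 case matters because the classic Zip headers store sizes and offsets in 32-bit fields. When a value does not fit, the format sets the 32-bit field to `0xFFFFFFFF` and stores the real value in a Zip64 extra field. A minimal sketch of that rule follows. `resolve_size` is a hypothetical helper for illustration, not a piz function.

```rust
/// Marker value meaning "look in the Zip64 extra field instead".
const ZIP64_SENTINEL: u32 = u32::MAX;

/// Pick the real size of an entry.
/// `size_field` is the 32-bit value from the classic header;
/// `zip64_size` is the 64-bit value from the Zip64 extra field, if present.
/// Returns `None` when the header points at a Zip64 field that is missing.
fn resolve_size(size_field: u32, zip64_size: Option<u64>) -> Option<u64> {
    if size_field == ZIP64_SENTINEL {
        zip64_size
    } else {
        Some(u64::from(size_field))
    }
}

fn main() {
    // A small file: the 32-bit field is authoritative.
    println!("{:?}", resolve_size(1024, None));
    // A file of exactly 2^32 bytes cannot be stored in 32 bits.
    println!("{:?}", resolve_size(ZIP64_SENTINEL, Some(1 << 32)));
}
```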
Future plans
Piz currently provides limited metadata for each file (path, size, CRC32, last-modified time, etc.). Additional info - like file permissions - should be added later. Support for compression algorithms besides DEFLATE (like Bzip2) could also be added.
Thanks
Many thanks to
- Hans Wennborg for their fantastic article, Zip Files: History, Explanation and Implementation
- Mathijs van de Nes's zip-rs, the main inspiration of this project and a great example of a Zip decoder in Rust