9 releases
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.2.0 | Nov 20, 2025 |
| 0.1.3 | Oct 31, 2025 |
| 0.1.1 | Sep 30, 2025 |
| 0.0.2 | Aug 6, 2025 |
| 0.0.1-alpha | Apr 5, 2025 |
#848 in Encoding
42KB
650 lines
Wacksy
An experimental Rust library for reading and writing ᴡᴀᴄᴢ files.
Install
With cargo installed, run the following command in your project directory:
cargo add wacksy
Example
This library provides two main ᴀᴘɪ functions.
from_file() takes a ᴡᴀʀᴄ file and returns a structured representation of a ᴡᴀᴄᴢ object.
as_zip_archive() takes a ᴡᴀᴄᴢ object and zips it up to a byte array using rawzip.
use std::{error::Error, fs, path::Path};
use wacksy::WACZ; // assumes WACZ is exported at the crate root

fn main() -> Result<(), Box<dyn Error>> {
    let warc_file_path = Path::new("example.warc.gz"); // set path to your ᴡᴀʀᴄ file
    let wacz_object = WACZ::from_file(warc_file_path)?; // index the ᴡᴀʀᴄ and create a ᴡᴀᴄᴢ object
    let zipped_wacz: Vec<u8> = wacz_object.as_zip_archive()?; // zip up the ᴡᴀᴄᴢ
    fs::write("example.wacz", zipped_wacz)?; // write out to file
    Ok(())
}
See the documentation for more details.
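Since as_zip_archive() returns a plain byte array, a quick sanity check before writing it to disk is to confirm that the bytes begin with the ᴢɪᴘ local file header signature. The sketch below uses only the standard library; looks_like_zip is a hypothetical helper for illustration and is not part of wacksy.

```rust
/// Signature that opens the first local file header of a non-empty ZIP archive.
const ZIP_LOCAL_FILE_HEADER: [u8; 4] = [0x50, 0x4B, 0x03, 0x04]; // "PK\x03\x04"

/// Hypothetical helper (not part of wacksy): returns true when `bytes`
/// starts with the ZIP local file header signature.
fn looks_like_zip(bytes: &[u8]) -> bool {
    bytes.starts_with(&ZIP_LOCAL_FILE_HEADER)
}
```

Calling looks_like_zip(&zipped_wacz) on the output of the example above only checks the first four bytes, so it catches gross mistakes such as writing the raw ᴡᴀʀᴄ by accident. It does not validate the archive.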
Background
According to Ed Summers, a ᴡᴀᴄᴢ file is "really just a ᴢɪᴘ file that contains ᴡᴀʀᴄ data and metadata at predictable file locations."[^code4lib_talk]
The example in the spec outlines what a ᴡᴀᴄᴢ file should contain:
archive
└── data.warc.gz
datapackage.json
datapackage-digest.json
indexes
└── index.cdx.gz
pages
└── pages.jsonl
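
Because the file locations are predictable, a list of archive entry names can be checked against the layout above. The following is a minimal sketch under the assumption that the example paths are literal entry names; the spec may allow other file names or treat some entries as optional. EXPECTED_ENTRIES and missing_entries are hypothetical names for illustration and are not part of wacksy.

```rust
/// Entry paths copied from the example layout above
/// (assumed here to be literal names inside the archive).
const EXPECTED_ENTRIES: [&str; 5] = [
    "archive/data.warc.gz",
    "datapackage.json",
    "datapackage-digest.json",
    "indexes/index.cdx.gz",
    "pages/pages.jsonl",
];

/// Hypothetical helper (not part of wacksy): returns every expected path
/// that does not appear in `names`, in the order listed above.
fn missing_entries(names: &[&str]) -> Vec<&'static str> {
    EXPECTED_ENTRIES
        .iter()
        .copied()
        .filter(|expected| !names.contains(expected))
        .collect()
}
```

An empty result means every path from the example layout is present. The entry names themselves would come from whichever ᴢɪᴘ reader is used to list the archive.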
[^code4lib_talk]: For more discussion of the concept, see the talk "Web Archives in Digital Repositories" by Ilya Kremer and Ed Summers at Code4Lib 2022.
Similar libraries
License
MIT © Bodleian Libraries and contributors
Dependencies
~2.5MB
~47K SLoC