1 unstable release
Uses old Rust 2015
0.1.2 (Dec 5, 2022)
Used in textract
1.5MB
540 lines
Document File Text Extractor
A simple Rust library that extracts readable text from specific document formats, such as Word documents (docx). Only the formats listed below are currently supported; more are planned.
Supported Documents
- Microsoft Word (docx)
- Microsoft Excel (xlsx)
- Microsoft PowerPoint (pptx)
- OpenOffice Writer (odt)
- OpenOffice Spreadsheet (ods)
- OpenDocument Presentation (odp)
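When the input file type is not known in advance, a caller may want to check the extension against the list above before opening the file. The following is a minimal sketch using only the standard library; the `DocFormat` enum and the `detect_format` function are hypothetical names introduced for illustration and are not part of this library's API.

```rust
use std::path::Path;

/// Formats named in the supported list. Hypothetical type, not part of the library.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocFormat {
    Docx,
    Xlsx,
    Pptx,
    Odt,
    Ods,
    Odp,
}

/// Hypothetical helper: guess the format from the file extension, ignoring case.
/// Returns `None` for paths without an extension or with an unsupported one.
fn detect_format(path: &str) -> Option<DocFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "docx" => Some(DocFormat::Docx),
        "xlsx" => Some(DocFormat::Xlsx),
        "pptx" => Some(DocFormat::Pptx),
        "odt" => Some(DocFormat::Odt),
        "ods" => Some(DocFormat::Ods),
        "odp" => Some(DocFormat::Odp),
        _ => None,
    }
}

fn main() {
    for path in ["samples/sample.docx", "report.ODS", "notes.txt"] {
        match detect_format(path) {
            Some(format) => println!("{}: {:?}", path, format),
            None => println!("{}: unsupported", path),
        }
    }
}
```

Checking the extension is only a heuristic: it says nothing about whether the file contents are actually valid for that format.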
Usage
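use std::io::Read; // provides `read_to_string`
// `Docx` must also be in scope, imported from this crate.

// Open the document, then read its extracted text into a String.
let mut file = Docx::open("samples/sample.docx").expect("cannot open file");
let mut isi = String::new();
file.read_to_string(&mut isi).expect("cannot read file");
println!("CONTENT:");
println!("----------BEGIN----------");
println!("{}", isi);
println!("----------EOF----------");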
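The usage snippet calls `read_to_string` on the opened document, which suggests the document types implement `std::io::Read`. Assuming that holds, the read step can be wrapped in a small generic function that returns the I/O error to the caller. The sketch below is illustrative: `read_all` is a hypothetical helper, and an in-memory byte slice stands in for an opened document so the code runs without sample files.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read everything from any `Read` source into a String,
/// returning the error (I/O failure or invalid UTF-8) to the caller.
fn read_all<R: Read>(mut reader: R) -> io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

fn main() -> io::Result<()> {
    // A byte slice implements `Read`; it stands in for an opened document here.
    let fake_document = "Hello from a document".as_bytes();
    let text = read_all(fake_document)?;
    println!("{}", text);
    Ok(())
}
```

Because `read_all` accepts any `Read` implementor, the same function would work for each supported document type, provided each one implements `Read` as assumed.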
Test
$ cargo test
Or run the example:
$ cargo run --example readdocx data/sample.docx
Robin Sy.
Dependencies
~8–14MB
~246K SLoC