#document #text-document #readable #extract #format #docx #word

dotext

A simple Rust library for extracting readable text from document formats such as Word (docx). Only a few formats are supported so far; more are planned.

2 releases

Uses old Rust 2015

0.1.1 Dec 3, 2017
0.1.0 Dec 3, 2017

#14 in #docx


511 downloads per month

MIT license

1.5MB
467 lines

Contains (Zip file, 98KB) samples/sample.xlsx

Document File Text Extractor


A simple Rust library for extracting readable text from document formats such as Word (docx). Only a few formats are supported so far; more are planned.

Supported Documents

  • Microsoft Word (docx)
  • Microsoft Excel (xlsx)
  • Microsoft PowerPoint (pptx)
  • OpenOffice Writer (odt)
  • OpenDocument Presentation (odp)
  • PDF
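
The list above maps naturally onto file extensions. The following is a minimal sketch of choosing a format from a path, using only the standard library. The `DocFormat` enum and `detect_format` function are hypothetical names introduced for illustration; they are not part of dotext.

```rust
use std::path::Path;

/// Hypothetical enum mirroring the supported-document list; not a dotext type.
#[derive(Debug, PartialEq, Clone, Copy)]
enum DocFormat {
    Docx,
    Xlsx,
    Pptx,
    Odt,
    Odp,
    Pdf,
}

/// Guess the document format from the file extension (case-insensitive).
/// Returns `None` when the extension is missing or not in the list.
fn detect_format(path: &str) -> Option<DocFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "docx" => Some(DocFormat::Docx),
        "xlsx" => Some(DocFormat::Xlsx),
        "pptx" => Some(DocFormat::Pptx),
        "odt" => Some(DocFormat::Odt),
        "odp" => Some(DocFormat::Odp),
        "pdf" => Some(DocFormat::Pdf),
        _ => None,
    }
}
```

A caller could match on the result of `detect_format("data/sample.docx")` to decide which reader to open. Matching on the extension alone is a design shortcut: it does not inspect the file contents, so a misnamed file would be misclassified.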

Usage

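extern crate dotext; // needed on Rust 2015

use dotext::*; // glob import, assumed to bring Docx and its open() into scope
use std::io::Read; // provides read_to_string

fn main() {
    let mut file = Docx::open("data/sample.docx").expect("cannot open file");
    let mut isi = String::new(); // "isi" holds the extracted content
    file.read_to_string(&mut isi).expect("cannot read document text");
    println!("CONTENT:");
    println!("----------BEGIN----------");
    println!("{}", isi);
    println!("----------EOF----------");
}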
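
To give a sense of what "readable text" means here: a docx file is a zip archive whose main body is XML (`word/document.xml`), so extraction largely amounts to keeping the text between tags. The sketch below shows that idea on an in-memory string using only the standard library. It is an illustration under simplifying assumptions, not dotext's implementation; `strip_tags` is a hypothetical helper, and it neither unzips files nor decodes XML entities.

```rust
/// Hypothetical helper: keep the text between XML tags and emit a newline
/// each time the closing paragraph tag `para_close` is seen.
/// Simplified: entities such as `&amp;` are left as-is, and a '>' inside
/// an attribute value would end the tag early.
fn strip_tags(xml: &str, para_close: &str) -> String {
    let mut out = String::new();
    let mut rest = xml;
    while let Some(start) = rest.find('<') {
        // Text before the next tag is content.
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        match after.find('>') {
            Some(end) => {
                let tag = &after[..=end];
                if tag == para_close {
                    out.push('\n');
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated tag: drop the remainder.
                rest = "";
                break;
            }
        }
    }
    out.push_str(rest);
    out
}
```

For Word XML the paragraph terminator would be passed as `"</w:p>"`, for example `strip_tags(fragment, "</w:p>")`. A real extractor would use a proper XML parser instead of string scanning.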

Test

$ cargo test

or run the example:

$ cargo run --example readdocx data/sample.docx

Robin Sy.

Dependencies

~8–14MB
~246K SLoC