#pdf2text #text #pdf #pdf2txt

pdf-extract

A library to extract content from PDFs

30 releases

0.7.4 Jan 17, 2024
0.7.2 Sep 8, 2023
0.6.5 Apr 24, 2023
0.6.4 May 9, 2022
0.3.0 Feb 5, 2018

#219 in Text processing


4,768 downloads per month
Used in 13 crates (10 directly)

MIT license

350KB
9K SLoC

pdf-extract


A Rust library to extract content from PDF files.

fn main() {
    // Read the whole PDF into memory, then extract its text as a String.
    let bytes = std::fs::read("tests/docs/simple.pdf").unwrap();
    let out = pdf_extract::extract_text_from_mem(&bytes).unwrap();
    assert!(out.contains("This is a small demonstration"));
}
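Text extracted from a PDF may not keep the line breaks and spacing of the rendered page, so an exact substring check like the one above can be brittle. The following is a minimal sketch of a whitespace-insensitive check. `normalize_whitespace` and `contains_normalized` are hypothetical helpers, not part of pdf-extract, and the sample string only stands in for real extractor output.

```rust
/// Hypothetical helper (not part of pdf-extract): collapse every run of
/// whitespace (spaces, tabs, newlines) into a single space and trim the ends.
fn normalize_whitespace(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: substring check that ignores differences in whitespace.
fn contains_normalized(haystack: &str, needle: &str) -> bool {
    normalize_whitespace(haystack).contains(&normalize_whitespace(needle))
}

fn main() {
    // Stand-in for text returned by pdf_extract::extract_text_from_mem,
    // with a line break where the page layout wrapped the sentence.
    let out = "This is a small\ndemonstration   .pdf file";

    // The exact check fails because of the embedded newline...
    assert!(!out.contains("This is a small demonstration"));
    // ...while the normalized check still finds the phrase.
    assert!(contains_normalized(out, "This is a small demonstration"));

    println!("{}", normalize_whitespace(out));
}
```

Normalizing both sides keeps the assertion focused on the words themselves rather than on how the extractor laid them out.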


Dependencies

~17MB
~258K SLoC