36 releases
Version | Release date |
---|---|
0.7.10 | Sep 26, 2024 |
0.7.7 | May 10, 2024 |
0.7.4 | Jan 17, 2024 |
0.7.2 | Sep 8, 2023 |
0.3.0 | Feb 5, 2018 |
#110 in Text processing
9,696 downloads per month
Used in 33 crates (22 directly)
355KB, 9K SLoC
pdf-extract
A Rust library to extract content from PDF files.
let bytes = std::fs::read("tests/docs/simple.pdf").unwrap();
let out = pdf_extract::extract_text_from_mem(&bytes).unwrap();
assert!(out.contains("This is a small demonstration"));
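The check above relies on an exact substring match. Extracted text may contain line breaks or runs of spaces that do not match the visual layout of the page (an assumption about typical extraction output, not a documented guarantee of this crate), so a whitespace-insensitive comparison can be more forgiving. A minimal sketch using only the standard library follows; `normalize_whitespace` and `contains_phrase` are hypothetical helpers and not part of the pdf-extract API, and the sample string stands in for the `String` returned by an extraction call.

```rust
/// Collapse every run of whitespace (spaces, tabs, newlines) into one space
/// and trim both ends. Hypothetical helper, not part of pdf-extract.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Whitespace-insensitive substring check built on the helper above.
/// Both the haystack and the phrase are normalized before comparing.
fn contains_phrase(text: &str, phrase: &str) -> bool {
    normalize_whitespace(text).contains(&normalize_whitespace(phrase))
}

fn main() {
    // Stand-in for extracted text; a real value would come from the
    // extraction call shown above.
    let out = "This is a small\n  demonstration\tfile";

    // An exact match fails because of the embedded newline and extra spaces.
    assert!(!out.contains("This is a small demonstration"));

    // The normalized comparison succeeds.
    assert!(contains_phrase(out, "This is a small demonstration"));
}
```

The trade-off is that normalization discards layout information, so it suits content checks like the one above rather than cases where line structure matters.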
See also
- https://github.com/elacin/PDFExtract/
- https://github.com/euske/pdfminer
- https://github.com/CrossRef/pdfextract
- https://github.com/VikParuchuri/marker
- https://github.com/kermitt2/pdfalto used by grobid
Not PDF specific
Dependencies
~15MB
~229K SLoC