41 releases, including:

| Version | Released |
|---|---|
| 0.8.2 (new) | Feb 9, 2025 |
| 0.8.0 | Jan 5, 2025 |
| 0.7.12 | Dec 28, 2024 |
| 0.7.10 | Sep 26, 2024 |
| 0.3.0 | Feb 5, 2018 |
#97 in Text processing
44,310 downloads per month
Used in 34 crates (23 directly)
360KB, 9K SLoC
pdf-extract
A Rust library to extract content from PDF files.
```rust
fn main() {
    // Read the whole PDF into memory, then extract its text as a String.
    let bytes = std::fs::read("tests/docs/simple.pdf").unwrap();
    let out = pdf_extract::extract_text_from_mem(&bytes).unwrap();
    assert!(out.contains("This is a small demonstration"));
}
```
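Extracted text does not always keep a sentence on one line, so a plain substring check like the `contains` call above can fail when the extractor inserts a line break or extra spaces mid-phrase. The sketch below is a hedged illustration, not part of pdf-extract's API: the helpers `normalize_ws` and `contains_normalized` are hypothetical, and the sample string stands in for the `String` that `extract_text_from_mem` would return. It uses only the standard library.

```rust
/// Hypothetical helper (not part of pdf-extract): collapse every run of
/// whitespace (spaces, tabs, newlines) into a single space and trim the ends.
fn normalize_ws(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: true if `needle` occurs in `haystack` once
/// whitespace in both strings has been normalized.
fn contains_normalized(haystack: &str, needle: &str) -> bool {
    normalize_ws(haystack).contains(&normalize_ws(needle))
}

fn main() {
    // Stand-in for extracted text with a line break in the middle of a phrase.
    let out = "This is a small\n  demonstration .pdf file";

    // A raw substring search misses the phrase because of the line break...
    assert!(!out.contains("This is a small demonstration"));
    // ...while the whitespace-normalized comparison finds it.
    assert!(contains_normalized(out, "This is a small demonstration"));

    println!("{}", normalize_ws(out));
}
```

Normalizing both sides keeps the check tolerant of layout-driven line wrapping while still requiring the words to appear in order.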
See also
- https://github.com/elacin/PDFExtract/
- https://github.com/euske/pdfminer / https://github.com/pdfminer/pdfminer.six
- https://gitlab.com/crossref/pdfextract
- https://github.com/VikParuchuri/marker
- https://github.com/kermitt2/pdfalto (used by GROBID)
- https://github.com/opendatalab/MinerU (uses PyMuPDF and pdfminer.six)
Not PDF specific
Dependencies: ~17MB, ~258K SLoC