#pdf #poppler #text

pdftotext

High-level library that binds to Poppler to extract text from a PDF

6 releases

0.1.5 Dec 16, 2020
0.1.4 Dec 16, 2020

#1901 in Text processing

29 downloads per month

GPL-2.0 OR GPL-3.0

7MB
157K SLoC

C++ 133K SLoC // 0.1% comments
C 24K SLoC // 0.0% comments
Python 318 SLoC // 0.3% comments
Rust 140 SLoC

pdftotext

This crate extracts the code behind Poppler's pdftotext -layout into a library and links dynamically to the system's Poppler.

The library was tested with Poppler 20.12.1. It calls Poppler's internal APIs, so it may break with future Poppler versions. If this is a concern, build with static-poppler enabled, which statically links a vendored copy of Poppler 20.12.1.
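
As a rough sketch, and assuming static-poppler is exposed as an ordinary Cargo feature (the text above only says to build with it "enabled"), the dependency entry in a consuming project's Cargo.toml might look like the following. The version is the latest release listed on this page; the feature syntax is standard Cargo, not something confirmed by the crate's own documentation.

```toml
[dependencies]
# Assumption: "static-poppler" is a Cargo feature of this crate.
# When enabled, the vendored Poppler 20.12.1 is linked statically
# instead of the system's Poppler.
pdftotext = { version = "0.1.5", features = ["static-poppler"] }
```

Leaving the features list out would keep the default behavior described above, which links dynamically to the system's Poppler. Building the vendored copy will likely take noticeably longer, since most of the crate's source lines are Poppler's C++ and C code.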

No runtime deps