leptess

Productive Rust binding for Tesseract and Leptonica

24 releases (12 breaking)

Uses old Rust 2015

0.13.1 Aug 31, 2021
0.13.0 May 10, 2021
0.12.0 May 2, 2021
0.11.0 Feb 16, 2021
0.6.0 Jul 6, 2019

Used in retrochoir

MIT license

2MB
1.5K SLoC

Leptess

Productive and safe Rust bindings/wrappers for Tesseract and Leptonica.

Build dependencies

Make sure you have clang, Leptonica and Tesseract installed.

Tesseract should be version 4.0.0 or above.
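One way to check the version requirement is to parse the first line of `tesseract --version`, which typically looks like `tesseract 4.1.1`. The following is a minimal sketch, not part of leptess: the helper names `parse_tesseract_version` and `meets_minimum` are hypothetical, and it assumes the `tesseract` command-line tool is installed and reports the same version as the library you link against.

```rust
use std::process::Command;

/// Extracts (major, minor, patch) from the first line of `tesseract --version`
/// output, e.g. "tesseract 4.1.1". Missing minor/patch components default to 0.
/// (Hypothetical helper for illustration; not part of leptess.)
fn parse_tesseract_version(output: &str) -> Option<(u32, u32, u32)> {
    let first_line = output.lines().next()?;
    // The version is the second whitespace-separated token; some builds prefix it with 'v'.
    let token = first_line.split_whitespace().nth(1)?.trim_start_matches('v');
    let mut parts = token.split('.').map(|p| {
        // Keep only the leading digits so a component like "0-alpha" still parses as 0.
        let digits: String = p.chars().take_while(|c| c.is_ascii_digit()).collect();
        digits.parse::<u32>().ok()
    });
    let major = parts.next()??;
    let minor = parts.next().flatten().unwrap_or(0);
    let patch = parts.next().flatten().unwrap_or(0);
    Some((major, minor, patch))
}

/// True if the version satisfies the "4.0.0 or above" requirement.
fn meets_minimum(version: (u32, u32, u32)) -> bool {
    version >= (4, 0, 0)
}

fn main() {
    match Command::new("tesseract").arg("--version").output() {
        Ok(out) => {
            // Check both streams in case the version banner is written to stderr.
            let text = format!(
                "{}{}",
                String::from_utf8_lossy(&out.stdout),
                String::from_utf8_lossy(&out.stderr)
            );
            match parse_tesseract_version(&text) {
                Some(v) if meets_minimum(v) => println!("Tesseract {}.{}.{} is OK", v.0, v.1, v.2),
                Some(v) => println!("Tesseract {}.{}.{} is too old", v.0, v.1, v.2),
                None => println!("could not parse the Tesseract version"),
            }
        }
        Err(e) => eprintln!("could not run tesseract: {}", e),
    }
}
```

The parsing is kept separate from the process call so it can be exercised on a plain string without Tesseract being installed.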

Ubuntu

sudo apt-get install libleptonica-dev libtesseract-dev clang

You will also need to install the Tesseract language data for the languages you want to recognize:

sudo apt-get install tesseract-ocr-eng
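To confirm that the language data is actually present before running OCR, you can look for the corresponding `.traineddata` file. This is a rough sketch using only the standard library; the helper names are hypothetical, `TESSDATA_PREFIX` is assumed to point at the tessdata directory itself, and the fallback path is only a guess at a typical Ubuntu location that will differ between systems.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Builds the expected model file path, e.g. `<dir>/eng.traineddata`.
/// (Hypothetical helper for illustration; not part of leptess.)
fn traineddata_path(tessdata_dir: &Path, lang: &str) -> PathBuf {
    tessdata_dir.join(format!("{}.traineddata", lang))
}

/// Returns the languages whose model file is missing from `tessdata_dir`.
/// `langs` uses Tesseract's convention of joining languages with '+', e.g. "eng+deu".
fn missing_languages(tessdata_dir: &Path, langs: &str) -> Vec<String> {
    langs
        .split('+')
        .filter(|l| !l.is_empty())
        .filter(|l| !traineddata_path(tessdata_dir, l).is_file())
        .map(String::from)
        .collect()
}

fn main() {
    // Assumption: fall back to a common Ubuntu path when TESSDATA_PREFIX is unset.
    let dir = env::var("TESSDATA_PREFIX")
        .unwrap_or_else(|_| "/usr/share/tesseract-ocr/4.00/tessdata".to_string());
    let missing = missing_languages(Path::new(&dir), "eng");
    if missing.is_empty() {
        println!("all requested language data found in {}", dir);
    } else {
        println!("missing language data in {}: {}", dir, missing.join(", "));
    }
}
```

A check like this only tells you the file exists; it does not verify that the model matches the installed Tesseract version.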

Mac

brew install tesseract leptonica

Windows

On Windows, this library uses Microsoft's vcpkg to provide tesseract.

Please install vcpkg and set up user-wide integration; otherwise the vcpkg crate won't be able to find the library.

To install tesseract:

REM from the vcpkg directory

REM 32 bit
.\vcpkg install tesseract:x86-windows

REM 64 bit
.\vcpkg install tesseract:x64-windows

To run the tests, configure the vcpkg crate to find the tesseract library:

SET VCPKGRS_DYNAMIC=true
cargo test

Usage

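// Passing None as the data path lets Tesseract use its default tessdata location.
let mut lt = leptess::LepTess::new(None, "eng").unwrap();
// Load the image to recognize from disk.
lt.set_image("path/to/page.bmp");
// Run OCR and print the recognized text as UTF-8.
println!("{}", lt.get_utf8_text().unwrap());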

For more examples, see the docs and the examples directory.

To run a demo from the examples directory, try:

cargo run --example low_level_ocr_full_page

Development

To run the tests, you will need Tesseract 4.x to match the model in tests/tessdata/eng.traineddata. See the CircleCI config for how to replicate the setup.