tesseract-static

STATICALLY LINKED tesseract + leptonica bindings for easy inclusion of tesseract-ocr in binary applications

2 unstable releases

0.2.1 Jan 25, 2025
0.1.0 Mar 16, 2023

Apache-2.0

75KB
1.5K SLoC

A Rust crate that automatically builds and links tesseract and leptonica into a Rust program.

The crate bundles leptonica.dll and tesseract.dll at compile time, so these binaries ship automatically with every build.

To rebuild tesseract and leptonica from source, run cd rebuild-tesseract && cargo run --release, then copy the resulting files /target/release/leptonica.dll and /target/release/tesseract.dll.

Usage

Run cargo run --release --example simple to try the bundled example. A minimal setup looks like this:

# Cargo.toml
[dependencies]
tesseract-static = { version = "0.2.0", features = ["parse"] }

use tesseract_static::tesseract::Tesseract;

const TRAINING_DATA: &[u8] = include_bytes!("./eng.traineddata");

fn main() {
    // Write the embedded training data to a directory tesseract can read from
    let parent = std::env::temp_dir();
    std::fs::write(parent.join("eng.traineddata"), TRAINING_DATA).unwrap();

    // Run OCR on the embedded image and request hOCR output
    let hocr_xml = Tesseract::new(Some(&parent.display().to_string()), Some("eng"))
        .unwrap()
        .set_image_from_mem(include_bytes!("../../testocr.pnm"))
        .unwrap()
        .get_hocr_text(1)
        .unwrap();

    let hocr = tesseract_static::parse::ParsedHocr::new(&hocr_xml).unwrap();

    println!("{hocr:#?}"); // parsed hOCR text from tesseract, includes rect bounds
}
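
The training-data setup step in the example unwraps every I/O call and writes straight into the shared temp directory. A minimal sketch of the same step with error propagation might look like the following. It uses only the standard library; the helper name stage_traineddata and the per-process subdirectory are assumptions for illustration and are not part of tesseract-static.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes training data to `<dir>/<lang>.traineddata` and returns the file path.
/// Hypothetical helper for illustration; not provided by tesseract-static.
fn stage_traineddata(dir: &Path, lang: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let path = dir.join(format!("{lang}.traineddata"));
    fs::write(&path, bytes)?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Placeholder bytes; a real program would embed the file with
    // include_bytes!("./eng.traineddata") as in the example above.
    let training_data: &[u8] = b"placeholder";

    // Assumption: a per-process subdirectory keeps concurrent runs apart.
    let dir = std::env::temp_dir().join(format!("tessdata-{}", std::process::id()));
    let path = stage_traineddata(&dir, "eng", training_data)?;
    println!("staged {} bytes at {}", training_data.len(), path.display());

    // The directory (not the file) is what the example passes to Tesseract::new,
    // e.g. Some(&dir.display().to_string()).
    Ok(())
}
```

Returning io::Result lets the caller decide how to report a failed write instead of panicking inside the setup code.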

License

MIT
