ocrmypdf-rs
A Rust library that adds a text layer to scanned, image-only PDFs so that they become searchable. It does this by using ocrmypdf, which is a Python application and library.
Prerequisites
For this library to work correctly, ocrmypdf must be installed on your operating system.
Example
Debian or Ubuntu users can simply use the following:
sudo apt install ocrmypdf
For installation instructions on other operating systems, see the ocrmypdf installation documentation.
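Because the library depends on an external program, it can help to check up front that ocrmypdf is reachable on the PATH. The following is a minimal sketch using only the standard library; `command_available` is a hypothetical helper and not part of ocrmypdf-rs.

```rust
use std::process::{Command, Stdio};

/// Returns true if `program --version` can be started and exits successfully.
/// Hypothetical helper for illustration; not part of ocrmypdf-rs.
fn command_available(program: &str) -> bool {
    Command::new(program)
        .arg("--version")
        // Discard the version banner; only the exit status matters here.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn error (for example, program not found) counts as unavailable.
        .unwrap_or(false)
}
```

Calling `command_available("ocrmypdf")` before running any OCR job lets a program print a clear installation hint instead of failing later.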
Installation
Install ocrmypdf-rs with cargo by adding it to your Cargo.toml:
[dependencies]
ocrmypdf-rs = "0.0.7"
Usage/Examples
Basic example:
use ocrmypdf_rs::{Ocr, OcrMyPdf};

fn main() {
    let mut ocr = OcrMyPdf::new(None, None, None);
    ocr.set_input_path("input.pdf".into())
        .set_output_path("output.pdf".into())
        .set_args(vec!["--force-ocr".into()])
        .execute();
}
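For orientation, the basic example corresponds roughly to running `ocrmypdf --force-ocr input.pdf output.pdf` on the command line. The sketch below builds that command with the standard library only. It is an assumption about what a wrapper of this kind does, not the crate's actual implementation, and `build_ocrmypdf_command` is a hypothetical name.

```rust
use std::process::Command;

/// Builds (but does not run) `ocrmypdf <args...> <input> <output>`.
/// Hypothetical helper for illustration; not part of ocrmypdf-rs.
fn build_ocrmypdf_command(args: &[&str], input: &str, output: &str) -> Command {
    let mut cmd = Command::new("ocrmypdf");
    // Options come first, followed by the two positional file arguments.
    cmd.args(args).arg(input).arg(output);
    cmd
}
```

Calling `.status()` on the returned command would actually run it, which requires ocrmypdf to be installed.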
new method
When instantiating the OcrMyPdf structure, the following parameters can be passed:
- args: Option<Vec<String>> (arguments for ocrmypdf; see the arguments in the documentation)
- input_path: Option<String> (input PDF path)
- output_path: Option<String> (output PDF path)
[!TIP] 💡 If input_path or output_path is passed to new, there is no need to set it again later with set_input_path or set_output_path.
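The tip can be pictured with a small stand-in type. The sketch below is not the crate's source: `Job`, its fields, and `resolve_paths` are hypothetical names that only illustrate how optional constructor parameters and chained setters can fill the same fields.

```rust
/// Hypothetical stand-in for a builder with optional paths.
#[derive(Debug)]
struct Job {
    input_path: Option<String>,
    output_path: Option<String>,
}

impl Job {
    /// Paths may be supplied up front or left as `None`.
    fn new(input_path: Option<String>, output_path: Option<String>) -> Self {
        Job { input_path, output_path }
    }

    /// Chained setter: returns `&mut Self` so calls can be strung together.
    fn set_input_path(&mut self, path: String) -> &mut Self {
        self.input_path = Some(path);
        self
    }

    /// Chained setter for the output path.
    fn set_output_path(&mut self, path: String) -> &mut Self {
        self.output_path = Some(path);
        self
    }

    /// Returns both paths, or an error naming the one that is still missing.
    fn resolve_paths(&self) -> Result<(&str, &str), String> {
        let input = self.input_path.as_deref().ok_or("input path not set")?;
        let output = self.output_path.as_deref().ok_or("output path not set")?;
        Ok((input, output))
    }
}
```

With this shape, `Job::new(Some(...), Some(...))` is ready to use immediately, while `Job::new(None, None)` needs both setters to be called before the paths resolve.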
use ocrmypdf_rs::{Ocr, OcrMyPdf};

fn main() {
    let args: Vec<String> = vec!["-l por".into()];
    let input_path = "input.pdf";
    let output_path = "output.pdf";
    let mut ocr = OcrMyPdf::new(
        Some(args),
        Some(input_path.into()),
        Some(output_path.into()),
    );
    ocr.execute();
}
[!NOTE] The -l por argument only works if the selected additional language is installed; see the ocrmypdf documentation on how to install language packs.
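ocrmypdf relies on Tesseract for recognition, so one way to check whether a language such as por is available is to inspect the output of `tesseract --list-langs`. The sketch below assumes that the output starts with a single header line followed by one language code per line; both function names are hypothetical and not part of ocrmypdf-rs.

```rust
use std::process::Command;

/// Extracts language codes from the text printed by `tesseract --list-langs`.
/// Assumes the first line is a header and every following non-empty line
/// is one language code.
fn parse_language_list(listing: &str) -> Vec<String> {
    listing
        .lines()
        .skip(1)
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(String::from)
        .collect()
}

/// Runs `tesseract --list-langs` and reports whether `lang` is listed.
/// Returns `None` if the command could not be run.
fn language_installed(lang: &str) -> Option<bool> {
    let output = Command::new("tesseract").arg("--list-langs").output().ok()?;
    let listing = String::from_utf8_lossy(&output.stdout);
    Some(parse_language_list(&listing).iter().any(|l| l == lang))
}
```

A program could call `language_installed("por")` before passing `-l por` and report a missing language pack early.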