4 releases (breaking)
0.4.0 | Feb 1, 2024 |
---|---|
0.3.0 | Jan 31, 2024 |
0.2.0 | Jan 29, 2024 |
0.1.0 | Jan 25, 2024 |
#452 in Images
33 downloads per month
53KB
1.5K
SLoC
surya-rs
Rust implementation of surya, a multilingual document OCR toolkit. The implementation is based on a modified version of Segformer and OpenCV.
Please refer to the original project for more details on licensing of the weights.
Roadmap
This project is still in development, feel free to star and check back.
- model structure, segformer (for inference only)
- weights loading
- image input pre-processing
- heatmap and affinity map
- bboxes
- image splitting and stitching
- text recognition
- benchmark
- quantifications
How to build and install
Setup rust toolchain if you haven't yet:
# visit https://rustup.rs/ for more detailed information
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Install llvm
and opencv
(example on Mac):
brew install llvm opencv
Build and install the binary:
# run this first on Mac if you have a M1 chip
export DYLD_FALLBACK_LIBRARY_PATH="$(xcode-select --print-path)/usr/lib/"
# run this first on other Mac
export DYLD_FALLBACK_LIBRARY_PATH="$(xcode-select --print-path)/Toolchains/XcodeDefault.xctoolchain/"
# optionally you can include features like accelerate, metal, mkl, etc.
cargo install --path . --features=cli
The binary when built does not include the weights file itself, and will instead download via the HuggingFace Hub API. Once downloaded, the weights file will be cached in the HuggingFace cache directory.
Check -h
for help:
Surya is a multilingual document OCR toolkit, original implementation in Python and PyTorch
Usage: surya [OPTIONS] <IMAGE>
Arguments:
<IMAGE> path to image
Options:
--batch-size <BATCH_SIZE>
detection batch size, if not supplied defaults to 2 on CPU and 16 on GPU
--model-repo <MODEL_REPO>
detection model's hugging face repo [default: vikp/line_detector]
--weights-file-name <WEIGHTS_FILE_NAME>
detection model's weights file name [default: model.safetensors]
--config-file-name <CONFIG_FILE_NAME>
detection model's config file name [default: config.json]
--non-max-suppression-threshold <NON_MAX_SUPPRESSION_THRESHOLD>
a value between 0.0 and 1.0 to filter low density part of heatmap [default: 0.35]
--extract-text-threshold <EXTRACT_TEXT_THRESHOLD>
a value between 0.0 and 1.0 to filter out bbox with low heatmap density [default: 0.6]
--bbox-area-threshold <BBOX_AREA_THRESHOLD>
a pixel threshold to filter out small area bbox [default: 10]
--polygons
whether to output polygons json file
--image
whether to generate bbox image
--heatmap
whether to generate heatmap
--affinity-map
whether to generate affinity map
--output-dir <OUTPUT_DIR>
output directory, under which the input image will be generating a subdirectory [default: ./surya_output]
--device <DEVICE_TYPE>
[default: cpu] [possible values: cpu, gpu, metal]
--verbose
whether to enable verbose mode
-h, --help
Print help
-V, --version
Print version
You can also use this to control logging level:
export SURYA_LOG=warn # or debug, warn, etc.
Library
This lib is also published as a trait for other rust projects to use.
Dependencies
~10–20MB
~277K SLoC