#arabic #text #justification

kashida

Insert Kashidas/Tatweel into Arabic text, e.g. for justification purposes.

7 releases

0.0.7 May 19, 2024
0.0.6 May 16, 2024

#406 in Text processing


676 downloads per month

MIT license

22KB
446 lines


Kashida

If you want to justify Arabic (or Syriac, or any other connected script) text, you eventually need to insert kashidas (Unicode character U+0640, ـ) between letters. This mini-crate tries to give you decent-looking candidate positions for them. The Arabic logic is loosely based on the Microsoft discussion here. The Syriac logic is based on this document.
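At the string level, inserting a kashida between two letters just means inserting U+0640 at the byte offset where the first letter ends. The minimal sketch below uses only the standard library. `stretch_first_joint` is a hypothetical helper for illustration, not part of this crate, and it only stretches the joint after the first letter of a word. Note that both U+0640 and the basic Arabic letters take two bytes each in UTF-8, so letter boundaries are not consecutive byte indices.

```rust
/// U+0640 ARABIC TATWEEL, the character used as a kashida.
const KASHIDA: char = '\u{0640}';

/// Hypothetical helper (not part of the kashida crate): inserts one
/// kashida between the first and second characters of `word`.
fn stretch_first_joint(word: &mut String) {
    // String::insert takes a *byte* index, so find where the first
    // character ends in UTF-8 instead of assuming one byte per letter.
    let first_len = match word.chars().next() {
        Some(c) => c.len_utf8(),
        None => return,
    };
    // Only stretch if another character follows the first one.
    if word.len() > first_len {
        word.insert(first_len, KASHIDA);
    }
}

fn main() {
    // "بسم": beh, seen, meem. Each of these letters is 2 bytes in UTF-8.
    let mut word = String::from("\u{0628}\u{0633}\u{0645}");
    stretch_first_joint(&mut word);
    // Now "بـسم": beh, tatweel, seen, meem.
    assert_eq!(word, "\u{0628}\u{0640}\u{0633}\u{0645}");
    // The kashida itself also takes 2 bytes (0xD9 0x80).
    assert_eq!(KASHIDA.len_utf8(), 2);
    println!("{word}");
}
```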

The main entry point of the library is find_kashidas. You give it a string, and it returns a list of kashida candidate locations as byte indices, sorted by priority. These are directly usable with String::insert, or with the provided convenience function place_kashidas. No check is made that the string is actually in the script you say it is. It works fine with voweled texts.
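Because the candidates are byte indices into the original string, inserting several kashidas by hand needs care: each insertion shifts every later byte index by two, since U+0640 is two bytes in UTF-8. The sketch below is a hypothetical helper, not the crate's place_kashidas (whose exact signature is not described here). It assumes a slice of byte indices already sorted by priority, highest first, takes the top `count`, and inserts them from the end of the string backwards so the remaining indices stay valid.

```rust
const KASHIDA: char = '\u{0640}';

/// Hypothetical helper, *not* the crate's `place_kashidas`: inserts a
/// kashida at each of the first `count` entries of `candidates`, which
/// are assumed to be byte indices sorted by priority (highest first).
fn insert_top_candidates(text: &mut String, candidates: &[usize], count: usize) {
    // Keep only the highest-priority candidates.
    let mut chosen: Vec<usize> = candidates.iter().copied().take(count).collect();
    // Insert from the back of the string forward: inserting at a larger
    // byte index never shifts a smaller one, so later inserts stay valid.
    chosen.sort_unstable_by(|a, b| b.cmp(a));
    for idx in chosen {
        // String::insert panics on an index that is not a char boundary
        // (or is past the end), so skip any such candidate.
        if text.is_char_boundary(idx) {
            text.insert(idx, KASHIDA);
        }
    }
}

fn main() {
    // "بسم": byte 2 is between beh and seen, byte 4 between seen and meem.
    let mut text = String::from("\u{0628}\u{0633}\u{0645}");
    // Made-up candidate list for illustration, highest priority first.
    let candidates = [2, 4];
    insert_top_candidates(&mut text, &candidates, 2);
    // "بـسـم": one kashida in each joint.
    assert_eq!(text, "\u{0628}\u{0640}\u{0633}\u{0640}\u{0645}");
    println!("{text}");
}
```

Inserting in ascending order instead would put the second kashida at byte 4 of the already-modified string, which by then lies between the first kashida and seen, leaving two kashidas in the first joint and none in the second.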

Oh, it is no_std as well.

The Script enum has three variants: Arabic, Syriac, and Unknown. Arabic and Syriac have custom rules and priorities; the Unknown variant gives you a generic set of rules that should, in theory, work for other scripts. If you can read these other scripts and contribute rules for them, help would be most welcome.
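Since the crate takes the script from you and does not verify it, a caller may want a rough guess before choosing a variant. The sketch below is a hypothetical std-only helper. `ScriptGuess` is a local stand-in for the three variants described above, not the crate's Script type. The guess simply counts characters in the main Arabic (U+0600 to U+06FF, plus Arabic Supplement U+0750 to U+077F) and Syriac (U+0700 to U+074F) blocks, and it ignores the other Arabic extension and presentation-form blocks.

```rust
/// Local stand-in mirroring the variants described in the README;
/// this is *not* the crate's own `Script` type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ScriptGuess {
    Arabic,
    Syriac,
    Unknown,
}

/// Hypothetical helper: pick a variant by counting characters that fall
/// in the main Arabic and Syriac Unicode blocks.
fn guess_script(text: &str) -> ScriptGuess {
    let (mut arabic, mut syriac) = (0usize, 0usize);
    for c in text.chars() {
        match c {
            // Arabic block (includes harakat and U+0640) and Arabic Supplement.
            '\u{0600}'..='\u{06FF}' | '\u{0750}'..='\u{077F}' => arabic += 1,
            // Syriac block.
            '\u{0700}'..='\u{074F}' => syriac += 1,
            _ => {}
        }
    }
    if arabic == 0 && syriac == 0 {
        ScriptGuess::Unknown
    } else if arabic >= syriac {
        // Ties go to Arabic.
        ScriptGuess::Arabic
    } else {
        ScriptGuess::Syriac
    }
}

fn main() {
    // "بسم" (Arabic), "ܫܠܡܐ" (Syriac), and Latin text, written as escapes.
    assert_eq!(guess_script("\u{0628}\u{0633}\u{0645}"), ScriptGuess::Arabic);
    assert_eq!(guess_script("\u{072B}\u{0720}\u{0721}\u{0710}"), ScriptGuess::Syriac);
    assert_eq!(guess_script("hello"), ScriptGuess::Unknown);
    println!("ok");
}
```

Counting rather than stopping at the first match tolerates a stray character from the other block. Voweled Arabic still counts as Arabic, because the harakat live in the Arabic block.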

I tried to add a couple of C FFI functions, with help from the Rust Discord. However, I don't understand C well enough to know how to use them. If you can try them and let me know how to improve them, that would be very helpful.

Dependencies

~5.5MB
~88K SLoC