6 releases

0.1.5 Mar 6, 2024
0.1.4 Oct 8, 2023
0.1.3 Sep 4, 2023
0.1.2 Aug 3, 2023
0.1.0 Mar 8, 2022

#139 in Text processing

8,429 downloads per month
Used in 41 crates (2 directly)

MIT/Apache

1MB
657 lines

hypher

hypher separates words into syllables.

[dependencies]
hypher = "0.1"

Features

  • All-inclusive: Hyphenation patterns are embedded into the binary as efficiently encoded finite automata at build time.
  • Zero load time: Hyphenation automata operate directly over the embedded binary data with no up-front decoding.
  • No allocations except when hyphenating very long words (> 41 bytes). You can disable the alloc feature, but overly long words will then cause a panic.
  • Support for many languages.
  • No unsafe code, no dependencies, no std.
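
The 41-byte limit above is measured in bytes, not characters, so non-ASCII words reach it sooner. The following is a minimal sketch of a guard one might use when the alloc feature is disabled. The names MAX_INLINE_BYTES and fits_without_alloc are hypothetical helpers for illustration and are not part of hypher; the threshold is taken from the feature list, not from the crate's source.

```rust
/// Byte length above which, per the feature list, hyphenation needs the allocator.
/// (Hypothetical constant for this sketch; hypher does not export it.)
const MAX_INLINE_BYTES: usize = 41;

/// Returns true if `word` is short enough to be hyphenated without allocating.
/// `str::len` counts UTF-8 bytes, not characters.
fn fits_without_alloc(word: &str) -> bool {
    word.len() <= MAX_INLINE_BYTES
}
```

With this helper, fits_without_alloc("extensive") is true, while a word made of 21 two-byte characters (42 bytes) is rejected even though it has far fewer than 41 characters.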

Example

use hypher::{hyphenate, Lang};

let syllables = hyphenate("extensive", Lang::English);
assert_eq!(syllables.join("-"), "ex-ten-sive");
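
A common use of the syllables is to mark optional break points for a text renderer. Below is a small sketch that joins syllables with U+00AD SOFT HYPHEN instead of a visible dash. The helper join_with_soft_hyphens is hypothetical and not part of hypher; it accepts any iterator of string slices, on the assumption that the value returned by hyphenate can be iterated that way.

```rust
/// Joins syllables with U+00AD SOFT HYPHEN so that a renderer may break there.
/// (Hypothetical helper for illustration; not part of hypher.)
fn join_with_soft_hyphens<'a, I>(syllables: I) -> String
where
    I: IntoIterator<Item = &'a str>,
{
    let mut out = String::new();
    for (i, syllable) in syllables.into_iter().enumerate() {
        // Insert the separator before every syllable except the first.
        if i > 0 {
            out.push('\u{ad}');
        }
        out.push_str(syllable);
    }
    out
}
```

Given the syllables "ex", "ten" and "sive", this yields the three parts separated by two invisible soft hyphens. Passing "\u{ad}" to the join method shown above may achieve the same result directly.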

Languages

By default, this crate supports hyphenating more than 30 languages. Embedding automata for all these languages will add ~1.1 MiB to your binary. Alternatively, you can disable support for all languages and manually choose which ones get added:

[dependencies]
hypher = { version = "0.1", default-features = false, features = ["english", "greek"] }

Each language added individually contributes:

Language Space
Afrikaans 60 KiB
Albanian 1.4 KiB
Belarusian 3.9 KiB
Bulgarian 13 KiB
Catalan 1.7 KiB
Croatian 2.0 KiB
Czech 40 KiB
Danish 5.7 KiB
Dutch 63 KiB
English 27 KiB
Estonian 19 KiB
Finnish 1.3 KiB
French 6.9 KiB
Georgian 11 KiB
German 192 KiB
Greek 2.0 KiB
Hungarian 346 KiB
Icelandic 21 KiB
Italian 1.6 KiB
Kurmanji 1.4 KiB
Latin 1003 B
Lithuanian 6.5 KiB
Mongolian 4.9 KiB
Norwegian 153 KiB
Polish 16 KiB
Portuguese 343 B
Russian 33 KiB
Serbian 13 KiB
Slovak 13 KiB
Slovenian 5.5 KiB
Spanish 14 KiB
Swedish 24 KiB
Turkish 526 B
Turkmen 1.4 KiB
Ukrainian 21 KiB
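
To estimate the cost of a particular selection, add up the corresponding rows. The sketch below does this for a few entries copied from the table. SIZES_KIB and estimated_kib are hypothetical names for illustration, and the lowercase keys other than "english" and "greek" are only labels chosen here, not confirmed feature names.

```rust
/// Per-language sizes in KiB, copied from the table above (a subset).
const SIZES_KIB: &[(&str, f64)] = &[
    ("english", 27.0),
    ("german", 192.0),
    ("greek", 2.0),
    ("hungarian", 346.0),
];

/// Sums the table sizes for the given languages.
/// Returns None if a language is missing from `SIZES_KIB`.
fn estimated_kib(languages: &[&str]) -> Option<f64> {
    let mut total = 0.0_f64;
    for wanted in languages {
        let (_, kib) = SIZES_KIB.iter().find(|(name, _)| name == wanted)?;
        total += *kib;
    }
    Some(total)
}
```

For the english and greek selection shown earlier, the estimate is 27 KiB + 2.0 KiB = 29 KiB, compared with roughly 1.1 MiB for all languages.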

Benchmarks

Task                                hypher    hyphenation
Hyphenating extensive (english)     356ns     698ns
Hyphenating διαμερίσματα (greek)    503ns     1121ns
Loading the english patterns        0us       151us
Loading the greek patterns          0us       0.826us

Benchmarks were executed on an Apple M1 (ARM).

License

The code of this crate is dual-licensed under the MIT and Apache 2.0 licenses.

The files in patterns/ are subject to the individual licenses stated therein. The patterns are processed at build time and then embedded (i.e. statically linked) into your binary. However, hypher includes only patterns that are available under permissive licenses. Pattern licenses include the LPPL, MPL, MIT, and BSD-3.
