#nlp #dictionary #parser-combinator #cantonese

wordshk_tools

A combination of parsers and other tools for words.hk (粵典)

90 releases (7 stable)

3.16.0-beta.9 Aug 27, 2023
3.15.8 Jun 25, 2023
3.15.4 Dec 24, 2022
3.15.3-beta Jul 31, 2022
0.1.0 Dec 20, 2021

MIT license

wordshk-tools

A combination of tools for words.hk (粵典).

Parser

/// Parse the whole words.hk CSV database into a [Dict]
pub fn parse_dict() -> Result<Dict, Box<dyn Error>>

Located at /src/lib.rs

Parses all entries marked with OK and stores the results as a list of entries. This parser is the core of the library because its output is used by other functions such as to_apple_dict. For efficiency, it uses neither regular expressions nor backtracking. It is powered by lip, a library I wrote that provides flexible parser combinators and supports friendly error messages.
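To illustrate the general idea of regex-free, non-backtracking parser combinators with position-aware errors, here is a minimal, self-contained sketch. It does not use lip's API or the real words.hk CSV schema: the `Input`, `ParseError`, `take_while1`, `token` and `variant` names, and the `word:romanisation` input shape, are all assumptions made for this example.

```rust
// Hypothetical sketch only: none of these names come from lip or wordshk_tools.

/// Remaining input plus the byte offset already consumed (used for error reporting).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Input<'a> {
    rest: &'a str,
    offset: usize,
}

/// An error that records where parsing stopped and what was expected there.
#[derive(Debug, PartialEq)]
struct ParseError {
    offset: usize,
    expected: &'static str,
}

type ParseResult<'a, T> = Result<(T, Input<'a>), ParseError>;

/// Consume one or more characters satisfying `pred`; never rewinds the input.
fn take_while1<'a>(
    input: Input<'a>,
    pred: impl Fn(char) -> bool,
    expected: &'static str,
) -> ParseResult<'a, &'a str> {
    let end = input.rest.find(|c: char| !pred(c)).unwrap_or(input.rest.len());
    if end == 0 {
        return Err(ParseError { offset: input.offset, expected });
    }
    let next = Input { rest: &input.rest[end..], offset: input.offset + end };
    Ok((&input.rest[..end], next))
}

/// Consume exactly the literal `tok`.
fn token<'a>(input: Input<'a>, tok: &'static str) -> ParseResult<'a, ()> {
    match input.rest.strip_prefix(tok) {
        Some(rest) => Ok(((), Input { rest, offset: input.offset + tok.len() })),
        None => Err(ParseError { offset: input.offset, expected: tok }),
    }
}

/// A headword paired with its romanisation (assumed shape: `word:romanisation`).
#[derive(Debug, PartialEq)]
struct Variant {
    word: String,
    pron: String,
}

/// Sequence the small parsers; the first failure is returned as-is, with no retry.
fn variant<'a>(input: Input<'a>) -> ParseResult<'a, Variant> {
    let (word, input) = take_while1(input, |c| c != ':', "a headword")?;
    let ((), input) = token(input, ":")?;
    let (pron, input) = take_while1(input, |c| c.is_ascii_alphanumeric(), "a romanisation")?;
    Ok((Variant { word: word.to_string(), pron: pron.to_string() }, input))
}

fn main() {
    let (v, rest) = variant(Input { rest: "好:hou2", offset: 0 }).unwrap();
    println!("{} is romanised as {} ({} bytes left)", v.word, v.pron, rest.rest.len());

    // A missing separator yields an error that says what was expected and where.
    match variant(Input { rest: "好hou2", offset: 0 }) {
        Ok(_) => println!("unexpected success"),
        Err(e) => println!("expected {} at byte {}", e.expected, e.offset),
    }
}
```

Each small parser either consumes input and moves forward or fails at a known offset, so the composed parser makes a single pass over the text. A real combinator library would add more machinery (choice, repetition, richer error context) on top of this pattern.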

Example Usage

  1. Parse the words.hk dictionary and extract useful information
    • See examples/parse_dict for more details
  2. Export to Apple Dictionary
    • See examples/export_apple_dict for more details
  3. Search words.hk
    • See examples/benchmark_search for more details

Source

The full, up-to-date CSV database of the words.hk dictionary is available from words.hk. To request access to the CSV, use this link: https://words.hk/faiman/request_data/

License

MIT