#unicode #emoji #linebreak

xi-unicode

Unicode utilities useful for text editing, including a line breaking iterator

5 releases (3 breaking)

0.3.0 Oct 25, 2020
0.2.1 Jun 19, 2020
0.2.0 Jun 29, 2019
0.1.0 Jan 9, 2017
0.0.1 Apr 29, 2016

#91 in Text processing

114,502 downloads per month
Used in 570 crates (18 directly)

Apache-2.0

175KB
2.5K SLoC

Rust 2K SLoC // 0.1% comments Python 261 SLoC // 0.3% comments C++ 112 SLoC // 0.1% comments

xi-unicode

This crate contains Unicode utilities adapted for working with non-contiguous bytes (such as a rope).

Much of the contents of this repo are generated automatically by scripts from Unicode data files.

This file is the result of some archaeology: documentation on how to rebuild the various files was missing, and I am attempting to reconstruct it.

data

Constructing the various tables requires several data files. These are available through the components of the Unicode standard directory for a given Unicode version. In particular, we require LineBreak.txt.

This file should be placed in a directory; I use data.
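For orientation, here is a minimal sketch of how lines in a LineBreak.txt-style file can be read. It assumes the usual Unicode Character Database layout (a code point or a `first..last` range, a semicolon, the property value, then an optional `#` comment). The function name `parse_linebreak_lines` and the sample lines are hypothetical and are not taken from tools/mk_tables.py, which may parse the file differently.

```python
from typing import Iterable, Iterator, Tuple


def parse_linebreak_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, line_break_class) for each data line.

    Hypothetical helper for illustration; not part of tools/mk_tables.py.
    """
    for raw in lines:
        # Drop the trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        codepoints, prop = (field.strip() for field in line.split(";", 1))
        if ".." in codepoints:
            first, last = codepoints.split("..")
        else:
            # A single code point is treated as a one-element range.
            first = last = codepoints
        yield int(first, 16), int(last, 16), prop


# Illustrative sample in the assumed format (not copied from a specific release).
sample = [
    "# comment-only line",
    "000A;LF # Cc       <control-000A>",
    "0041..005A;AL # Lu  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "",
]
ranges = list(parse_linebreak_lines(sample))
```

Each tuple gives an inclusive code point range and its line break class, which is the kind of information a table generator needs from this file.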

  • src/tables.rs is generated by the script at tools/mk_tables.py and can be rebuilt with:

    $ python3 tools/mk_tables.py data > src/tables.rs

    where data is the path to the data directory created above.

  • The unit tests in src/lib.rs are also generated by this script: run it once with the --tests flag and once with the --tests-str flag (as separate invocations), then copy the output into the body of these tests.

No runtime deps