1 unstable release: 0.1.0 (Feb 18, 2024)
#884 in Text processing
23,434 downloads per month
Used in 8 crates
(4 directly)
9KB
69 lines
ANSI width
Measure the width of a string when printed to the terminal
For ASCII, this is identical to the length of the string in bytes. However, there are 2 special cases:
- Many unicode characters (CJK, emoji, etc.) span multiple columns.
- ANSI escape codes should be ignored.
The first case is handled by the unicode-width crate. This function extends that crate by ignoring ANSI escape codes.
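As a rough illustration of why neither the byte length nor the character count is enough, here is a small sketch that uses only the standard library. The helper name `byte_and_char_len` is made up for this example and is not part of this crate.

```rust
/// Returns (length in bytes, number of Unicode scalar values) for `s`.
/// Hypothetical helper for illustration only; not part of `ansi-width`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For "café" this gives (5, 4), which happens to match the width in characters but not in bytes. For "🦀🦀" it gives (8, 2), although the string spans 4 columns, and for "\u{1b}[31mRed\u{1b}[0m" it gives (12, 12), although only 3 columns are visible. Neither number is a reliable column count, which is why a width lookup plus escape-code skipping is needed.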
Limitations
- We cannot know the width of a TAB character in the terminal emulator.
- Backspace is also treated as zero width.
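To see why a TAB has no fixed width, consider the sketch below. It assumes evenly spaced tab stops, which is a common terminal default but not something a library can detect; the function name `tab_advance` and the `tab_stop` parameter are hypothetical and not part of this crate.

```rust
/// Number of columns a TAB moves the cursor when printed at the
/// zero-based column `col`, assuming tab stops every `tab_stop` columns.
/// `tab_stop` must be non-zero. Illustration only; not part of `ansi-width`.
fn tab_advance(col: usize, tab_stop: usize) -> usize {
    tab_stop - col % tab_stop
}
```

With tab stops every 8 columns, a TAB at column 3 advances 5 columns, while with tab stops every 4 columns it advances only 1. The result depends on both the cursor position and the terminal configuration, neither of which is visible from the string alone.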
A Primer on ANSI escape codes (and how this crate works)
ANSI codes are created using special character sequences in a string. These sequences start with the ESC character, '\x1b', followed by some other character to determine the type of the escape code. That second character determines how long the sequence continues:
- ESC [: until a character in the range '\x40'..='\x7E' is found.
- ESC ]: until an ST is found.
An ST is a String Terminator and is given by the sequence ESC \ (or in Rust syntax "\x1b\x5c").
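The skipping rules above can be sketched with the standard library alone. This is not the crate's actual implementation: the function name `visible_char_count` is hypothetical, every visible character is counted as one column (a real implementation would look up each character's display width, for example with unicode-width), and the handling of other escape types is an arbitrary choice made for this sketch.

```rust
/// Counts the characters of `s` that lie outside `ESC [` and `ESC ]`
/// sequences. Simplification: each visible character counts as 1 column.
/// Illustration only; not the code used by `ansi-width`.
fn visible_char_count(s: &str) -> usize {
    let mut chars = s.chars();
    let mut count = 0;
    while let Some(c) = chars.next() {
        if c != '\x1b' {
            count += 1;
            continue;
        }
        // The character after ESC decides how long the sequence is.
        match chars.next() {
            // ESC [ : skip until a character in '\x40'..='\x7E'.
            Some('[') => {
                for c in chars.by_ref() {
                    if ('\x40'..='\x7e').contains(&c) {
                        break;
                    }
                }
            }
            // ESC ] : skip until the String Terminator, ESC \.
            Some(']') => {
                let mut prev = '\0';
                for c in chars.by_ref() {
                    if prev == '\x1b' && c == '\\' {
                        break;
                    }
                    prev = c;
                }
            }
            // Other escape types are outside the subset described above.
            // This sketch simply drops ESC and the character after it.
            _ => {}
        }
    }
    count
}
```

Because the function only walks the input once and never builds an intermediate string, it shows how escape codes can be skipped without being parsed or stored.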
This is the subset of sequences that this library supports, since these are used by most applications that need this functionality. If you have a use case for other codes, please open an issue on the GitHub repository.
To improve performance, ansi-width does not parse the actual ANSI codes; it can only skip them.
Examples
use ansi_width::ansi_width;
// ASCII string
assert_eq!(ansi_width("123456"), 6);
// Accents
assert_eq!(ansi_width("café"), 4);
// Emoji (2 crab emoji)
assert_eq!(ansi_width("🦀🦀"), 4);
// CJK characters (“Nǐ hǎo” or “Hello” in Chinese)
assert_eq!(ansi_width("你好"), 4);
// ANSI colors
assert_eq!(ansi_width("\u{1b}[31mRed\u{1b}[0m"), 3);
// ANSI hyperlink
assert_eq!(
    ansi_width("\x1b]8;;http://example.com\x1b\\This is a link\x1b]8;;\x1b\\"),
    14
);
Alternatives
- str::len: Returns only the length in bytes and therefore only works for ASCII characters.
- unicode-width: Does not take ANSI characters into account by design (see this issue). This might be what you want if you don't care about ANSI codes. unicode-width is used internally by this crate as well.
- textwrap::core::display_width: Very similar functionality to this crate and it also supports hyperlinks since version 0.16.1. The advantage of this crate is that it does not require pulling in the rest of textwrap's functionality (even though that functionality is excellent if you need it).
- console::measure_text_width: Similar to textwrap and very well-tested. However, it constructs a new string internally without ANSI codes first and then measures the width of that. The parsing is more robust than this crate though.
Dependencies
~1.5MB
~19K SLoC