7 releases (breaking)

0.6.0 May 2, 2022
0.5.0 Mar 26, 2022
0.4.0 Nov 6, 2021
0.3.0 Nov 6, 2021
0.1.1 Apr 4, 2021

#869 in Text processing

38 downloads per month
Used in 3 crates (2 directly)

MIT license

24KB
388 lines

lsp-document


Helpers to convert between LSP representations of text documents and Rust strings.

TL;DR:

LSP uses UTF-16-encoded strings, while Rust's strings are UTF-8-encoded. This means that the same place in a text has different offsets in LSP and in Rust:

  • LSP offsets are counted in 16-bit code units, and each character takes either 1 or 2 of them,
  • Rust strings are indexed in bytes, and each character takes from 1 to 4 bytes.
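
The difference can be checked with the standard library alone. The following is a minimal sketch that does not use this crate; `widths` is a hypothetical helper introduced only for illustration.

```rust
/// Returns how many UTF-8 bytes and how many UTF-16 code units `s` occupies.
/// (Hypothetical helper for illustration; not part of lsp-document.)
fn widths(s: &str) -> (usize, usize) {
    (s.len(), s.encode_utf16().count())
}

fn main() {
    // One character from each width class used in the example text.
    for s in ["a", "д", "做", "💣"] {
        let (utf8_len, utf16_len) = widths(s);
        println!("{s:?}: {utf8_len} byte(s), {utf16_len} UTF-16 code unit(s)");
    }
}
```

Only characters outside the Basic Multilingual Plane, such as 💣, take two UTF-16 code units, yet every non-ASCII character already shifts the byte offsets.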

To ensure that an LSP client and server "talk" about the same part of a text document, we need a translation layer.

This crate provides such a layer.

Example usage

See the docs for more details.

use lsp_document::{TextMap, TextAdapter, Pos, IndexedText};
use lsp_types::Position;

// Character width
// U16:     1111111111111 1111111111 1 11 1 1 111111111 21
// U8:      1111111111111 1222122221 1 13 3 3 111111111 41
// U8 offset
//          0         1       2      3       4          5
//          0123456789012 3468013579 0 12 5 8 123456789 04
let text = "Hello, world!\nКак дела?\r\n做得好\nThis is 💣!";
let text = IndexedText::new(text);
//
// Examples of using TextMap methods
//
// Pos of 💣 from its offset
assert_eq!(text.offset_to_pos(50).unwrap(), Pos::new(3, 8));
// Raw line range info
assert_eq!(text.line_range(2).unwrap(), Pos::new(2, 0)..Pos::new(2, 10));
// Extracting part of text between two positions
assert_eq!(text.substr(Pos::new(1, 7)..Pos::new(1, 15)).unwrap(), "дела");

//
// Example of using TextAdapter methods
//
// Pos of `!` after 💣
assert_eq!(text.lsp_pos_to_pos(&Position::new(3, 10)).unwrap(), Pos::new(3, 12));
assert_eq!(text.pos_to_lsp_pos(&Pos::new(3, 12)).unwrap(), Position::new(3, 10));

Using Strings for text manipulation

Currently, the crate works with str-like representations of text. UTF-8-encoded strings are efficiently packed in memory, which means:

  1. 👍 There's low memory overhead for storing these strings.
  2. 👍 The contents are contiguous in memory, hence random access and iteration over chars are fast (the latter is important for conversion between LSP and native positions).
  3. 👎 Making changes to strings is slow, as it requires time proportional to the length of the string.
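
To illustrate why fast iteration over chars matters in point 2, here is a minimal sketch of converting a UTF-16 column into a byte offset within a single line. It uses only the standard library and is not the crate's actual implementation; `utf16_col_to_byte_col` is a hypothetical name, and returning `None` for a column inside a surrogate pair is an assumption made for this sketch.

```rust
/// Converts a UTF-16 column within `line` into a UTF-8 byte offset.
/// Returns `None` if the column lies past the end of the line or
/// falls in the middle of a surrogate pair.
/// (Hypothetical helper for illustration; not part of lsp-document.)
fn utf16_col_to_byte_col(line: &str, utf16_col: usize) -> Option<usize> {
    let mut units = 0; // UTF-16 code units consumed so far
    for (byte_idx, ch) in line.char_indices() {
        if units == utf16_col {
            return Some(byte_idx);
        }
        if units > utf16_col {
            // The requested column points inside a two-unit character.
            return None;
        }
        units += ch.len_utf16();
    }
    // The column may also point just past the last character.
    if units == utf16_col {
        Some(line.len())
    } else {
        None
    }
}

fn main() {
    let line = "This is 💣!";
    // `!` sits at UTF-16 column 10 and at byte offset 12.
    assert_eq!(utf16_col_to_byte_col(line, 10), Some(12));
    // Column 9 falls inside the surrogate pair of 💣.
    assert_eq!(utf16_col_to_byte_col(line, 9), None);
}
```

The conversion is a single linear scan over one line, which is cheap when the text is contiguous in memory.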

Most likely, the performance cost of point 3 won't be a problem, because in the context of LSP servers we query data much more often than we change it. So, using Strings should be just fine for a lot of applications.
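
For reference, an edit on a plain String might look like the sketch below. It uses only the standard library; `apply_edit` is a hypothetical helper, not part of this crate. Presumably, any index built over the old text would also have to be rebuilt after such an edit, which is an assumption rather than something stated in the crate docs.

```rust
use std::ops::Range;

/// Replaces the bytes in `range` with `new_text`.
/// The bytes after the range are shifted, so the cost grows with the
/// length of the string. Panics if `range` does not fall on char boundaries.
/// (Hypothetical helper for illustration; not part of lsp-document.)
fn apply_edit(text: &mut String, range: Range<usize>, new_text: &str) {
    text.replace_range(range, new_text);
}

fn main() {
    let mut text = String::from("Hello, world!\nThis is 💣!");
    // "world" occupies bytes 7..12 of the first line.
    apply_edit(&mut text, 7..12, "LSP");
    assert_eq!(text, "Hello, LSP!\nThis is 💣!");
}
```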

However, having an implementation backed by a Rope wouldn't hurt (although this is not a priority at the moment).
