2 releases
- 0.0.1 (Jul 12, 2024)
- 0.0.0 (Jul 11, 2024)
#52 in #speech
60KB, 1K SLoC
Natural Language Syntax Highlighting
Natural-Syntax-LS is a language server that highlights different parts of speech (POS) in plain text.
Installation
- Download `libtorch` v2.1 as per Rust-BERT's documentation.

  Tips. You can find the URL for downloading `libtorch` in tch-rs' build script. The `LIBTORCH` variable should be the `torch/` directory.

  Why automatic installation does not work. Rust-BERT has an "automatic installation" option that uses tch-rs' build script to download `libtorch`. However, the binary produced this way does not run because that `libtorch` is not on `LD_LIBRARY_PATH`. Alternatively, you could statically link `libtorch`, but that would require you to download `libtorch` yourself anyway.
- Install the `natural_syntax_ls` package with Cargo or friends to get the `natural-syntax-ls` binary:

  `cargo install natural_syntax_ls --no-default-features`

  Passing `--no-default-features` disables downloading `libtorch` (automatic installation).

  Why automatic installation is the default. Because otherwise it would be a pain to run the continuous integration.
Editor setup
✅ NeoVim setup with LSPConfig
Please paste the `natural_syntax_ls_setup` function below into your Nvim configuration and call it with your client's capabilities.
Please see my config for an example.
The `natural_syntax_ls_setup` function.
local function natural_syntax_ls_setup(capabilities)
  local lspconfig = require('lspconfig')
  -- Register the server, since it is not bundled with LSPConfig.
  require('lspconfig.configs')['natural_syntax_ls'] = {
    default_config = {
      cmd = { 'natural-syntax-ls' },
      filetypes = { 'text' },
      single_file_support = true,
    },
    docs = {
      description = [[The Natural Syntax Language Server for highlighting parts of speech.]],
    },
  }
  lspconfig['natural_syntax_ls'].setup {
    capabilities = capabilities,
    init_options = {
      token_map_update = {
        -- Customize your POS-token mapping here. E.g.:
        --[[
        -- Disable coordinating conjunctions highlighting.
        CC = vim.NIL, -- `nil` does not work because it gets ignored.
        -- Highlight wh-determiners as enum members without any modifiers.
        WDT = { type = "enumMember" },
        -- Highlight determiners as read-only classes.
        DT = { type = "class", modifiers = { "readonly" } },
        ]]
      },
    },
  }
end
Customizations:
- I only set the `filetypes` field to `text`, but you can enable natural-syntax-ls for any other file types as well. Note, though, that the language server's semantic tokens supersede Tree-sitter highlighting by default.
- By specifying the `token_map_update` field in `init_options`, you can customize the mapping between parts of speech and semantic tokens.
  - The default mapping is in the `pos2token_bits` function in `semantic_tokens.rs`.
  - Part-of-speech tags are the variants of the `PartOfSpeech` enum in `lib.rs`.
  - Token types and modifiers are variants of `TokenType` and `TokenModifier` in `semantic_tokens.rs`, all in camelCase.
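To make the override semantics of `token_map_update` concrete, here is a minimal Rust sketch of how such an update could be layered over a default mapping. The `Token` struct and the `apply_update` function are hypothetical names for illustration only; they are not taken from the crate, whose real logic lives around `pos2token_bits` and may differ. The sketch assumes that a JSON `null` (what `vim.NIL` serializes to) arrives as `None` and disables the tag, while any other entry replaces the default.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a semantic token: a type plus its modifiers.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: String,
    modifiers: Vec<String>,
}

/// Layers `update` over `map` (assumed semantics, not the crate's code).
/// `Some(token)` overrides or adds the tag's mapping;
/// `None` (a JSON `null`, i.e. `vim.NIL` in Lua) disables the tag.
fn apply_update(
    mut map: HashMap<String, Token>,
    update: HashMap<String, Option<Token>>,
) -> HashMap<String, Token> {
    for (tag, entry) in update {
        match entry {
            Some(token) => {
                map.insert(tag, token);
            }
            None => {
                map.remove(&tag);
            }
        }
    }
    map
}
```

This also suggests why a plain Lua `nil` cannot work: assigning `nil` to a table field removes the key, so the entry is never sent to the server, whereas `vim.NIL` is serialized as an explicit `null`.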
❓ Visual Studio Code and other editor setup
No official support, but community plugins are welcome.
I do not currently use VSCode or these other editors, so I do not wish to maintain plugins for them.
However, it should be straightforward to implement plugins for them since Natural-Syntax-LS implements the Language Server Protocol (LSP). So, please feel free to make a plugin yourself and create an issue for me to link it here.
Selected specification
Prediction Scheduling
For a single document, only one prediction is scheduled at a time. While a prediction is ongoing, new updates are queued, and the latest update replaces any previously queued update.
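The rule above can be pictured as a small per-document state machine. The following Rust sketch is only an illustration under stated assumptions: `DocumentSchedule`, `on_update`, and `on_finished` are hypothetical names, the document text is modeled as a plain `String`, and the actual server may well implement this with asynchronous tasks instead.

```rust
/// Hypothetical per-document scheduling state (not the crate's actual type).
#[derive(Debug, Default)]
struct DocumentSchedule {
    /// True while a prediction for this document is running.
    running: bool,
    /// The most recent update received while a prediction was running.
    pending: Option<String>,
}

impl DocumentSchedule {
    /// Called when the document changes. Returns the text to predict on
    /// right away, or `None` if the update was queued instead.
    fn on_update(&mut self, text: String) -> Option<String> {
        if self.running {
            // Overwrite any previously queued update: only the latest matters.
            self.pending = Some(text);
            None
        } else {
            self.running = true;
            Some(text)
        }
    }

    /// Called when the running prediction finishes. Returns the queued
    /// update to predict on next, if there is one.
    fn on_finished(&mut self) -> Option<String> {
        let next = self.pending.take();
        // Remain in the running state only if there is follow-up work.
        self.running = next.is_some();
        next
    }
}
```

With this sketch, three quick edits `v1`, `v2`, `v3` start a prediction on `v1`, queue `v2`, and then replace it with `v3`; when the first prediction finishes, `on_finished` hands back `v3` and `v2` is never predicted on.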
Debugging
We use `tracing-subscriber` with the `env-filter` feature to emit logs.
Please configure the log level by setting the `RUST_LOG` environment variable.
On macOS, you may need to set `DYLD_LIBRARY_PATH` to run the tests.
Future work
- Customizing the mapping between part of speech and semantic token.
- Support languages other than English. This simply requires a new model.
- Incremental updates and semantic token ranges.
- Do not overwrite Markdown/LaTeX syntax highlighting.
Dependencies: ~44MB, ~761K SLoC