inkjet

A batteries-included syntax highlighting library for Rust, based on tree-sitter

18 releases

0.11.1 Sep 15, 2024
0.10.5 Apr 27, 2024
0.10.3 Jan 29, 2024
0.10.2 Nov 1, 2023

MIT/Apache

515MB
16M SLoC

C 15M SLoC // 0.0% comments
Scheme 10K SLoC // 0.1% comments
Rust 6.5K SLoC // 0.0% comments
C++ 1K SLoC // 0.0% comments
Shell 10 SLoC

Inkjet

A batteries-included syntax highlighting library for Rust, based on tree-sitter.

Features

  • Language grammars are linked into the executable as C functions - no need to load anything at runtime!
  • Pluggable formatters. Inkjet includes a formatter for HTML, and writing your own is easy.
  • Support for Helix editor themes, including a large collection of vendored themes to get you started.
  • Highlight into a new String or a std::io::Write/std::fmt::Write, depending on your use case.
  • Specify languages explicitly (from an enum) or look them up using a token like "rs" or "rust".
  • Extremely cursed build.rs
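
To make the formatter and output-sink features above more concrete, here is a small, self-contained sketch of the general idea. Everything in it (the Span enum, the Formatter trait, MiniHtml, and the two render_* helpers) is a hypothetical stand-in invented for illustration. It is not Inkjet's actual API, so check the crate documentation for the real types and method names.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for a piece of highlighted source (not an Inkjet type).
pub enum Span<'a> {
    Plain(&'a str),
    Styled { class: &'a str, text: &'a str },
}

/// Hypothetical formatter trait: writes one span to any `fmt::Write` sink.
pub trait Formatter {
    fn write_span<W: fmt::Write>(&self, span: &Span<'_>, out: &mut W) -> fmt::Result;
}

/// A tiny HTML-like formatter that wraps styled text in `<span>` tags.
pub struct MiniHtml;

impl Formatter for MiniHtml {
    fn write_span<W: fmt::Write>(&self, span: &Span<'_>, out: &mut W) -> fmt::Result {
        match span {
            Span::Plain(text) => escape(text, out),
            Span::Styled { class, text } => {
                write!(out, "<span class=\"{}\">", class)?;
                escape(text, out)?;
                out.write_str("</span>")
            }
        }
    }
}

/// Escape the characters that are special in HTML text and attributes.
fn escape<W: fmt::Write>(text: &str, out: &mut W) -> fmt::Result {
    for c in text.chars() {
        match c {
            '<' => out.write_str("&lt;")?,
            '>' => out.write_str("&gt;")?,
            '&' => out.write_str("&amp;")?,
            '"' => out.write_str("&quot;")?,
            _ => out.write_char(c)?,
        }
    }
    Ok(())
}

/// Render into a freshly allocated `String` (which implements `fmt::Write`).
pub fn render_to_string<F: Formatter>(
    formatter: &F,
    spans: &[Span<'_>],
) -> Result<String, fmt::Error> {
    let mut rendered = String::new();
    for span in spans {
        formatter.write_span(span, &mut rendered)?;
    }
    Ok(rendered)
}

/// Render into any `io::Write` sink by buffering into a `String` first.
pub fn render_to_writer<F: Formatter, W: io::Write>(
    formatter: &F,
    spans: &[Span<'_>],
    out: &mut W,
) -> io::Result<()> {
    let rendered = render_to_string(formatter, spans)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    out.write_all(rendered.as_bytes())
}
```

Because the formatter only ever sees a generic sink, the same implementation can serve both the "new String" and the "existing writer" use cases. A different output format would just be another type implementing the trait.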

Included Languages

Inkjet comes bundled with support for over seventy languages, and it's easy to add more - see the FAQ section.

Name                        Recognized Tokens
Ada                         ada
Assembly (generic)          asm
Awk                         awk
Bash                        bash, sh, shell
BibTeX                      bibtex, bib
Bicep                       bicep
Blueprint                   blueprint, blp
C                           c, h
Cap'N Proto                 capnp
Clojure                     clojure, clj, cljc
C#                          c_sharp, c#, csharp, cs
C++                         c++, cpp, hpp, h++, cc, hh
CSS                         css
Cue                         cue
D                           d, dlang
Dart                        dart
Diff                        diff
Dockerfile                  dockerfile, docker
EEx                         eex
Emacs Lisp                  elisp, emacs-lisp, el
Elixir                      ex, exs, leex
Elm                         elm
Erlang                      erl, hrl, es, escript
Forth                       forth, fth
Fortran                     fortran, for
Fish                        fish
GDScript                    gdscript, gd
Gleam                       gleam
GLSL                        glsl
Go                          go, golang
Haskell                     haskell, hs
HCL                         hcl, terraform
HEEx                        heex
HTML                        html, htm
INI                         ini
JavaScript                  javascript, js
JSON                        json
JSX                         jsx
Julia                       julia, jl
Kotlin                      kotlin, kt, kts
LaTeX                       latex, tex
LLVM                        llvm
Lua                         lua
GNU Make                    make, makefile, mk
MatLab                      matlab, m
Meson                       meson
Nix                         nix
Objective C                 objective_c, objc
OCaml                       ocaml, ml
OCaml Interface             ocaml_interface, mli
OpenSCAD                    openscad, scad
Pascal                      pascal
PHP                         php
ProtoBuf                    protobuf, proto
Python                      python, py
R                           r
Racket                      racket, rkt
Regex                       regex
Ruby                        ruby, rb
Rust                        rust, rs
Scala                       scala
Scheme                      scheme, scm, ss
SCSS                        scss
SQL (Generic)               sql
Swift                       swift
TOML                        toml
TypeScript                  typescript, ts
TSX                         tsx
Vimscript                   vimscript, vim
WAST (WebAssembly Script)   wast
WAT (WebAssembly Text)      wat, wasm
x86 Assembly                x86asm, x86
WGSL                        wgsl
YAML                        yaml
Zig                         zig

In addition to these languages, Inkjet also offers the Runtime and Plaintext languages.

  • Runtime wraps a fn() -> &'static HighlightConfiguration pointer, which is used to resolve the language at (you guessed it) runtime.
  • Plaintext enables cheap no-op highlighting. It loads the diff grammar under the hood, but provides no highlighting queries. It's aliased to none and nolang.
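
The token lookup and the Runtime variant described above can be pictured with a rough sketch like the following. The Config struct, the Lang enum, from_token, config, and my_language are all hypothetical names made up for this illustration. Config merely stands in for tree-sitter's HighlightConfiguration, and only a handful of tokens from the table are reproduced.

```rust
/// Hypothetical stand-in for tree-sitter's `HighlightConfiguration`.
pub struct Config {
    pub name: &'static str,
}

/// A trimmed-down, hypothetical language enum (not Inkjet's real one).
#[derive(Clone, Copy)]
pub enum Lang {
    Rust,
    Python,
    /// No-op highlighting.
    Plaintext,
    /// Wraps a function pointer that is called to resolve the language at runtime.
    Runtime(fn() -> &'static Config),
}

impl Lang {
    /// Look a language up by one of its recognized tokens.
    pub fn from_token(token: &str) -> Option<Lang> {
        match token {
            "rust" | "rs" => Some(Lang::Rust),
            "python" | "py" => Some(Lang::Python),
            "none" | "nolang" => Some(Lang::Plaintext),
            _ => None,
        }
    }

    /// Resolve the configuration for this language.
    pub fn config(self) -> &'static Config {
        static RUST: Config = Config { name: "rust" };
        static PYTHON: Config = Config { name: "python" };
        static PLAIN: Config = Config { name: "plaintext" };
        match self {
            Lang::Rust => &RUST,
            Lang::Python => &PYTHON,
            Lang::Plaintext => &PLAIN,
            // Bundled languages are fixed at compile time; this one is
            // looked up by calling the user-supplied function.
            Lang::Runtime(resolve) => resolve(),
        }
    }
}

/// Example of a user-supplied resolver for a grammar that is not bundled.
pub fn my_language() -> &'static Config {
    static MINE: Config = Config { name: "my-language" };
    &MINE
}
```

In this sketch, Lang::from_token("rs") yields the Rust variant, while Lang::Runtime(my_language) defers to whatever the supplied function returns. Because the pointer has type fn() -> &'static Config, the configuration it hands back has to live for the whole program, for example in a static.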

Cargo Features

  • (Default) html - enables the bundled HTML formatter, which depends on v_htmlescape.
  • (Default) theme - enables the theme API, which depends on ahash, toml and serde.
  • (Default) all-languages - enables all languages.
  • language-{name} - enables the specified language.
    • If you want to only enable a subset of the included languages, you'll have to set default-features=false and manually re-add each language you want to use.
  • terminal - enables the termcolor-based terminal formatter, which depends on the theme feature.

FAQ

"Why is Inkjet so large?"

Parser sources generated by tree-sitter can grow quite big, with some being dozens of megabytes in size. Inkjet has to bundle these sources for all the languages it supports, so it adds up. (According to loc, there are over 23 million lines of C code!)

If you need to minimize your binary size, consider disabling languages that you don't need. Link-time optimization can also shave off a few megabytes.

"Why is Inkjet taking so long to build?"

Because it has to compile and link in dozens of C/C++ programs (the parsers and scanners for every language Inkjet bundles).

However, after the first build, these artifacts will be cached and subsequent builds should be much faster.

"Why does highlighting require a mutable reference to the highlighter?"

Under the hood, Inkjet creates a tree-sitter highlighter/parser object, which in turn dynamically allocates a chunk of working memory. Using the same highlighter for multiple simultaneous jobs would therefore cause all sorts of nasty undefined behavior (UB).

If you want to highlight in parallel, you'll have to create a clone of the highlighter for each thread. I recommend thread_local! and RefCell if you need a quick and easy solution.
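
As a rough sketch of the thread_local! and RefCell approach, the snippet below keeps one highlighter per thread and borrows it mutably only for the duration of a single job. The Highlighter type here is a hypothetical stand-in with a made-up highlight method (it just wraps text in <pre> tags). It is not Inkjet's real highlighter, and the helper function names are assumptions too.

```rust
use std::cell::RefCell;
use std::thread;

/// Hypothetical stand-in for a highlighter that needs `&mut self`
/// because it reuses internal working memory between jobs.
pub struct Highlighter {
    scratch: String,
}

impl Highlighter {
    pub fn new() -> Self {
        Highlighter { scratch: String::new() }
    }

    /// Pretend to highlight by wrapping the source in `<pre>` tags.
    pub fn highlight(&mut self, source: &str) -> String {
        self.scratch.clear();
        self.scratch.push_str("<pre>");
        self.scratch.push_str(source);
        self.scratch.push_str("</pre>");
        self.scratch.clone()
    }
}

thread_local! {
    // Each thread lazily creates its own highlighter on first use.
    static HIGHLIGHTER: RefCell<Highlighter> = RefCell::new(Highlighter::new());
}

/// Highlight using the calling thread's private highlighter.
pub fn highlight_on_this_thread(source: &str) -> String {
    HIGHLIGHTER.with(|h| h.borrow_mut().highlight(source))
}

/// Highlight every source on its own thread, preserving input order.
pub fn highlight_all(sources: Vec<String>) -> Vec<String> {
    let handles: Vec<_> = sources
        .into_iter()
        .map(|src| thread::spawn(move || highlight_on_this_thread(&src)))
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

The RefCell moves the exclusivity check to runtime: a second borrow_mut on the same thread while a job is still running would panic rather than silently share state. No highlighter is ever visible to more than one thread.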

"A language I want to highlight isn't bundled with Inkjet!"

Assuming that you or someone else has implemented a highlighting-ready tree-sitter grammar for the language you want, adding it to Inkjet is easy! Just open an issue asking for it to be added, linking to the grammar repository for the language.

Alternatively, you can use Language::Runtime, which will allow you to use grammars not bundled with Inkjet.

Other notes:

  • Inkjet currently only supports grammar repositories that check in the parser generated by tree-sitter (in order to avoid a build-time dependency on node/npm).
  • Inkjet requires that the grammar include (at minimum) a highlights.scm query targeted at the base tree-sitter library. Extended queries (such as those from nvim-treesitter) will not work.
  • I will not support blockchain/smart contract languages like Solidity. Please take your scam enablers elsewhere.

Building

For normal use, Inkjet will compile automatically just like any other crate.

However, if you have forked the repository and want to update the bundled languages, you'll need to use GNU Make with the included Makefile:

  • make redownload will wipe the languages/ directory and redownload everything from scratch.
    • Currently, this only works on *nix. You will need git, sed and wget installed. (Git clones the grammar repositories, while sed and wget are used in miniature setup scripts for some languages.)
  • make regenerate will wipe src/languages.rs and regenerate it from scratch.
  • make features will generate a file called features in the crate root, containing all the individual language features (ready to be pasted into Cargo.toml).
  • make themes will regenerate the mod.rs file in src/theme/vendored using the contents of the data/ directory.

If, for whatever reason, you don't have GNU Make available, you can also perform these actions manually by setting the appropriate environment variables and Cargo flags:

  • INKJET_REDOWNLOAD_LANGS=true for make redownload.
  • INKJET_REBUILD_LANGS_MODULE=true for make regenerate.
  • INKJET_REBUILD_FEATURES=true for make features.
  • INKJET_REBUILD_THEMES=true for make themes.

Run cargo build --all-features with these set. (The development portions of the build script are feature gated by default.)

Acknowledgements

  • Inkjet would not be possible without tree-sitter and the ecosystem of grammars surrounding it.
  • Many languages are only supported thanks to the highlighting queries created by the Helix project.
