30 releases

0.20.0 Sep 3, 2021
0.19.5 May 20, 2021
0.19.3 Mar 12, 2021
0.17.1 Nov 3, 2020
0.3.1 Jul 20, 2018

#4 in Parser tooling


17,570 downloads per month
Used in 77 crates (62 directly)

MIT license

440KB
11K SLoC

C 8K SLoC // 0.0% comments Rust 2.5K SLoC // 0.0% comments

Rust Tree-sitter


Rust bindings to the Tree-sitter parsing library.

Basic Usage

First, create a parser:

use tree_sitter::{Parser, Language};

let mut parser = Parser::new();

Tree-sitter languages consist of generated C code. To make sure they're properly compiled and linked, you can create a build script like the following (assuming tree-sitter-javascript is in your root directory):

use std::path::PathBuf;

fn main() {
    let dir: PathBuf = ["tree-sitter-javascript", "src"].iter().collect();

    cc::Build::new()
        .include(&dir)
        .file(dir.join("parser.c"))
        .file(dir.join("scanner.c"))
        .compile("tree-sitter-javascript");
}

Add the cc crate to your Cargo.toml under [build-dependencies]:

[build-dependencies]
cc="*"
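The build script above always compiles scanner.c, but some grammars do not ship an external scanner. As a rough sketch (the helper names `grammar_sources` and `rerun_directives` are hypothetical, not part of the cc crate or of Tree-sitter), a build script could collect only the files that exist and ask Cargo to re-run when they change:

```rust
use std::path::{Path, PathBuf};

/// Collects the C sources to compile for a grammar directory.
/// Assumption: parser.c is always present, scanner.c only for some grammars.
fn grammar_sources(dir: &Path) -> Vec<PathBuf> {
    let mut sources = vec![dir.join("parser.c")];
    let scanner = dir.join("scanner.c");
    if scanner.exists() {
        sources.push(scanner);
    }
    sources
}

/// Builds one `cargo:rerun-if-changed` line per source file, so that Cargo
/// re-runs the build script only when a grammar source changes.
fn rerun_directives(sources: &[PathBuf]) -> Vec<String> {
    sources
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={}", path.display()))
        .collect()
}
```

In a real build.rs, each path returned by `grammar_sources` would be passed to `.file(...)` on the `cc::Build`, and each directive would be printed with `println!`.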

To use these languages from Rust, declare them as extern "C" functions and call them inside an unsafe block. You can then assign the resulting language to the parser.

extern "C" { fn tree_sitter_c() -> Language; }
extern "C" { fn tree_sitter_rust() -> Language; }
extern "C" { fn tree_sitter_javascript() -> Language; }

let language = unsafe { tree_sitter_rust() };
parser.set_language(language).unwrap();

Now you can parse source code:

let source_code = "fn test() {}";
let mut tree = parser.parse(source_code, None).unwrap();
let root_node = tree.root_node();

assert_eq!(root_node.kind(), "source_file");
assert_eq!(root_node.start_position().column, 0);
assert_eq!(root_node.end_position().column, 12);

Editing

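Once you have a syntax tree, you can update it when your source code changes. First describe the change to the old tree with edit, then pass the edited tree to parse, which lets the parser reuse unchanged parts of the tree and run much more quickly: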

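use tree_sitter::{InputEdit, Point};

let new_source_code = "fn test(a: u32) {}";

// Describe the insertion of "a: u32" at byte 8. `tree` must be declared
// with `let mut` so that `edit` can borrow it mutably.
tree.edit(&InputEdit {
    start_byte: 8,
    old_end_byte: 8,
    new_end_byte: 14,
    start_position: Point::new(0, 8),
    old_end_position: Point::new(0, 8),
    new_end_position: Point::new(0, 14),
});

let new_tree = parser.parse(new_source_code, Some(&tree)).unwrap();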
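The six edit coordinates are easy to get wrong by hand. The following is a minimal sketch of one way to derive them from the old and new text by trimming the common prefix and suffix. The `Pos` and `EditRange` types and the `compute_edit` helper are hypothetical stand-ins written against the standard library only; they mirror the field names used above but are not the library's own types. Columns are counted in bytes, which is an assumption of this sketch.

```rust
/// Stand-in for a row/column position (hypothetical, not the library type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pos {
    pub row: usize,
    pub column: usize,
}

/// Stand-in for the six edit coordinates (hypothetical, not the library type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EditRange {
    pub start_byte: usize,
    pub old_end_byte: usize,
    pub new_end_byte: usize,
    pub start_position: Pos,
    pub old_end_position: Pos,
    pub new_end_position: Pos,
}

/// Row and byte column of `byte_offset` within `text`.
fn position_at(text: &str, byte_offset: usize) -> Pos {
    let before = &text.as_bytes()[..byte_offset];
    let row = before.iter().filter(|&&b| b == b'\n').count();
    let column = match before.iter().rposition(|&b| b == b'\n') {
        Some(newline) => byte_offset - (newline + 1),
        None => byte_offset,
    };
    Pos { row, column }
}

/// Describes a single replaced span that turns `old` into `new`.
pub fn compute_edit(old: &str, new: &str) -> EditRange {
    let (a, b) = (old.as_bytes(), new.as_bytes());

    // Longest common prefix, backed up to a character boundary.
    let mut prefix = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    while !old.is_char_boundary(prefix) {
        prefix -= 1;
    }

    // Longest common suffix that does not overlap the prefix.
    let max_suffix = a.len().min(b.len()) - prefix;
    let mut suffix = a
        .iter()
        .rev()
        .zip(b.iter().rev())
        .take(max_suffix)
        .take_while(|(x, y)| x == y)
        .count();
    while !old.is_char_boundary(a.len() - suffix) {
        suffix -= 1;
    }

    let (old_end, new_end) = (a.len() - suffix, b.len() - suffix);
    EditRange {
        start_byte: prefix,
        old_end_byte: old_end,
        new_end_byte: new_end,
        start_position: position_at(old, prefix),
        old_end_position: position_at(old, old_end),
        new_end_position: position_at(new, new_end),
    }
}
```

For the strings used above, `compute_edit("fn test() {}", "fn test(a: u32) {}")` gives a start byte of 8, an old end byte of 8, and a new end byte of 14, matching the hand-written values.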

Text Input

The source code to parse can be provided as a string, a slice, a vector, or a function that returns a slice. The text can be encoded as either UTF-8 or UTF-16:

// Store some source code in an array of lines.
let lines = &[
    "pub fn foo() {",
    "  1",
    "}",
];

// Parse the source code using a custom callback. The callback is called
// with both a byte offset and a row/column offset.
let tree = parser.parse_with(&mut |_byte: usize, position: Point| -> &[u8] {
    let row = position.row as usize;
    let column = position.column as usize;
    if row < lines.len() {
        if column < lines[row].as_bytes().len() {
            &lines[row].as_bytes()[column..]
        } else {
            "\n".as_bytes()
        }
    } else {
        &[]
    }
}, None).unwrap();

assert_eq!(
  tree.root_node().to_sexp(),
  "(source_file (function_item (visibility_modifier) (identifier) (parameters) (block (number_literal))))"
);
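To see what text such a callback actually yields, it can help to exercise the chunking logic without a parser. The sketch below restates the callback body as a plain function and drives it with a simplified loop that requests chunks until an empty one is returned. The names `chunk_at` and `read_all` are hypothetical, and the loop is only an approximation of how a consumer might advance its position, not the library's actual behavior. It assumes that no stored line contains a newline itself.

```rust
/// Returns the text starting at `row`/`column`: the rest of the line,
/// a newline once the line is exhausted, or an empty slice past the last row.
fn chunk_at<'a>(lines: &[&'a str], row: usize, column: usize) -> &'a [u8] {
    match lines.get(row) {
        Some(&line) if column < line.len() => &line.as_bytes()[column..],
        Some(_) => "\n".as_bytes(),
        // An empty chunk signals the end of the input.
        None => "".as_bytes(),
    }
}

/// Simplified stand-in for a consumer: requests chunks until an empty one
/// arrives, tracking row and column as it goes.
fn read_all(lines: &[&str]) -> String {
    let (mut row, mut column) = (0usize, 0usize);
    let mut out = Vec::new();
    loop {
        let chunk = chunk_at(lines, row, column);
        if chunk.is_empty() {
            break;
        }
        for &byte in chunk {
            if byte == b'\n' {
                row += 1;
                column = 0;
            } else {
                column += 1;
            }
        }
        out.extend_from_slice(chunk);
    }
    String::from_utf8(out).expect("chunks are cut from valid UTF-8 lines")
}
```

Driving it with the three lines from the example reassembles the source text with a newline after every line, including the last one.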

Dependencies

~1–1.5MB
~42K SLoC