1 unstable release

Uses new Rust 2024

new 0.1.0 Apr 20, 2025

#2738 in Parser implementations

MIT license

20KB
351 lines

Tree-sitter typed AST generator

This crate provides a build script function for generating a typed AST from a tree-sitter grammar. Inspired by rowan.

Setup

In your project's Cargo.toml:

[dependencies]
tree-sitter-your-lang = { path = "/path/to/tree-sitter-your-lang" }
ts-typed-ast = "0.1"

[build-dependencies]
tree-sitter-your-lang = { path = "/path/to/tree-sitter-your-lang" }
ts-typed-ast = "0.1"

In your project's build script (located at build.rs):

fn main() {
    // Generate typed AST from the tree-sitter grammar
    let out_dir = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
    ts_typed_ast::generate(
        &tree_sitter_your_lang::LANGUAGE.into(),
        tree_sitter_your_lang::NODE_TYPES,
        std::fs::File::create(out_dir.join("ast.rs")).unwrap(),
    );
}

Finally, add a module to your project's source (e.g., in lib.rs) that includes the generated typed AST:

mod ast {
    include!(concat!(env!("OUT_DIR"), "/ast.rs"));
}

Note that if you're actively working on the tree-sitter grammar, you'll need to re-run tree-sitter generate each time you edit grammar.js, otherwise the parser and generated AST will be out-of-date with the grammar.

The generated AST

For each named rule in the grammar, a Rust struct named with this rule is generated. For example, the following rule:

{
  function_application: $ => seq(
    field('function', $.identifier),
    '(',
    terminated(',', field('arguments', $._expression)),
    ')',
  ),
}

produces the Rust code:

#[derive(Debug, Clone, Copy)]
pub struct FunctionApplication<'tree> { /* ... */ }

impl<'tree> ::ts_typed_ast::AstNode<'tree> for FunctionApplication<'tree> {
    fn can_cast(kind: u16) -> bool { /* ... */ }
    fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
    fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}

impl<'tree> FunctionApplication<'tree> {
    pub fn function(&self) -> Result<Identifier<'tree>, ::ts_typed_ast::MissingNodeChildError<'tree>> { /* ... */ }
    pub fn arguments(&self) -> impl Iterator<Item = Expression<'tree>> { /* ... */ }
}

The methods cast and node of the AstNode trait can be used to go back-and-forth between tree_sitter::Node and the typed AST. Then, specific methods function and arguments grant access to specific fields of the rule. Note that such functions are only generated when field('func_name', $.item) is used.

The grammar's supertypes each generate a Rust enum. For example, the following:

{
  supertypes: $ => [$._expression],
  rules: {
    _expression: $ => choice(
      $.identifier,
      $.lambda,
      $.function_application,
    ),
  },
}

will generate:

#[derive(Debug, Clone, Copy)]
pub enum Expression<'tree> {
    FunctionApplication(FunctionApplication<'tree>),
    Lambda(Lambda<'tree>),
    Identifier(Identifier<'tree>),
}

impl<'tree> ::ts_typed_ast::AstNode<'tree> for Expression<'tree> {
    fn can_cast(kind: u16) -> bool { /* ... */ }
    fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
    fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}

Enums will also produce {node}Visitor traits with methods to visit each possible node of the enum:

pub trait ExpressionVisitor<'tree> {
    type Output;

    fn expression(&mut self, expression: Expression<'tree>) {
        match expression {
            Expression::FunctionApplication(function_application) =>
                self.function_application(function_application),
            // etc.
        }
    }

    fn function_application(&mut self, function_application: FunctionApplication<'tree>)
        -> Self::Output;
    // etc.
}

Finally, unnamed choices generate enums on the fly. For example:

{
  neg: $ => seq(
    '-',
    field('value', choice(
      $.integer,
      $.float,
    )),
  ),
}

will produce:

#[derive(Debug, Clone, Copy)]
pub struct Neg<'tree> { /* ... */ }

impl<'tree> ::ts_typed_ast::AstNode<'tree> for Neg<'tree> { /* ... */ }

impl<'tree> Neg<'tree> { /* ... */ }

#[derive(Debug, Clone, Copy)]
pub enum NegValue<'tree> {
    Integer(Integer<'tree>),
    Float(Float<'tree>),
}

impl<'tree> ::ts_typed_ast::AstNode<'tree> for NegValue<'tree> {
    fn can_cast(kind: u16) -> bool { /* ... */ }
    fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
    fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}

Example

A complete example is found in examples/lambda. The subfolder tree-sitter-lambda declares a grammar for an ML-style lambda calculus with booleans, integers, abstraction, application, let-in, and conditionals. This grammar was created by calling tree-sitter init, removing all non-Rust bindings, editing the grammar.js and test/corpus/expr.txt files, and then calling tree-sitter generate. Note that you may of course keep the non-Rust bindings if you want them.

The eval module contains a tree-walking interpreter for the language, and a few tests are in the root module of lambda. In more complex projects, you will probably want to first convert the typed AST to an intermediate representation, before interpreting and/or compiling.

Why not a proc macro?

Because you can inspect the generated code more easily, either by manually opening the file ./target/debug/build/[crate-name]-[hash]/out/ast.rs, or more practically by jumping to definitions in your LSP.

Dependencies

~3.5–6MB
~113K SLoC