1 unstable release
Uses new Rust 2024
new 0.1.0 | Apr 20, 2025 |
---|
#2738 in Parser implementations
20KB
351 lines
Tree-sitter typed AST generator
This crate provides a build script function for generating a typed AST from a tree-sitter grammar. Inspired by rowan.
Setup
In your project's Cargo.toml
:
[dependencies]
tree-sitter-your-lang = { path = "/path/to/tree-sitter-your-lang" }
ts-typed-ast = "0.1"
[build-dependencies]
tree-sitter-your-lang = { path = "/path/to/tree-sitter-your-lang" }
ts-typed-ast = "0.1"
In your project's build script (located at build.rs
):
fn main() {
// Generate typed AST from the tree-sitter grammar
let out_dir = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
ts_typed_ast::generate(
&tree_sitter_your_lang::LANGUAGE.into(),
tree_sitter_your_lang::NODE_TYPES,
std::fs::File::create(out_dir.join("ast.rs")).unwrap(),
);
}
Finally, add a module to your project's source (e.g., in lib.rs
) that includes the generated typed AST:
mod ast {
include!(concat!(env!("OUT_DIR"), "/ast.rs"));
}
Note that if you're actively working on the tree-sitter grammar, you'll need to re-run tree-sitter generate
each time you edit grammar.js
, otherwise the parser and generated AST will be out-of-date with the grammar.
The generated AST
For each named rule in the grammar, a Rust struct
named with this rule is generated. For example, the following rule:
{
function_application: $ => seq(
field('function', $.identifier),
'(',
terminated(',', field('arguments', $._expression)),
')',
),
}
produces the Rust code:
#[derive(Debug, Clone, Copy)]
pub struct FunctionApplication<'tree> { /* ... */ }
impl<'tree> ::ts_typed_ast::AstNode<'tree> for FunctionApplication<'tree> {
fn can_cast(kind: u16) -> bool { /* ... */ }
fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}
impl<'tree> FunctionApplication<'tree> {
pub fn function(&self) -> Result<Identifier<'tree>, ::ts_typed_ast::MissingNodeChildError<'tree>> { /* ... */ }
pub fn arguments(&self) -> impl Iterator<Item = Expression<'tree>> { /* ... */ }
}
The methods cast
and node
of the AstNode
trait can be used to go back-and-forth between tree_sitter::Node
and the typed AST. Then, specific methods function
and arguments
grant access to specific fields of the rule. Note that such functions are only generated when field('func_name', $.item)
is used.
The grammar's supertypes each generate a Rust enum
. For example, the following:
{
supertypes: $ => [$._expression],
rules: {
_expression: $ => choice(
$.identifier,
$.lambda,
$.function_application,
),
},
}
will generate:
#[derive(Debug, Clone, Copy)]
pub enum Expression<'tree> {
FunctionApplication(FunctionApplication<'tree>),
Lambda(Lambda<'tree>),
Identifier(Identifier<'tree>),
}
impl<'tree> ::ts_typed_ast::AstNode<'tree> for Expression<'tree> {
fn can_cast(kind: u16) -> bool { /* ... */ }
fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}
Enums will also produce {node}Visitor
traits with methods to visit each possible node of the enum:
pub trait ExpressionVisitor<'tree> {
type Output;
fn expression(&mut self, expression: Expression<'tree>) {
match expression {
Expression::FunctionApplication(function_application) =>
self.function_application(function_application),
// etc.
}
}
fn function_application(&mut self, function_application: FunctionApplication<'tree>)
-> Self::Output;
// etc.
}
Finally, unnamed choice
s generate enum
s on the fly. For example:
{
neg: $ => seq(
'-',
field('value', choice(
$.integer,
$.float,
)),
),
}
will produce:
#[derive(Debug, Clone, Copy)]
pub struct Neg<'tree> { /* ... */ }
impl<'tree> ::ts_typed_ast::AstNode<'tree> for Neg<'tree> { /* ... */ }
impl<'tree> Neg<'tree> { /* ... */ }
#[derive(Debug, Clone, Copy)]
pub enum NegValue<'tree> {
Integer(Integer<'tree>),
Float(Float<'tree>),
}
impl<'tree> ::ts_typed_ast::AstNode<'tree> for NegValue<'tree> {
fn can_cast(kind: u16) -> bool { /* ... */ }
fn cast(node: ::tree_sitter::Node<'tree>) -> Option<Self> { /* ... */ }
fn node(&self) -> ::tree_sitter::Node<'tree> { /* ... */ }
}
Example
A complete example is found in examples/lambda
. The subfolder tree-sitter-lambda
declares a grammar for an ML-style lambda calculus with booleans, integers, abstraction, application, let-in, and conditionals. This grammar was created by calling tree-sitter init
, removing all non-Rust bindings, editing the grammar.js
and test/corpus/expr.txt
files, and then calling tree-sitter generate
. Note that you may of course keep the non-Rust bindings if you want them.
The eval
module contains a tree-walking interpreter for the language, and a few tests are in the root module of lambda
. In more complex projects, you will probably want to first convert the typed AST to an intermediate representation, before interpreting and/or compiling.
Why not a proc macro?
Because you can inspect the generated code more easily, either by manually opening the file ./target/debug/build/[crate-name]-[hash]/out/ast.rs
, or more practically by jumping to definitions in your LSP.
Dependencies
~3.5–6MB
~113K SLoC