macro tree_sitter_grep_proc_macros

(proc-macros used internally by tree-sitter-grep)

2 releases

0.1.0 Jul 14, 2023
0.1.0-dev.0 Jul 12, 2023

Unlicense OR MIT

tree-sitter-grep

tree-sitter-grep is a grep-like search tool that recursively searches the current directory for a tree-sitter query pattern.

Dual-licensed under MIT or the UNLICENSE.

Installation

With a Rust toolchain installed, run:

$ cargo install tree-sitter-grep

Usage

$ tree-sitter-grep -q '(trait_bounds) @t'
src/core.rs:14:pub struct Core<'s, M: 's, S> {
src/core.rs:30:impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
src/mod.rs:622:        P: AsRef<Path>,
src/mod.rs:623:        M: Matcher,
src/mod.rs:624:        S: Sink,
src/mod.rs:644:        M: Matcher,
[...]

Specifying the query

tree-sitter-grep uses tree-sitter queries to specify "patterns" to match

You can either specify the query "inline" with the -q/--query argument:

$ tree-sitter-grep -q '(trait_bounds) @t'

or via a path to a tree-sitter query file (typically *.scm) with the -Q/--query-file argument:

$ cat queries/trait_bounds.scm
(trait_bounds) @t
$ tree-sitter-grep -Q queries/trait_bounds.scm

tree-sitter-grep uses tree-sitter query "captures" (@whatever) to specify "matching" tree-sitter AST nodes

So your query must always include at least one capture

If your query includes multiple captures (eg if you are using a "pre-composed" query or a predicate), tree-sitter-grep will by default use the first capture in the query as its "target capture" (I believe "first" means lexicographical order of the capture names, but I haven't confirmed that)

To override that behavior, you can pass the -c/--capture argument:

$ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#eq? @field_name "pos")) @f)' --capture f
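
As a rough illustration of the "target capture" idea, here is a minimal Rust sketch of choosing one capture from a query's capture names. The function name select_target_capture is hypothetical and this is not tree-sitter-grep's actual code. In particular, the fallback to "the first name in the order given" is an assumption, since the real default ordering isn't confirmed above.

```rust
/// Pick the index of the "target capture" among a query's capture names.
///
/// Assumption: when no name is requested, fall back to the first name in
/// whatever order `capture_names` is supplied. The default ordering that
/// tree-sitter-grep really uses is not confirmed here.
fn select_target_capture(
    capture_names: &[&str],
    requested: Option<&str>,
) -> Result<usize, String> {
    // A query without captures gives us nothing to report as a match.
    if capture_names.is_empty() {
        return Err("query must include at least one capture".to_string());
    }
    match requested {
        // No -c/--capture given: use the assumed default.
        None => Ok(0),
        // -c/--capture given: it has to name a capture that exists.
        Some(name) => capture_names
            .iter()
            .position(|candidate| *candidate == name)
            .ok_or_else(|| format!("no capture named @{} in query", name)),
    }
}
```

With the capture names from the example above, select_target_capture(&["field_name", "f"], Some("f")) picks index 1, mirroring what --capture f asks for.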

How do I figure out what query I want?

It's worth reading the tree-sitter query docs as a starting point

Then, to figure out the relevant tree-sitter AST structure for a query you'd like to write, a tree-sitter "playground" is invaluable, eg the interactive online one or (what I use) neovim's :InspectTree

In my experience, while tree-sitter queries are a solid starting point, they aren't always "expressive" enough to specify exactly the set of AST nodes you'd like to match

So that's why we also support specifying filter plugins where you have "total programmatic control" over what constitutes a match or not

Supported query "predicates"

Tree-sitter query predicates allow some additional "filtering" of matching tree-sitter AST nodes

We use the Rust tree-sitter bindings so "we support whatever predicates they do"

Specifically that includes:

  • #eq?

    $ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#eq? @field_name "pos")) @f)' --capture f
    src/core.rs:20:    pos: usize,

  • #match?

    $ tree-sitter-grep -q '((field_declaration name: (field_identifier) @field_name (#match? @field_name "^p")) @f)' --capture f
    src/core.rs:20:    pos: usize,
    src/mod.rs:157:    passthru: bool,

Filter plugins

When you need "the power of a programming language" in order to fully specify the matching "criteria", you can write a "filter plugin"

Using a filter plugin

If you have an existing filter plugin, you specify that you want to use it via the -f/--filter argument (with a path to the compiled filter dynamic library .so/.dll/.dylib file):

$ tree-sitter-grep -q '(trait_bounds) @t' -f path/to/libmy-filter.so

If the filter plugin expects to be passed a "filter argument" (eg for parameterizing/configuring its behavior in some way) then you specify that with the -a/--filter-arg argument:

$ tree-sitter-grep -q '(trait_bounds) @t' -f path/to/libmy-filter-that-expects-argument.so -a '{ the_filter_plugin_can_parse_this: "however_it_wants" }'

It's also worth noting that technically you don't have to pass a tree-sitter query argument at all if you supply a filter plugin argument (in which case the filter plugin will get invoked against "every" tree-sitter AST node)

Writing filter plugins

TODO: add a "guide" for this

The short version is:

While in theory you could probably write filter plugins in other languages, the "happy path" is to write them in Rust and use the example filter plugins from examples/ as a starting point/reference

The basic idea is that for each tree-sitter AST node that is a potential match according to the supplied query argument, the filter plugin then additionally gets invoked and indicates whether it considers that node a match or not (basically as a (&tree_sitter::Node) -> bool "predicate")
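
To make the "predicate" shape concrete, here is a small stdlib-only Rust sketch. It is not the real plugin interface (see examples/ for that). NodeInfo is a hypothetical stand-in for the facts a filter might read off a &tree_sitter::Node, and usize_field_filter and apply_filter are made-up names for illustration.

```rust
/// Hypothetical stand-in for the handful of `tree_sitter::Node` facts a
/// filter might inspect. A real plugin would work with `&tree_sitter::Node`.
struct NodeInfo<'a> {
    kind: &'a str,
    text: &'a str,
}

/// Example filter: accept only field declarations whose text mentions `usize`.
fn usize_field_filter(node: &NodeInfo) -> bool {
    node.kind == "field_declaration" && node.text.contains("usize")
}

/// Keep only the query's candidate nodes that the filter also accepts.
/// This mirrors the "query proposes, filter decides" flow described above.
fn apply_filter<'a, 'n>(
    candidates: &'a [NodeInfo<'n>],
    accepts: impl Fn(&NodeInfo) -> bool,
) -> Vec<&'a NodeInfo<'n>> {
    candidates.iter().filter(|node| accepts(*node)).collect()
}
```

Given candidates for `pos: usize,` and `passthru: bool,`, apply_filter with usize_field_filter keeps only the first. The point of the design is that the filter is ordinary code, so it can apply any check you can express in Rust.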

Supported target languages

Currently, tree-sitter-grep "bakes in" support for searching the following languages:

  • C
  • C++
  • C#
  • CSS
  • Dockerfile
  • Elisp
  • Elm
  • Go
  • HTML
  • Java
  • JavaScript
  • JSON
  • Kotlin
  • Lua
  • Objective-C
  • Python
  • Ruby
  • Rust
  • Swift
  • TOML
  • tree-sitter queries (how meta!)
  • TypeScript

In theory, any language that has a tree-sitter grammar crate published/available should be "fair game". In the future we may support dynamically specifying/loading additional languages

Or feel free to file an issue requesting "baked-in" support for other languages

Restricting the query to specific files/languages

By default, tree-sitter-grep recursively searches all non-ignored, non-hidden files of the supported languages/types. For each such file, if the provided query can be parsed against that language's grammar, it then searches the file's contents for matches

To explicitly specify/restrict to a single language, use the -l/--language argument:

$ tree-sitter-grep -q '(trait_bounds) @t' -l rust

You can also restrict the search to certain files/directories by providing path arguments:

$ tree-sitter-grep -q '(trait_bounds) @t' src/main.rs src/compiler

Additional flags/arguments

For documentation of additional arguments related to eg customizing the match output, run:

$ tree-sitter-grep --help

In general, we are aiming to be rather ripgrep-"compatible"

Performance

I haven't done any "real" benchmarking but the general take seems to be that tree-sitter-grep is pleasantly, surprisingly fast (especially given that tree-sitter is not optimized for the "parse-from-scratch" use case)

For "not gigantic" code-bases I'm tending to see it run in < 100ms

And for "gigantic" code-bases where it's eg scanning > 300k lines of code and outputting > 7000 matches, I'm seeing it run in say 360ms, which still feels "quite fast"

Editor integrations

TODO: I believe that @peterstuart has written an initial version of an Emacs plugin, and I have started tinkering with a neovim plugin

The basic idea would probably be that you can interact with matches from tree-sitter-grep in your editor the way you'd interact with matches from eg grep/ripgrep
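
As a sketch of what an integration might do first, here is a stdlib-only Rust parser for the path:line:text lines shown under Usage. MatchLine and parse_match_line are hypothetical names. It also assumes the default output shape and paths that contain no ':' character, which wouldn't hold for eg Windows drive letters.

```rust
/// One match line in the `path:line:text` shape shown under Usage.
#[derive(Debug, PartialEq)]
struct MatchLine {
    path: String,
    line: usize,
    text: String,
}

/// Parse a single output line; returns `None` if it does not fit the shape.
/// Assumption: the path itself contains no `:` character.
fn parse_match_line(raw: &str) -> Option<MatchLine> {
    // Split into at most three pieces so colons inside the matched
    // source text are left untouched.
    let mut parts = raw.splitn(3, ':');
    let path = parts.next()?;
    let line = parts.next()?.parse().ok()?;
    let text = parts.next()?;
    Some(MatchLine {
        path: path.to_string(),
        line,
        text: text.to_string(),
    })
}
```

An editor plugin could then turn each parsed MatchLine into a jump-list or quickfix entry, skipping any line for which the parser returns None.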

Contributions welcome/let us know if you've written a plugin for your editor of choice

Non-goals

  • Trying to support "everything and the kitchen sink" functionality (yes that is some slight ast-grep shade)

    We think tree-sitter-grep certainly has the potential to be a useful grep-like tool in and of itself, and beyond that we're thinking of it as a "building-block" that could in theory be leveraged by other tooling for eg search-and-replace, code-mod, ...

    I've already had success using tree-sitter-grep as part of a "one-off large-scale automated refactor"

  • Coming up with our own custom eg querying syntax (damn it's shady over here in the shade)

    I actually think the approach taken by eg ast-grep of providing a query syntax that "looks like the code" is pretty intuitive and maybe the "easiest thing to reach for" in a lot of cases

    I just personally am not drawn to it as an approach to tooling. I dislike that it's concealing the "tree-sitter-ness of it all". It feels like tree-sitter in general is very ripe for building a variety of different types of tooling on top of as an underlying technology and so I'm more drawn to "building blocks" that let you leverage existing knowledge/expertise and by their nature lead you down a path of gaining more of that knowledge/expertise. And maybe then building your own sh** on top of it (or inspired by it)

Contributing/issues

The code-base is a rather typical cargo-based Rust project

So eg cargo test runs the test suite

Feel free to open issues or pull requests
