app code-shape

Code-shape is a tool for extracting definitions from source code files

3 unstable releases

0.2.1 Apr 22, 2023
0.2.0 Apr 22, 2023
0.1.0 Apr 21, 2023

MIT/Apache

17KB
309 lines

Code-shape

Code-shape is a tool that uses Tree-sitter to extract a shape of code definitions from a source code file. The tool uses the same language parsers that are installed for the Tree-sitter CLI.

Installation

The code-shape CLI can be installed with Rust's Cargo package manager:

cargo install code-shape

Usage

Before using the tool, some preparation is needed.

Prerequisites

  1. Install the Tree-sitter CLI.
  2. Run tree-sitter init-config, which creates a config file such as ~/.config/tree-sitter/config.json in Tree-sitter's config directory.
  3. Create a directory where the installed parsers will be located and add it to the "parser-directories" list in Tree-sitter's config file.
  4. Clone the Tree-sitter parsers for the required languages into the parsers directory.

Define extraction query

To extract a shape of definitions from a source code file in a given language, a query must be defined for that language. To define a new query, create a file with an .scm suffix in Code-shape's languages config directory ~/.config/code-shape/languages/, for example ~/.config/code-shape/languages/c.scm, and put a set of Tree-sitter query patterns in it, such as:

; C language function declarations
(declaration
    [
        (function_declarator
            declarator: (identifier) @fn.declaration.name
        )
        (_
            (function_declarator
                declarator: (identifier) @fn.declaration.name
            )
        )
    ]
)

; C language function pointer declarations
(declaration
    [
        (init_declarator
            (function_declarator
                (_ (_ declarator: (identifier) @fn.pointer.declaration.name))
            )
        )
        (init_declarator
            (_
                (function_declarator
                    (_ (_ declarator: (identifier) @fn.pointer.declaration.name))
                )
            )
        )
    ]
)

; C language function definitions
(function_definition
    [
        (function_declarator
            declarator: (identifier) @fn.name
        )
        (_
            (function_declarator
                declarator: (identifier) @fn.name
            )
        )
    ]
    body: (_) @fn.scope
)

The query must define captures with special names:

  • <type>.name is a capture that matches the name of a code entity, where the type may be, e.g., fn, class or anything else.
  • <type>.scope is a special capture that lets the tool capture the context of nested entities. It usually matches the tokens that define the body of the entity, e.g., a function body.
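To illustrate the naming convention for another language, here is a minimal sketch of a query for Python. It assumes the node and field names of the tree-sitter-python grammar (function_definition, class_definition, name, body) and picks def and class as the capture types. It is not necessarily the query the tool itself uses for Python.

```scheme
; Python function definitions (illustrative sketch, not the tool's own query)
(function_definition
    name: (identifier) @def.name
    body: (_) @def.scope
)

; Python class definitions
(class_definition
    name: (identifier) @class.name
    body: (_) @class.scope
)
```

By analogy with c.scm, such a file would presumably be saved as ~/.config/code-shape/languages/python.scm; the exact file name expected for Python is an assumption here. Because both patterns define a .scope capture on the body, definitions nested inside a function or class body can be reported in the context of their parent.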

Examples of the tool output:

# code-shape --scope source.c tree-sitter/lib/src/alloc.c
fn ts_malloc_default
fn ts_calloc_default
fn ts_realloc_default
fn.pointer.declaration ts_current_malloc
fn.pointer.declaration ts_current_calloc
fn.pointer.declaration ts_current_realloc
fn.pointer.declaration ts_current_free
fn ts_set_allocator

# code-shape examples/foo.c
fn.declaration ts_malloc_default
fn.declaration ts_calloc_default
fn.declaration ts_realloc_default
fn.pointer.declaration ts_current_malloc
fn.pointer.declaration ts_current_calloc
fn.pointer.declaration ts_current_realloc
fn.pointer.declaration ts_current_free
fn foo
fn bar

# code-shape examples/foo.py
class Foo
  def foo
  def bar
    def inner
def one
def two
def wrap
  class Baz
    class Bar
      class Foo
        def func1
          def func2
  def three
def four

Embedded shape queries

For now the tool has built-in shape queries for the following language parsers:
