#parser #datalog #incremental #souffle

tree-sitter-souffle

Soufflé grammar for the tree-sitter parsing library

2 unstable releases

0.4.0 Oct 24, 2022
0.3.0 Oct 21, 2022
0.1.0 Oct 21, 2022

#201 in Text editors

Download history 8/week @ 2023-12-11 2/week @ 2023-12-18 68/week @ 2024-01-01 2/week @ 2024-01-08 47/week @ 2024-01-29 8/week @ 2024-02-12 25/week @ 2024-02-19 82/week @ 2024-02-26 38/week @ 2024-03-04 24/week @ 2024-03-11 19/week @ 2024-03-18 26/week @ 2024-03-25

110 downloads per month
Used in 4 crates

MIT license

630KB
21K SLoC

C 20K SLoC JavaScript 455 SLoC // 0.3% comments Rust 31 SLoC // 0.1% comments

tree-sitter-souffle

crates.io

A tree-sitter grammar for Soufflé.

Contents

Table of Contents

Status

The grammar is fairly complete. It parses:

There are a few known issues.

Use-Cases

This grammar aims to support the following use-cases:

  • Editor integration
    • Code navigation (e.g., jump-to-definition, show references)
    • Syntax highlighting
  • Evaluation
  • Linting
  • Static analysis

It does not currently aim to support round-trip printing (i.e., code formatting and refactoring).

On the C Pre-Processor

This parser has limited support for parsing C pre-processor #line directives, which may be helpful for analysis tasks. To avoid over-complicating the grammar, it does not handle directives in the middle of top-level entities (e.g., in between two conjuncts of a rule). You can configure your preprocessor to not emit such tokens (-P for mcpp).

Design

The grammar doesn't mirror the structure of the Soufflé C++ parser implementation nor the grammar as presented in the Soufflé documentation. It instead tries to map nonterminals to more abstract categories, i.e., categories that match the way we think about the language. For instance, the Soufflé C++ parser doesn't have a nonterminal for constants, whereas this parser does. As another example, the Soufflé documentation has this specification for type declarations:

type_decl ::= '.type' IDENT ("<:" type_name | "=" ( type_name ( "|" type_name )* | record_list | adt_branch ( "|" adt_branch )* ))

This grammar instead has something more like

type_decl ::= subtype | type_synonym | type_union | record | adt

so that the parse tree encodes a bit more "semantic" information. The hope is that these choices make it easier to write tree-sitter queries.

For the same reason, the grammar doesn't specify keywords and operators like .functor, eqrel, *, and :- as their own nonterminals, but rather inlines them into the grammar.

Development

Tests

Run

tree-sitter test

or equivalently

npm test

This script downloads the Soufflé, ddisasm, cclyzer++, and other repos and attempts to parse their Datalog files.

./scripts/parse-examples

References

Releasing

  1. Update CHANGELOG.md
  2. Update the version number in Cargo.toml
  3. git checkout main && git pull origin && git tag -a vX.Y.Z -m vX.Y.Z && git push --tags
  4. cargo publish
  5. Release the pre-release created by CI

Dependencies

~2.7–4MB
~71K SLoC