50 releases (28 breaking)

new 0.41.0 Dec 13, 2024
0.39.0 Dec 4, 2024
0.38.0 Nov 26, 2024
0.22.1 Jul 28, 2024

#1296 in Programming languages

Download history 977/week @ 2024-08-23 1351/week @ 2024-08-30 1773/week @ 2024-09-06 1681/week @ 2024-09-13 3044/week @ 2024-09-20 2119/week @ 2024-09-27 1060/week @ 2024-10-04 1105/week @ 2024-10-11 1422/week @ 2024-10-18 991/week @ 2024-10-25 1317/week @ 2024-11-01 1022/week @ 2024-11-08 1194/week @ 2024-11-15 1333/week @ 2024-11-22 1033/week @ 2024-11-29 1195/week @ 2024-12-06

4,890 downloads per month
Used in 3 crates (2 directly)

MIT license

3.5MB
79K SLoC

AST traversal with ability to read up the tree from visitors.

Please see traverse_mut for an explanation of API.

Implementation details

Most of the code in this crate is generated by a codegen. Codegen is currently written in JavaScript (scripts/build.mjs).

Do not edit those files, as they'll be over-written by the build script on next run.

The scheme which allows reading up the tree is based on making it statically impossible to violate Rust's aliasing rules.

Rust's aliasing rules are (roughly):

  1. For any object, you can have as many immutable & references simultaneously as you like.
  2. For any object, you cannot obtain a mutable &mut reference if any other references (immutable or mutable) to that same object exist.
  3. A &/&mut ref covers the object itself and the entire tree below it. i.e. you can't hold a &mut to child and any reference to parent at same time (except by "re-borrowing").

This poses a problem for reading back up the tree in a mutating visitor. In a visitor you hold a &mut reference to a node, so therefore cannot obtain a & ref to its parent, because the parent owns the node. If you had a & ref to the parent, that also acts as a & ref to the current node = holding & and &mut refs to same node simultaneously. Disaster!

The solution this crate uses is:

  1. Don't create references while traversing down the AST in walk_* functions. Use raw pointers instead. &mut references are only created in the enter_* / exit_* methods.
  2. Don't allow enter_* / exit_* to access its entire parent or ancestor, only other branches of the tree which lead from the parent/ancestor. The parts of the tree above current node which can be accessed via ctx.parent() etc do not overlap the part of the tree which is available via &mut ref inside enter_* / exit_*. This makes it impossible to obtain 2 references to same AST node at same time.
  3. Again, don't create transient references while getting the fields of ancestors. Use raw pointers to go to the field directly, before creating a & ref.

The mechanism for this is the Ancestor enum. Each AST node type has several Ancestor variants e.g. ProgramWithoutDirectives, ProgramWithoutHashbang, ProgramWithoutBody. As the names suggest, each of these types grants access to all the fields of Program except one.

As walk_* functions walk down the tree, they add to the stack of Ancestors the appropriate type to prevent access to the path which is being walked down.

walk_* uses TraverseCtx::retag_stack to make it as cheap as possible to update the ancestry stack, but this is purely a performance optimization, not essential to the safety of the scheme.

SAFETY

This crate contains a great deal of unsafe code. The entirety of walk.rs is unsafe functions using raw pointers.

To avoid a drain on compile time asking Rust to parse 1000s of comments, the codegen-ed files do not contain comments explaining the safety of every unsafe operation. But everything works according to the principles outlined above.

Almost all the code is currently codegen-ed. I (@overlookmotel) would recommend continuing to exclusively use a codegen, and not manually editing these files for "special cases". The safety scheme could very easily be derailed entirely by a single mistake, so in my opinion, it's unwise to edit by hand.

Dependencies

~8.5MB
~141K SLoC