55 releases (33 breaking)
new 0.46.0 | Jan 14, 2025 |
---|---|
0.44.0 | Dec 25, 2024 |
0.38.0 | Nov 26, 2024 |
0.22.1 | Jul 28, 2024 |
#1324 in Programming languages
4,008 downloads per month
Used in 3 crates
(2 directly)
3.5MB
80K
SLoC
AST traversal with ability to read up the tree from visitors.
Please see traverse_mut
for an explanation of API.
Implementation details
Most of the code in this crate is generated by a codegen.
Codegen is currently written in JavaScript (scripts/build.mjs
).
Do not edit those files, as they'll be over-written by the build script on next run.
The scheme which allows reading up the tree is based on making it statically impossible to violate Rust's aliasing rules.
Rust's aliasing rules are (roughly):
- For any object, you can have as many immutable
&
references simultaneously as you like. - For any object, you cannot obtain a mutable
&mut
reference if any other references (immutable or mutable) to that same object exist. - A
&
/&mut
ref covers the object itself and the entire tree below it. i.e. you can't hold a&mut
to child and any reference to parent at same time (except by "re-borrowing").
This poses a problem for reading back up the tree in a mutating visitor.
In a visitor you hold a &mut
reference to a node, so therefore cannot obtain a &
ref to
its parent, because the parent owns the node. If you had a &
ref to the parent, that also acts
as a &
ref to the current node = holding &
and &mut
refs to same node simultaneously.
Disaster!
The solution this crate uses is:
- Don't create references while traversing down the AST in
walk_*
functions. Use raw pointers instead.&mut
references are only created in theenter_*
/exit_*
methods. - Don't allow
enter_*
/exit_*
to access its entire parent or ancestor, only other branches of the tree which lead from the parent/ancestor. The parts of the tree above current node which can be accessed viactx.parent()
etc do not overlap the part of the tree which is available via&mut
ref insideenter_*
/exit_*
. This makes it impossible to obtain 2 references to same AST node at same time. - Again, don't create transient references while getting the fields of ancestors. Use raw pointers
to go to the field directly, before creating a
&
ref.
The mechanism for this is the Ancestor
enum. Each AST node type has several Ancestor
variants
e.g. ProgramWithoutDirectives
, ProgramWithoutHashbang
, ProgramWithoutBody
.
As the names suggest, each of these types grants access to all the fields of Program
except one.
As walk_*
functions walk down the tree, they add to the stack of Ancestors
the appropriate type
to prevent access to the path which is being walked down.
walk_*
uses TraverseCtx::retag_stack
to make it as cheap as possible to update the ancestry
stack, but this is purely a performance optimization, not essential to the safety of the scheme.
SAFETY
This crate contains a great deal of unsafe code. The entirety of walk.rs
is unsafe functions
using raw pointers.
To avoid a drain on compile time asking Rust to parse 1000s of comments, the codegen-ed
files do not contain comments explaining the safety of every unsafe operation.
But everything works according to the principles outlined above.
Almost all the code is currently codegen-ed. I (@overlookmotel) would recommend continuing to exclusively use a codegen, and not manually editing these files for "special cases". The safety scheme could very easily be derailed entirely by a single mistake, so in my opinion, it's unwise to edit by hand.
Dependencies
~8.5MB
~146K SLoC