4 releases
0.1.3 | Nov 10, 2024 |
---|---|
0.1.2 | Nov 10, 2024 |
0.1.1 | Oct 28, 2024 |
0.1.0 | Oct 27, 2024 |
#1228 in Rust patterns
43 downloads per month
8KB
85 lines
Lachs
A tool to automatically generate a lexer based on a given enum.
Usage
To generate a lexer from a given struct, just annotate it with token
:
use lachs::token;
#[token]
pub enum Token {
#[terminal("+")]
Plus,
#[literal("[0-9]+")]
Integer
}
As you can see, we also annotated the variants Token::Plus
and Token::Integer
with #[terminal("+")]
and #[literal("[0-9]+")]
, respectively.
The helper #[terminal(...)]
takes a string literal which has to match exactly to be lexed as the decorated token, while #[literal(...)]
takes
a regular expression to extract a matched sequence from the text.
These helper macros get evaluated by #[token]
and describe the two different kinds of tokens the lexer can understand:
- terminals (without an own value)
- literals (with an own value)
Under the hood, the proc macro expands the struct to roughly the following:
pub enum Token {
Plus {
position: lachs::Span,
},
Integer {
value: String,
position: lachs::Span,
}
}
Both, terminals and literals have a field named position
to store the position in the originating text.
Literals have an additional field value
which stores the value which matched the passed regular expression.
Additionally, the Token
enum gets a function which lets you pass a string and get the result of the lexing back:
use lachs::token;
#[token]
pub enum Token {
#[terminal("+")]
Plus,
#[literal("[0-9]+")]
Integer
}
let result: Result<Vec<Token>, LexError> = Token::lex("2 + 2");
Caveats
The macro also generates an implementation of PartialEq
for the decorated enum. However, this implementation does not take the position into account.
If you want to check whether two tokens are exactly the same, you can utilize the Token::does_equal(...)
function.
Generated Stuff
The macro generates additional structs for performing the actual lexing. These should not be touched, if possible. However, they can lead to name collisions.
Dependencies
~2.5–10MB
~104K SLoC