1 unstable release
Uses old Rust 2015
0.1.0 (May 28, 2018)
Luther derive
Luther is an embedded lexer generator for stable Rust.
This crate is the proc macro implementation for deriving the Lexer trait from the Luther crate. See the crate-level documentation for the options recognized by the proc macro, and see the Luther crate for an example of how this crate is used.
License
Luther is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE-2.0 or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Luther by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
luther_derive provides a procedural macro to derive the luther::Lexer trait.
Deriving the luther::Lexer trait is expected to be the primary (possibly only) way of implementing this trait. The trait can be derived on an enum of token types where the variants of the enum are annotated with a regular expression.
Not all variants of the enum need to be annotated with a regular expression, but variants that do not have such an annotation will not be returned by the lexer that luther_derive generates.
Generating the lexer adds a visible type name for the deterministic finite automaton that the lexer uses internally. Once hygienic macros are available it will be possible to hide this name, but with the current implementation of procedural macros the name will be visible. By default the name is formed by adding a suffix of Dfa to the name of the enum on which luther::Lexer is derived. This default can be overridden with the dfa option of the luther attribute.
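The naming rule can be pictured with a small standalone sketch. The helper dfa_name below is hypothetical and is not part of luther or luther_derive; it only models the rule as described here (append Dfa unless the dfa option supplies a name), and the crate's actual implementation may differ.

```rust
// Hypothetical helper, not taken from luther_derive: models the naming rule
// for the generated deterministic finite automaton type.
fn dfa_name(enum_name: &str, dfa_option: Option<&str>) -> String {
    match dfa_option {
        // The `dfa` option of the `luther` attribute overrides the default.
        Some(name) => name.to_string(),
        // Default: the enum's name with a `Dfa` suffix.
        None => format!("{}Dfa", enum_name),
    }
}

fn main() {
    // Default name for an enum called Token.
    println!("{}", dfa_name("Token", None));
    // Name chosen explicitly, as with #[luther(dfa = "MyDfa")] (MyDfa is an assumed example name).
    println!("{}", dfa_name("Token", Some("MyDfa")));
}
```

Because the name is visible in the surrounding module, overriding it is mainly useful when the default would clash with an existing type.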
Example
extern crate luther;
#[macro_use]
extern crate luther_derive;

#[derive(Lexer)]
enum Token {
    #[luther(regex = "ab")]
    Ab,
    #[luther(regex = "acc*")]
    Acc(String),
}
Capturing the recognized characters
If a variant of the enum on which the lexer is being generated includes a single type (like the Acc variant in the above example) and that type implements str::FromStr (as String does in the Acc example), then the generated lexer will capture the recognized characters when it has matched that variant's regular expression. It will capture the characters as a value of that type using the type's str::FromStr implementation.
It is an error to have more than one type included in an enum variant, and luther_derive will recognize this error. It is also an error to have a single type that does not implement str::FromStr, but luther_derive cannot recognize this error; it will likely manifest itself as a confusing error message from the compiler.
For now the single type included in an enum variant must also implement default::Default, although this restriction may be lifted in the future.
The code to capture the characters will be something similar to
characters.parse().unwrap_or_default()
where characters is a &str of the recognized characters.
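To see what that expression implies for a variant's payload type, here is a minimal standard-library sketch. The generic function capture is a hypothetical stand-in for the generated code, not something luther_derive exposes; it just wraps the parse-then-default expression so the FromStr and Default requirements are visible as trait bounds.

```rust
use std::str::FromStr;

// Hypothetical stand-in for the generated capture code: parse the matched
// text into T, falling back to T::default() if parsing fails.
fn capture<T: FromStr + Default>(characters: &str) -> T {
    characters.parse().unwrap_or_default()
}

fn main() {
    // String's FromStr implementation cannot fail, so the matched text is kept.
    let acc: String = capture("accc");
    println!("{:?}", acc);

    // A numeric payload is built from the matched digits.
    let number: u32 = capture("42");
    println!("{}", number);

    // Text that does not parse silently becomes the default value.
    let fallback: u32 = capture("accc");
    println!("{}", fallback);
}
```

One consequence of this design is that a parse failure is not reported: the variant is still produced, carrying the type's default value.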
The luther attribute
luther_derive recognizes the luther attribute both on the enum for which luther::Lexer is being derived and on the variants of that enum. luther supports various options, which are invoked like `#[luther(option = "value")]`.
The options supported by the luther attribute are the following, with an indication of where each option is valid (the enum or the variants):
- dfa: the name to use for the generated deterministic finite automaton [enum]
- regex: the regular expression to recognize for a particular variant [variant]
- priority_group: the priority group to which a variant belongs [variant]
Priority groups
It is possible for the regular expressions for more than one enum variant to match the same input. For example, the following regular expressions all match the input "auto":
- "auto"
- "[a-z]+"
- "[a-z]+[0-9]*"
The lexer generated by luther_derive will favour simple strings given as the regex option on the luther attribute over more complicated regular expressions. In the examples listed above this means that the first item will be preferred over the second and the third. This rule allows the lexer to prefer keywords over identifiers, for example.
If the preference for simple strings is not enough to resolve the ambiguity, you will have to use the priority_group option of the luther attribute to indicate which of the two (or more) variants has the higher priority (a smaller number indicates a higher priority). Within a priority group, luther_derive will continue to favour simple strings over other, more complicated regular expressions.
The default value for priority_group, if it is not specified, is 1.
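The two-level preference can be modelled as an ordering on (priority group, simple string or not). The sketch below is illustrative only: the Rule struct and the pick function are hypothetical names that do not come from luther, and the real generator resolves these conflicts while building its automaton rather than at match time.

```rust
// Hypothetical model of a lexer rule; none of these names come from luther.
#[derive(Debug)]
struct Rule {
    variant: &'static str,
    // Smaller number means higher priority; the documented default is 1.
    priority_group: u32,
    // True when the regex is a plain string such as "auto".
    is_simple_string: bool,
}

// Among rules that matched the same input, prefer the smallest priority
// group, then simple strings over more complicated regular expressions.
fn pick(matching: &[Rule]) -> Option<&Rule> {
    matching
        .iter()
        // false sorts before true, so simple strings win within a group.
        .min_by_key(|rule| (rule.priority_group, !rule.is_simple_string))
}

fn main() {
    let rules = [
        Rule { variant: "Ident", priority_group: 1, is_simple_string: false },
        Rule { variant: "Auto", priority_group: 1, is_simple_string: true },
    ];
    // Both rules share group 1, so the simple string is chosen.
    if let Some(winner) = pick(&rules) {
        println!("{}", winner.variant);
    }
}
```

In this sketch any remaining tie goes to whichever rule comes first in the slice; the text above does not say how luther_derive treats such a tie, so that part is an assumption of the model.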
Errors
luther_derive will raise an error at compile time in the following circumstances (among others):
- the #[derive(Lexer)] invocation is on a struct rather than an enum
- none of the variants of the enum have a luther attribute with the regex option specified
- one of the regexes specified for a variant would match the empty string
- a variant has included types that are not a tuple of arity 1
- the value provided for the regex option can't be parsed as a regular expression
- the value provided for the priority_group option can't be parsed as an integer