Version 1.0.0 (stable), released Apr 26, 2025. Uses the Rust 2024 edition.
Used in lex_sleuther
lex_sleuther_multiplexer
A multiplexer over multiple lexers.
This crate is only responsible for actually lexing source code and producing a feature vector, a glorified count of each token occurrence.
We leverage the lexgen lexer library to build optimized lexer state machines quickly. In many cases, the resulting tokenization is not totally correct, but it is good enough and fast enough to provide a meaningful guess.
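To make the idea of a feature vector concrete, here is a minimal sketch of an approximate lexer that only counts token kinds. Everything in it is an assumption for illustration: the `Token` enum, `TOKEN_KINDS`, and `feature_vector` are hypothetical names, and the scanner is written by hand with the standard library rather than generated by lexgen, so it does not reflect this crate's actual API.

```rust
/// Hypothetical token kinds for a toy lexer (not the crate's real tokens).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token {
    Ident = 0,
    Number = 1,
    Punct = 2,
}

/// Number of token kinds, which is also the length of the feature vector.
pub const TOKEN_KINDS: usize = 3;

/// Lex `source` approximately and count how often each token kind occurs.
pub fn feature_vector(source: &str) -> [usize; TOKEN_KINDS] {
    let mut counts = [0usize; TOKEN_KINDS];
    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        let token = if c.is_alphabetic() || c == '_' {
            // Consume the rest of the identifier.
            while chars.peek().map_or(false, |n| n.is_alphanumeric() || *n == '_') {
                chars.next();
            }
            Token::Ident
        } else if c.is_ascii_digit() {
            // Consume the rest of the integer literal.
            while chars.peek().map_or(false, |n| n.is_ascii_digit()) {
                chars.next();
            }
            Token::Number
        } else if c.is_whitespace() {
            continue; // Whitespace is skipped, not counted.
        } else {
            Token::Punct // Any other character counts as one punctuation token.
        };
        counts[token as usize] += 1;
    }
    counts
}
```

For the input `let x1 = 42;` this sketch counts two identifiers, one number, and two punctuation tokens. It has no notion of keywords, strings, or comments, which is the kind of imprecision the paragraph above accepts in exchange for speed.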
adding new lexers
There are three general steps to adding new lexers:
- Create a new module using the `lexgen::lexer!` macro to generate a lexer based on a PEG-like syntax. Use existing lexers as a guide.
- Add your lexer to the array `lexers` in `src/lib.rs`.
- Create a simple test to sanity check the stability of your lexer. Use existing tests as an example.
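The second and third steps can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the crate's real code: the `Lexer` type alias, the two toy lexers, the `LEXERS` array, and `multiplex` are all hypothetical, and "stability" is read here as "does not panic and returns the same counts on every run", which may differ from what the existing tests check.

```rust
/// A lexer here is any function from source text to a vector of counts.
/// This signature is an assumption for illustration, not the crate's API.
type Lexer = fn(&str) -> Vec<usize>;

/// Toy "C-like" lexer: counts braces and semicolons.
fn c_like(source: &str) -> Vec<usize> {
    let braces = source.chars().filter(|c| *c == '{' || *c == '}').count();
    let semis = source.chars().filter(|c| *c == ';').count();
    vec![braces, semis]
}

/// Toy "shell-like" lexer: counts `$` sigils and `#` comment markers.
fn shell_like(source: &str) -> Vec<usize> {
    let sigils = source.chars().filter(|c| *c == '$').count();
    let hashes = source.chars().filter(|c| *c == '#').count();
    vec![sigils, hashes]
}

/// Step 2: registering a lexer means adding it to this array.
const LEXERS: [Lexer; 2] = [c_like, shell_like];

/// Run every registered lexer over the same input and concatenate the counts.
pub fn multiplex(source: &str) -> Vec<usize> {
    LEXERS.iter().flat_map(|lex| lex(source)).collect()
}

/// Step 3: a sanity check that the lexers handle odd input without panicking
/// and produce identical output when run twice.
#[test]
fn multiplex_is_stable() {
    let input = "if [ $x ]; then { echo # hi\n}";
    assert_eq!(multiplex(input), multiplex(input));
    assert_eq!(multiplex("").len(), 4);
}
```

Concatenating per-lexer counts keeps each lexer independent, so adding one only lengthens the combined vector and leaves the existing entries in place.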
Note that adding a lexer here does not add a new classification set to lex_sleuther upstream.
The set of classification categories is completely decoupled from the set of lexers this library uses internally.