lex_sleuther_multiplexer

a multiplexer over multiple lexers

1 stable release

Uses the Rust 2024 edition

new 1.0.0 Apr 26, 2025

#8 in #multiplexer

46 downloads per month
Used in lex_sleuther

MIT license

81KB
2K SLoC

This crate is responsible only for lexing source code and producing a feature vector, which is essentially a count of how often each token occurs.
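To make the idea of a feature vector concrete, here is a minimal sketch that counts token occurrences. The Token enum, TOKEN_KINDS, tokenize, and feature_vector are hypothetical names invented for illustration, and the hand-written tokenizer is only a stand-in for a generated lexer; none of this is the crate's actual API.

```rust
/// Hypothetical token kinds (an assumption for this sketch); the real
/// crate defines its own token sets per lexer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Token {
    Ident = 0,
    Number = 1,
    Punct = 2,
}

/// Number of token kinds, which is also the length of the feature vector.
pub const TOKEN_KINDS: usize = 3;

/// Toy hand-written tokenizer standing in for a generated lexer.
/// It skips whitespace and classifies everything else very roughly.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Consume the whole identifier-like run.
            while chars.peek().is_some_and(|c| c.is_alphanumeric() || *c == '_') {
                chars.next();
            }
            tokens.push(Token::Ident);
        } else if c.is_ascii_digit() {
            // Consume the whole run of digits.
            while chars.peek().is_some_and(|c| c.is_ascii_digit()) {
                chars.next();
            }
            tokens.push(Token::Number);
        } else {
            // Any other single character counts as punctuation.
            chars.next();
            tokens.push(Token::Punct);
        }
    }
    tokens
}

/// Build the feature vector: one slot per token kind, holding how many
/// times that kind occurred in the input.
pub fn feature_vector(src: &str) -> [usize; TOKEN_KINDS] {
    let mut counts = [0usize; TOKEN_KINDS];
    for token in tokenize(src) {
        counts[token as usize] += 1;
    }
    counts
}
```

Under these assumptions, feature_vector("let x1 = 42;") yields two identifiers, one number, and two punctuation tokens. Note that the vector records only frequencies; token order and token text are discarded.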

We use the lexgen lexer library to build optimized lexer state machines quickly. The resulting tokenization is often not fully correct, but it is accurate enough and fast enough to support a meaningful guess.

adding new lexers

There are three general steps to adding new lexers:

  1. Create a new module that uses the lexgen::lexer! macro to generate a lexer from a PEG-like syntax. Use the existing lexers as a guide.
  2. Add your lexer to the lexers array in src/lib.rs.
  3. Write a simple test that sanity-checks the stability of your lexer. Use the existing tests as examples.
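The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's real layout: the modules use hand-written counting functions instead of lexgen::lexer! so that the example needs no dependencies, and LexFn, LEXERS, multiplex, and is_stable are hypothetical names.

```rust
/// Hypothetical lexer signature: source text in, per-token counts out.
pub type LexFn = fn(&str) -> Vec<usize>;

// Step 1 (stand-in): one module per lexer. In the real crate the body
// would be generated with lexgen::lexer!; here it is written by hand.
pub mod c_like {
    /// Counts: [semicolons, braces, line comments].
    pub fn lex(src: &str) -> Vec<usize> {
        let mut counts = vec![0; 3];
        for line in src.lines() {
            if line.trim_start().starts_with("//") {
                counts[2] += 1;
                continue;
            }
            counts[0] += line.chars().filter(|&c| c == ';').count();
            counts[1] += line.chars().filter(|&c| c == '{' || c == '}').count();
        }
        counts
    }
}

pub mod shell_like {
    /// Counts: [comment lines, `$` sigils, pipes].
    pub fn lex(src: &str) -> Vec<usize> {
        let mut counts = vec![0; 3];
        for line in src.lines() {
            if line.trim_start().starts_with('#') {
                counts[0] += 1;
                continue;
            }
            counts[1] += line.chars().filter(|&c| c == '$').count();
            counts[2] += line.chars().filter(|&c| c == '|').count();
        }
        counts
    }
}

// Step 2: register every lexer in a single list (name is an assumption).
pub static LEXERS: &[(&str, LexFn)] = &[
    ("c_like", c_like::lex as LexFn),
    ("shell_like", shell_like::lex as LexFn),
];

/// Run every registered lexer over the same input and concatenate the
/// per-lexer counts into one feature vector.
pub fn multiplex(src: &str) -> Vec<usize> {
    LEXERS.iter().flat_map(|(_, lex)| lex(src)).collect()
}

/// Step 3 helper: a lexer is treated as stable here if it returns the
/// same counts when run twice on the same input.
pub fn is_stable(lex: LexFn, src: &str) -> bool {
    lex(src) == lex(src)
}

#[test]
fn registered_lexers_are_stable() {
    for (name, lex) in LEXERS {
        assert!(is_stable(*lex, "// hi\nint x = 1;\n{ }"), "{name} is unstable");
    }
}
```

Keeping the lexers in one list means the multiplexer never has to know which lexers exist; adding one only changes the length of the concatenated vector.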

Note that adding a lexer here does not add a new classification set to lex_sleuther upstream. The set of classification categories is completely decoupled from the set of lexers this library uses internally.

Dependencies

~2.5MB
~42K SLoC