
alkale

A simple LL(1) lexer library for Rust

5 stable releases

2.0.0 Nov 18, 2024
1.0.3 Oct 14, 2024
1.0.2 Sep 20, 2024
1.0.1 Sep 13, 2024

#63 in Parser tooling

MIT/Apache

155KB
2K SLoC

Alkale

This is the repository for Alkale, a Rust library to assist in making hand-written LL(1) lexers.

Goals

Alkale's design has three specific goals.

Goal 1: Base Layer

Alkale should act as a "base layer" for a larger compiler. It implements datatypes such as Spans, which track where tokens appear in the source code. It also implements Notifications, which act as a built-in lint and error system.
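To make the idea of spans and notifications concrete, here is a minimal sketch of what such a base layer can look like. The `Span`, `Severity`, and `Notification` types below are hypothetical stand-ins written for this illustration; they are not Alkale's actual definitions, so consult the crate documentation for the real API.

```rust
/// Hypothetical span (not Alkale's type): a half-open byte range
/// `start..end` into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

impl Span {
    /// Returns the slice of `source` covered by this span.
    pub fn slice<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }

    /// Computes the 1-based (line, column) of the span's start,
    /// counting columns in characters rather than bytes.
    pub fn line_col(&self, source: &str) -> (usize, usize) {
        let before = &source[..self.start];
        let line = before.matches('\n').count() + 1;
        let line_start = before.rfind('\n').map_or(0, |i| i + 1);
        let col = before[line_start..].chars().count() + 1;
        (line, col)
    }
}

/// Hypothetical severity levels for a lint and error system.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

/// Hypothetical notification: a message attached to a source location.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Notification {
    pub severity: Severity,
    pub message: String,
    pub span: Span,
}

impl Notification {
    /// Formats the notification as a one-line, human-readable diagnostic.
    pub fn render(&self, source: &str) -> String {
        let (line, col) = self.span.line_col(source);
        let label = match self.severity {
            Severity::Warning => "warning",
            Severity::Error => "error",
        };
        format!(
            "{label} at {line}:{col}: {} (`{}`)",
            self.message,
            self.span.slice(source)
        )
    }
}
```

Storing byte offsets rather than line and column numbers keeps a span small and cheap to copy; the line and column are only computed when a diagnostic is actually shown.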

Goal 2: Be Efficient

Alkale does not generate lexers for you; it only provides an API that streamlines writing one by hand. For this reason, it should be fairly efficient and remain open to tweaking.

Goal 3: Include Many Built-Ins

Many aspects of lexers are extremely common and repetitive, such as skipping whitespace, generating string or number tokens, and recovering from errors. These common elements should come pre-packaged with Alkale by default.
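As an illustration of the kind of repetitive work such built-ins take over, the sketch below hand-writes three of these helpers on top of the standard library's `Peekable<Chars>`. The function names `skip_whitespace`, `read_unsigned`, and `read_quoted` are hypothetical and chosen for this example only; Alkale's own built-ins may be named and shaped differently.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Skips leading whitespace and returns how many characters were skipped.
pub fn skip_whitespace(chars: &mut Peekable<Chars<'_>>) -> usize {
    let mut skipped = 0;
    // `next_if` consumes a character only when the predicate accepts it,
    // so the decision is made with a single character of lookahead.
    while chars.next_if(|c| c.is_whitespace()).is_some() {
        skipped += 1;
    }
    skipped
}

/// Reads a run of ASCII digits as a `u64`.
/// Returns `None` if there is no digit at the cursor or the value overflows.
pub fn read_unsigned(chars: &mut Peekable<Chars<'_>>) -> Option<u64> {
    let mut digits = String::new();
    while let Some(c) = chars.next_if(|c| c.is_ascii_digit()) {
        digits.push(c);
    }
    digits.parse().ok()
}

/// Reads a double-quoted string (no escape sequences in this sketch).
/// Returns `None` if the cursor is not at a quote, `Some(Ok(text))` for a
/// terminated string, and `Some(Err(partial))` if the input ends first.
pub fn read_quoted(chars: &mut Peekable<Chars<'_>>) -> Option<Result<String, String>> {
    chars.next_if_eq(&'"')?;
    let mut text = String::new();
    for c in chars.by_ref() {
        if c == '"' {
            return Some(Ok(text));
        }
        text.push(c);
    }
    // Unterminated string: hand back what was read so the caller can report it.
    Some(Err(text))
}
```

Called in sequence on the input `  42 "hi"`, these helpers skip two characters, yield the number 42, skip one more character, and then yield the string `hi`.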

Structure

The core of Alkale is built on two types: SourceCodeScanner and LexerResult. The former provides an organized way to interface with source code, while the latter acts as an accumulator of Notifications (errors) and Tokens (the actual output).

See the documentation of these types for more information.
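The division of labor between a scanner and a result accumulator can be sketched with plain Rust. The `Cursor` and `Output` types below are hypothetical stand-ins that only mirror the roles described for SourceCodeScanner and LexerResult; they do not reproduce Alkale's API, and the token set is invented for the example.

```rust
use std::ops::Range;

/// Hypothetical token kinds for a tiny arithmetic language.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(u64),
    Plus,
    Minus,
}

/// Hypothetical accumulator playing the role described for `LexerResult`:
/// it collects tokens and notifications, each with a byte range.
#[derive(Debug, Default)]
pub struct Output {
    pub tokens: Vec<(Token, Range<usize>)>,
    pub notifications: Vec<(String, Range<usize>)>,
}

/// Hypothetical cursor playing the role described for `SourceCodeScanner`:
/// it exposes one character of lookahead over the source.
pub struct Cursor<'a> {
    source: &'a str,
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(source: &'a str) -> Self {
        Self { source, pos: 0 }
    }

    /// Looks at the next character without consuming it.
    pub fn peek(&self) -> Option<char> {
        self.source[self.pos..].chars().next()
    }

    /// Consumes and returns the next character.
    pub fn bump(&mut self) -> Option<char> {
        let c = self.peek()?;
        self.pos += c.len_utf8();
        Some(c)
    }
}

/// Lexes `source`, deciding each step from a single peeked character.
pub fn lex(source: &str) -> Output {
    let mut cursor = Cursor::new(source);
    let mut out = Output::default();
    while let Some(c) = cursor.peek() {
        let start = cursor.pos;
        match c {
            c if c.is_whitespace() => {
                cursor.bump();
            }
            '0'..='9' => {
                while matches!(cursor.peek(), Some(d) if d.is_ascii_digit()) {
                    cursor.bump();
                }
                let range = start..cursor.pos;
                match source[range.clone()].parse::<u64>() {
                    Ok(n) => out.tokens.push((Token::Number(n), range)),
                    Err(_) => out
                        .notifications
                        .push(("number does not fit in u64".to_string(), range)),
                }
            }
            '+' => {
                cursor.bump();
                out.tokens.push((Token::Plus, start..cursor.pos));
            }
            '-' => {
                cursor.bump();
                out.tokens.push((Token::Minus, start..cursor.pos));
            }
            other => {
                // Record the problem and keep going instead of aborting.
                cursor.bump();
                out.notifications
                    .push((format!("unexpected character `{other}`"), start..cursor.pos));
            }
        }
    }
    out
}
```

Because errors are accumulated rather than returned early, a single pass over `12 + 3 $ -` yields four tokens and one notification for the stray `$`.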

Performance

A well-formed Alkale lexer can operate at about 208 MB/s on my fairly modest machine (derived from the results of the simplex and deflate benchmarks). This is far faster than a simple regex-based lexer, but some other lexers are still faster.

Dependencies