#character #string #parser

sap-lexer

The lexer crate for the SAP programming language

2 stable releases

1.0.1 Mar 19, 2024
1.0.0 Mar 5, 2024

#1382 in Text processing

40 downloads per month
Used in 4 crates

MIT/Apache

43KB
888 lines

Lexer module

The lexer module tokenises input strings. It supports token types such as identifiers, numbers, strings, and operators, and uses a cursor-based approach to walk the input string and extract tokens.

The lexer is implemented as a struct called Lexer. It holds an iterator over the characters of the input string and uses that iterator to extract individual tokens.
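To make the cursor idea concrete, here is a minimal sketch of a struct that wraps a character iterator. The field and method names (chars, pos, peek, advance, position) are assumptions chosen for illustration and are not taken from the sap-lexer source.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical cursor-style lexer state (names are illustrative only).
pub struct Lexer<'a> {
    /// Iterator over the characters of the input string.
    chars: Peekable<Chars<'a>>,
    /// Byte offset of the next unread character.
    pos: usize,
}

impl<'a> Lexer<'a> {
    /// Create a lexer positioned at the start of `input`.
    pub fn new(input: &'a str) -> Self {
        Lexer {
            chars: input.chars().peekable(),
            pos: 0,
        }
    }

    /// Look at the next character without consuming it.
    pub fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    /// Consume one character and move the cursor forward.
    pub fn advance(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        // Track the offset in bytes so it can later be used to slice the source.
        self.pos += c.len_utf8();
        Some(c)
    }

    /// Current byte offset into the input.
    pub fn position(&self) -> usize {
        self.pos
    }
}
```

Tracking a byte offset alongside the iterator is one common way to produce spans later; the real crate may record positions differently.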

The Lexer struct provides a method called next_token, which skips any whitespace and comments, then advances the lexer to the next token in the input stream and returns it. The method is essentially one large match expression (Rust's counterpart to a switch statement), with a branch for every token type.
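The shape of such a method can be sketched as follows. This is a standalone illustration, not the crate's code: the TokenKind variants, the helper names skip_trivia and take_while_from, and the use of `//` for line comments are all assumptions, since the text above does not specify SAP's comment syntax or token set.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical token kinds; the real crate's set is larger.
#[derive(Debug, PartialEq)]
pub enum TokenKind {
    Identifier(String),
    Number(String),
    Str(String),
    Plus,
    Minus,
    Star,
    Slash,
    Assign,
    Illegal(char),
    Eof,
}

pub struct Lexer<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Lexer { chars: input.chars().peekable() }
    }

    /// Skip whitespace and line comments (assumed here to start with `//`).
    fn skip_trivia(&mut self) {
        loop {
            match self.chars.peek().copied() {
                Some(c) if c.is_whitespace() => {
                    self.chars.next();
                }
                Some('/') => {
                    // Two-character lookahead on a cloned iterator.
                    let mut ahead = self.chars.clone();
                    ahead.next();
                    if ahead.peek() == Some(&'/') {
                        // Discard everything up to and including the newline.
                        while let Some(c) = self.chars.next() {
                            if c == '\n' {
                                break;
                            }
                        }
                    } else {
                        return; // a lone '/' is an operator, not a comment
                    }
                }
                _ => return,
            }
        }
    }

    /// Collect `first` plus every following character accepted by `keep`.
    fn take_while_from(&mut self, first: char, keep: impl Fn(char) -> bool) -> String {
        let mut s = String::from(first);
        while let Some(ch) = self.chars.peek().copied() {
            if !keep(ch) {
                break;
            }
            s.push(ch);
            self.chars.next();
        }
        s
    }

    /// Skip trivia, then dispatch on the first character of the next token.
    pub fn next_token(&mut self) -> TokenKind {
        self.skip_trivia();
        let c = match self.chars.next() {
            Some(c) => c,
            None => return TokenKind::Eof,
        };
        match c {
            '+' => TokenKind::Plus,
            '-' => TokenKind::Minus,
            '*' => TokenKind::Star,
            '/' => TokenKind::Slash,
            '=' => TokenKind::Assign,
            '"' => {
                // Read up to the closing quote; an unterminated string
                // simply runs to the end of the input in this sketch.
                let mut s = String::new();
                while let Some(ch) = self.chars.next() {
                    if ch == '"' {
                        break;
                    }
                    s.push(ch);
                }
                TokenKind::Str(s)
            }
            c if c.is_ascii_digit() => {
                TokenKind::Number(self.take_while_from(c, |ch| ch.is_ascii_digit()))
            }
            c if c.is_alphabetic() || c == '_' => {
                TokenKind::Identifier(self.take_while_from(c, |ch| ch.is_alphanumeric() || ch == '_'))
            }
            other => TokenKind::Illegal(other),
        }
    }
}
```

Calling next_token repeatedly until it yields the end-of-input kind walks the whole input. Returning an explicit Illegal kind for unknown characters, rather than panicking, is a design choice of this sketch that leaves error reporting to the caller.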

Each token is represented by a Token struct, which records its kind (e.g., identifier, operator, literal) and its span in the input stream.
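A token that pairs a kind with a span might look like the sketch below. The Span type, its start and end fields (taken here to be a half-open byte range), the TokenKind variants, and the text helper are assumptions for illustration; the crate's actual definitions may differ.

```rust
/// Hypothetical half-open byte range `start..end` into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical coarse token categories, mirroring the examples in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    Identifier,
    Operator,
    Literal,
    Eof,
}

/// A token: what it is, and where it sits in the input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

impl Token {
    pub fn new(kind: TokenKind, start: usize, end: usize) -> Self {
        Token { kind, span: Span { start, end } }
    }

    /// Recover the token's text by slicing the original source with its span.
    pub fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.span.start..self.span.end]
    }
}
```

Storing only offsets keeps tokens small and lets later stages, such as error reporting in a parser, point back at the exact location in the source.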

The lexer module is used by the parser to tokenise the input string before parsing it into an abstract syntax tree (AST).

Dependencies

~0.3–1MB
~23K SLoC