#lexer #tokenizer #tokeniser

nightly macro lexerus_derive

Simple annotated lexer

1 unstable release

0.1.0 Sep 11, 2024

#148 in #tokenizer


Used in lexerus

Custom license

49KB
1.5K SLoC

Note: This readme is auto-generated. Please refer to the docs.

Lexerus

Lexerus is a lexer dinosaur that consumes a [Buffer] constructed from [str] and spits out a structure through the lexer::Lexer::lex call.

This library uses the lexer_derive::Token and lexer_derive::Lexer macros to decorate a structure for automatic parsing. See those macros for additional options.

This library was developed in conjunction with SPEW, and examples of an actual implementation can be found there.

Example


// Create and decorate a struct
#[derive(Lexer, Token, Debug)]
struct Trex<'code>(#[pattern = "trex::"] Buffer<'code>);

#[derive(Lexer, Token, Debug)]
struct TrexCall<'code>(
    #[pattern = "RAWR"] Buffer<'code>,
);

#[derive(Lexer, Token, Debug)]
struct Call<'code> {
    rex: Trex<'code>,
    call: TrexCall<'code>,
}

// Create a raw buffer
let mut buffer = Buffer::from("trex::RAWR");

// Attempt to parse the trex
let trex_calling = Call::lex(&mut buffer).unwrap();

// Extract the buffer from trex
let trex = trex_calling.rex.buffer().unwrap();
let trex_calling = trex_calling.buffer().unwrap();

// Buffer should contain the exact matched string
assert_eq!(trex_calling.to_string(), "trex::RAWR");
assert_eq!(trex.to_string(), "trex::");

Goals

  • No heap allocations when parsing. However, there are some exceptions:
    • When using helpers such as [GroupUntil], a [Vec] is allocated to store the parsed [Buffer] in individual units. Contrast this with [Group] which only captures the [Buffer] output without individual segregation.
  • Heap allocations only occur when calling Token::buffer on non-contiguous sections of text or repeated sections of text. This is inevitable because different sections of [str] have to be stitched together, and the only way to do so is with a heap allocation.
  • Proper debuggable information, i.e. the [Buffer] retains information about its source and the exact range on the source.
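The allocation rule above can be illustrated with a minimal sketch using only the standard library. The `Span` type and `join` function below are hypothetical stand-ins invented for this illustration; they are not the lexerus [Buffer] type or its API. The sketch assumes a buffer is conceptually a borrowed source plus a byte range: two adjacent ranges can be merged by widening the range, while ranges separated by a gap have to be copied into a new String.

```rust
use std::borrow::Cow;
use std::ops::Range;

/// Hypothetical stand-in for a buffer: a borrowed source plus a byte range.
/// This is NOT the lexerus `Buffer` type, only an illustration.
#[derive(Debug, Clone, PartialEq)]
struct Span<'a> {
    source: &'a str,
    range: Range<usize>,
}

impl<'a> Span<'a> {
    /// The exact text this span covers, borrowed from the source.
    fn as_str(&self) -> &'a str {
        &self.source[self.range.clone()]
    }
}

/// Join two spans taken from the same source (assumed, not checked).
/// Adjacent spans are merged by widening the range, with no allocation.
/// Spans separated by a gap must be copied into a freshly allocated String.
fn join<'a>(a: &Span<'a>, b: &Span<'a>) -> Cow<'a, str> {
    if a.range.end == b.range.start {
        Cow::Borrowed(&a.source[a.range.start..b.range.end])
    } else {
        let mut owned = String::with_capacity(a.range.len() + b.range.len());
        owned.push_str(a.as_str());
        owned.push_str(b.as_str());
        Cow::Owned(owned)
    }
}

fn main() {
    let source = "trex::RAWR";
    let rex = Span { source, range: 0..4 };
    let sep = Span { source, range: 4..6 };
    let call = Span { source, range: 6..10 };

    // Contiguous sections: the result still borrows from the source.
    assert!(matches!(join(&rex, &sep), Cow::Borrowed("trex::")));

    // Non-contiguous sections: the pieces are stitched into a heap String.
    assert!(matches!(join(&rex, &call), Cow::Owned(_)));
    assert_eq!(join(&rex, &call), "trexRAWR");

    // Each span keeps its source and exact range for debugging.
    assert_eq!(call.range, 6..10);
    assert_eq!(call.as_str(), "RAWR");
}
```

Returning a `Cow` keeps the common contiguous case allocation-free and makes the stitched case visible in the type. How lexerus represents this internally may differ.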

Dependencies

~250–700KB
~17K SLoC