#line-column #tokenizer #parser #tracking #backtracking #tiny #no-alloc

simple-tokenizer

A tiny no_std tokenizer with line & column tracking

5 unstable releases

0.4.2 Jun 27, 2024
0.4.1 Feb 24, 2024
0.4.0 Feb 22, 2024
0.2.0 Dec 30, 2023
0.1.0 Dec 17, 2023


MIT license

25KB
351 lines


Goals:

  • no_std, no allocations and zero/minimal dependencies.
  • Be simple to use.
  • Line/column tracking.
  • Allow jumping to an arbitrary position in the source.
  • Backtracking by default, i.e. if a function fails, it won't consume any input.
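The goals above (line/column tracking, jumping to an arbitrary position, and backtracking on failure) can be made concrete with a small self-contained sketch. `Cursor`, `Pos`, `advance` and `jump_to` are hypothetical names chosen for illustration, and `take_while`/`token` only loosely mirror the calls used in the example below. This is not simple-tokenizer's actual implementation or API. The sketch assumes columns are counted in chars and that jumping re-scans from the start of the source.

```rust
/// Hypothetical position type: byte offset plus 1-based line and column.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pos {
    pub offset: usize, // byte offset into the source
    pub line: usize,   // 1-based line number
    pub column: usize, // 1-based column, counted in chars
}

/// Hypothetical cursor over a `&str` (illustrative sketch, not the crate's type).
pub struct Cursor<'a> {
    src: &'a str,
    pos: Pos,
}

impl<'a> Cursor<'a> {
    pub fn new(src: &'a str) -> Self {
        Cursor { src, pos: Pos { offset: 0, line: 1, column: 1 } }
    }

    pub fn position(&self) -> Pos {
        self.pos
    }

    /// Move past `text` (the next slice of the source), updating line/column.
    fn advance(&mut self, text: &str) {
        for ch in text.chars() {
            if ch == '\n' {
                self.pos.line += 1;
                self.pos.column = 1;
            } else {
                self.pos.column += 1;
            }
        }
        self.pos.offset += text.len();
    }

    /// Consume the longest prefix whose chars all satisfy `f` (possibly empty).
    pub fn take_while(&mut self, mut f: impl FnMut(char) -> bool) -> &'a str {
        let src: &'a str = self.src;
        let rest = &src[self.pos.offset..];
        let end = rest
            .char_indices()
            .find(|&(_, ch)| !f(ch))
            .map_or(rest.len(), |(i, _)| i);
        let taken = &rest[..end];
        self.advance(taken);
        taken
    }

    /// Consume `tok` only if the remaining input starts with it.
    /// On failure nothing is consumed, so the caller can simply try something else.
    pub fn token(&mut self, tok: &str) -> bool {
        if self.src[self.pos.offset..].starts_with(tok) {
            self.advance(tok);
            true
        } else {
            false
        }
    }

    /// Jump to an arbitrary byte offset by re-scanning from the start (O(n)).
    /// Panics if `offset` is past the end or not on a char boundary.
    pub fn jump_to(&mut self, offset: usize) {
        assert!(self.src.is_char_boundary(offset), "offset must lie on a char boundary");
        let src = self.src;
        self.pos = Pos { offset: 0, line: 1, column: 1 };
        self.advance(&src[..offset]);
    }
}
```

Re-scanning in `jump_to` keeps the sketch short. A real implementation could instead save a whole `Pos` (offset plus line/column) and restore it directly, which makes backtracking O(1).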

This isn't a parser combinator library and it won't ever become one. There are plenty of other choices already.

Example

```rust
use simple_tokenizer::*;

// A small parser that parses function calls with number/identifier arguments.
let source = r"function(123, other_argument, 456)";

let mut tokens = source.as_tokens(); // AsTokens is implemented for any AsRef<str>

let fn_name = tokens.take_while(|ch| ch.is_ascii_alphabetic() || ch == '_').to_string();

// Empty => nothing matched a function name
if fn_name.is_empty() {
    // use better error handling yourself
    panic!("error at {}", tokens.position());
}

tokens.take_while(char::is_whitespace); // skip whitespace

let mut args = Vec::new();

if !tokens.token("(") {
    panic!("error at {}", tokens.position());
}

// if the call succeeded, then '(' has been consumed

tokens.take_while(char::is_whitespace); // skip whitespace

// for the sake of simplicity, I'll stop checking for empty strings from here on
args.push(tokens.take_while(|ch| ch.is_ascii_alphanumeric() || ch == '_').to_string());
tokens.take_while(char::is_whitespace); // skip whitespace

while tokens.token(",") {
    tokens.take_while(char::is_whitespace); // skip whitespace

    args.push(tokens.take_while(|ch| ch.is_ascii_alphanumeric() || ch == '_').to_string());

    tokens.take_while(char::is_whitespace); // skip whitespace
}

if !tokens.token(")") {
    panic!("error at {}", tokens.position());
}

assert!(tokens.is_at_end());
assert_eq!(fn_name, "function");
assert_eq!(args.as_slice(), &["123", "other_argument", "456"]);
```
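The comment on `as_tokens()` says `AsTokens` is implemented for any `AsRef<str>`. That is the usual extension-trait-plus-blanket-impl pattern in Rust. The sketch below shows the idea with hypothetical `MiniTokens`/`AsMiniTokens` types, assuming the crate does something along these lines; its real types and method signatures may differ.

```rust
/// Hypothetical token stream borrowing the remaining input (not the crate's `Tokens`).
pub struct MiniTokens<'a> {
    rest: &'a str,
}

impl<'a> MiniTokens<'a> {
    /// The input that has not been consumed yet.
    pub fn remaining(&self) -> &'a str {
        self.rest
    }

    pub fn is_at_end(&self) -> bool {
        self.rest.is_empty()
    }
}

/// Hypothetical extension trait that adds `.as_mini_tokens()` to string-like types.
pub trait AsMiniTokens {
    fn as_mini_tokens(&self) -> MiniTokens<'_>;
}

// A single blanket impl covers `str`, `String`, `Box<str>`, `Cow<str>`, and any
// other `AsRef<str>` type; `?Sized` lets it apply to `str` itself.
impl<T: AsRef<str> + ?Sized> AsMiniTokens for T {
    fn as_mini_tokens(&self) -> MiniTokens<'_> {
        MiniTokens { rest: self.as_ref() }
    }
}
```

Because the stream borrows from the source, the source (for example an owned `String`) has to outlive it, which is what lets this design avoid allocations.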

Cargo features

  • yap (off by default): adds a wrapper for Tokens<'_> that implements yap::Tokens.
