#sentence #tokenizer #splitter #string

token (nightly)

A simple string-tokenizer (and sentence splitter). Note: if you would like to use the crate name for something more appropriate, please send me a mail at jaln at itu dot dk.

1 release (0 unstable)

Uses old Rust 2015

1.0.0-rc1 Feb 21, 2015

#6 in #splitter

MIT license

11KB
129 lines

Token


This is a small package containing a simple string-tokenizer for the Rust programming language. The package also includes a simple sentence-splitting iterator.

(The sentence splitter might be moved once I find out where I want it.)
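The README does not show the sentence splitter's API, so here is a hypothetical, std-only sketch of the idea it describes: cut the text after sentence-ending punctuation and trim the surrounding whitespace. The function name `split_sentences` is an assumption for illustration, not the crate's actual interface.

```rust
// Hypothetical sketch (NOT the crate's API): split a text into
// sentences on '.', '!' and '?', keeping the punctuation and
// trimming whitespace around each sentence.
fn split_sentences(text: &str) -> Vec<&str> {
    text.split_inclusive(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

fn main() {
    let text = "Hello there. How are you? Fine!";
    assert_eq!(
        split_sentences(text),
        ["Hello there.", "How are you?", "Fine!"]
    );
}
```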

Documentation

machtan.github.io/token-rs/token

Building

Add the following to your Cargo.toml file:

[dependencies.token]
git = "https://github.com/machtan/token-rs"

Examples

extern crate token;

let separators = vec![' ', '\n', '\t', '\r'];
let source: &str = "    Hello world \n  How do you do\t-Finely I hope";

let tokenizer = token::Tokenizer::new(source.as_bytes(), separators);
println!("Tokenizing...");
for token in tokenizer {
    println!("- Got token: {}", token.unwrap());
}
println!("Done!");

License

MIT (do what you want with it)


lib.rs:

A simple string tokenizer, and a sentence tokenizer built on top of it. The tokenizer skips all the given separator characters and returns the text between them as string slices.

Examples

General use (as an iterator)

This is how you will probably use it:

let separators = vec![' ', '\n', '\t', '\r'];
let source: &str = "    Hello world \n  How do you do\t-Finely I hope";

let tokenizer = token::Tokenizer::new(source.as_bytes(), separators);
println!("Tokenizing...");
for token in tokenizer {
    println!("- Got token: {}", token.unwrap());
}
println!("Done!");

Behavior

This is what to expect when parsing a string (or input from a reader):

let separators = vec![' ', '\n', '\t', '\r'];
let source: &str = "    Hello world \n  How do you do\t-Finely I hope";

let mut tokenizer = token::Tokenizer::new(source.as_bytes(), separators);
assert_eq!("Hello",     tokenizer.next().expect("1").unwrap());
assert_eq!("world",     tokenizer.next().expect("2").unwrap());
assert_eq!("How",       tokenizer.next().expect("3").unwrap());
assert_eq!("do",        tokenizer.next().expect("4").unwrap());
assert_eq!("you",       tokenizer.next().expect("5").unwrap());
assert_eq!("do",        tokenizer.next().expect("6").unwrap());
assert_eq!("-Finely",   tokenizer.next().expect("7").unwrap());
assert_eq!("I",         tokenizer.next().expect("8").unwrap());
assert_eq!("hope",      tokenizer.next().expect("9").unwrap());
assert_eq!(None,        tokenizer.next());
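The behavior asserted above can also be reproduced with the standard library alone; a minimal sketch (this `tokenize` helper is an illustration, not part of the crate):

```rust
// Std-only sketch of the tokenizer's behavior: split on any of the
// separator characters and drop the empty slices produced by runs
// of consecutive separators.
fn tokenize<'a>(source: &'a str, separators: &[char]) -> Vec<&'a str> {
    source
        .split(|c: char| separators.contains(&c))
        .filter(|tok| !tok.is_empty())
        .collect()
}

fn main() {
    let separators = [' ', '\n', '\t', '\r'];
    let source = "    Hello world \n  How do you do\t-Finely I hope";
    assert_eq!(
        tokenize(source, &separators),
        ["Hello", "world", "How", "do", "you", "do", "-Finely", "I", "hope"]
    );
}
```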

No runtime deps