Chisel - CharStream
Overview
This repository contains a very simple, lean implementation of a transcoder that consumes u8 bytes from a given Read implementation and converts them into Rust's internal char type. It is an offshoot library from an ongoing toy parser project, where it serves as the first stage of the scanning/lexing phase in order to avoid unnecessary allocations during the u8 sequence -> char conversion.
Note that the implementation is pretty fast and loose: under the covers it uses some bit-twiddling in conjunction with the unsafe transmute function to do the conversions. No string allocations are used during conversion.
There is minimal checking (other than bit-masking) of the inbound bytes; it is not intended to be a UTF-8 validation library.
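To make the bit-masking idea concrete, here is a minimal sketch of decoding a single UTF-8 sequence by hand. It is not the crate's code: `decode_one` is a hypothetical helper, it works on a byte slice rather than a Read, and it swaps the unsafe transmute for the safe `char::from_u32`, which adds a range check the crate may well skip.

```rust
/// Hypothetical helper (not part of the crate): decode one UTF-8 sequence
/// from the front of `bytes`, returning the char and the bytes consumed.
fn decode_one(bytes: &[u8]) -> Option<(char, usize)> {
    let lead = *bytes.first()?;
    // The high bits of the lead byte give the sequence length; the
    // remaining low bits are the first part of the code point.
    let (len, mut code) = if lead & 0x80 == 0x00 {
        (1, u32::from(lead))
    } else if lead & 0xE0 == 0xC0 {
        (2, u32::from(lead & 0x1F))
    } else if lead & 0xF0 == 0xE0 {
        (3, u32::from(lead & 0x0F))
    } else if lead & 0xF8 == 0xF0 {
        (4, u32::from(lead & 0x07))
    } else {
        return None; // stray continuation byte or invalid lead byte
    };
    // Each continuation byte has the form 10xxxxxx and adds six payload bits.
    // `get` returns None if the slice is shorter than the sequence.
    for &b in bytes.get(1..len)? {
        if b & 0xC0 != 0x80 {
            return None;
        }
        code = (code << 6) | u32::from(b & 0x3F);
    }
    // Safe stand-in for transmute: rejects surrogates and out-of-range values.
    // Overlong encodings are not rejected, so this is not a validator either.
    char::from_u32(code).map(|c| (c, len))
}
```

Calling `decode_one` repeatedly and advancing by the returned length walks a buffer one char at a time without allocating a string.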
Usage
Usage is very simple, provided you have something that implements Read in order to source some bytes:
Create from a slice
Just wrap your array in a reader, and then plug it into a new instance of CharStream:
use std::io::BufReader;

// A byte slice implements Read, so it can be wrapped directly.
let buffer: &[u8] = &[0x10, 0x12, 0x23, 0x12];
let mut reader = BufReader::new(buffer);
let _stream = CharStream::new(&mut reader);
Create from a file
Just crack open your file, wrap it in a Read instance and then plug it into a new instance of CharStream:
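use std::fs::File;
use std::io::BufReader;
use std::path::PathBuf;

let path = PathBuf::from("somefile.txt");
let f = File::open(path);
// unwrap panics if the file cannot be opened
let mut reader = BufReader::new(f.unwrap());
let _stream = CharStream::new(&mut reader);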
Consuming chars
You can either pull out new chars from the reader wrapped inside a Result type:
loop {
    let result = stream.next_char();
    if result.is_err() {
        break;
    }
}
Alternatively, you can just use the CharStream as an Iterator:
let stream = CharStream::new(&mut reader);
for c in stream {
    match c {
        Some(c) => ...
        None => ...
    }
}
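For readers curious how the two consumption styles can sit on top of one reader, the following is a rough sketch using a hypothetical stand-in type, `SketchStream`. It is not the crate's CharStream: the names, the error kinds, and the choice to yield plain `char` items from the iterator are all assumptions made for illustration, and the real item type may differ.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a char stream; not the crate's actual type.
struct SketchStream<R: Read> {
    reader: R,
}

impl<R: Read> SketchStream<R> {
    fn new(reader: R) -> Self {
        SketchStream { reader }
    }

    /// Read a single byte, failing with `UnexpectedEof` when the input runs out.
    fn next_byte(&mut self) -> io::Result<u8> {
        let mut buf = [0u8; 1];
        self.reader.read_exact(&mut buf)?;
        Ok(buf[0])
    }

    /// Decode the next char; end of input surfaces as an `Err`.
    fn next_char(&mut self) -> io::Result<char> {
        let lead = self.next_byte()?;
        // Number of continuation bytes and the payload bits of the lead byte.
        let (extra, mut code) = match lead {
            0x00..=0x7F => (0, u32::from(lead)),
            0xC0..=0xDF => (1, u32::from(lead & 0x1F)),
            0xE0..=0xEF => (2, u32::from(lead & 0x0F)),
            0xF0..=0xF7 => (3, u32::from(lead & 0x07)),
            _ => return Err(io::ErrorKind::InvalidData.into()),
        };
        // Continuation bytes are masked but not otherwise checked.
        for _ in 0..extra {
            let b = self.next_byte()?;
            code = (code << 6) | u32::from(b & 0x3F);
        }
        char::from_u32(code).ok_or_else(|| io::ErrorKind::InvalidData.into())
    }
}

impl<R: Read> Iterator for SketchStream<R> {
    type Item = char;

    // Any error, including end of input, ends the iteration.
    fn next(&mut self) -> Option<char> {
        self.next_char().ok()
    }
}
```

With this shape, `SketchStream::new(bytes).collect::<String>()` gathers everything up to the first error, while calling `next_char` directly lets the caller inspect why decoding stopped.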