#buf-read #stream #pattern #splitter #reader #u8 #complex

buf_read_splitter

Stream reader able to split (stop) the stream on a defined pattern (usually a &[u8], but it can also be a more complex pattern)

9 releases

3.0.0 Mar 1, 2025
0.3.1 Mar 12, 2025
0.2.2 Jan 31, 2025
0.1.3 Jan 30, 2025

#1063 in Algorithms


313 downloads per month

MIT license

34KB
538 lines

buf_read_splitter

buf_read_splitter can read a stream into a buffer (fixed length or not), stopping each time a defined pattern is reached (such as an array of u8, or a more complex pattern).

Below is an example with an array of bytes as the separator:

use std::io::Read;
use buf_read_splitter::{
    buf_read_splitter::BufReadSplitter,
    options::Options,
    simple_matcher::SimpleMatcher,
};

// We simulate a stream with this content :
let input = "First<SEP>Second<SEP>Third<SEP>Fourth<SEP>Fifth".to_string();
let mut input_reader = input.as_bytes();

// We create a reader that will end at each "<SEP>" :
let mut reader = BufReadSplitter::new(
           &mut input_reader,
           SimpleMatcher::new(b"<SEP>"),
           Options::default(),
);

// The separated strings are collected in a vector :
let mut words = Vec::new();

// Working variables :
let mut word = String::new();
let mut buf = vec![0u8; 100];

while {
 // Read in buffer like any other buffer :
 match reader.read(&mut buf) {
   Err(err) => panic!("Error while reading : {err}"),
   Ok(sz) => {
     if sz > 0 {
       // === Process the bytes just read ===
       let to_str = String::from_utf8_lossy(&buf[..sz]);
       word.push_str(&to_str);
       true
     } else {
       // === End of buffer part ===
       words.push(word.clone());
       word.clear();
       match reader.next_part() {  //Try to pass to the next part of the buffer
         Ok(Some(())) => true,     //There's a next part!
         Ok(None) => false,        //There's no next part, so go out of the loop
         Err(err) => panic!("Error in next_part() : {err}"),
       }
     }
   }
 }
} {}

assert_eq!(words.len(), 5);
assert_eq!(&words[0], "First");
assert_eq!(&words[1], "Second");
assert_eq!(&words[2], "Third");
assert_eq!(&words[3], "Fourth");
assert_eq!(&words[4], "Fifth");
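
The accumulation half of the loop above (read until `read` returns 0) can be factored into a small helper. This is a minimal sketch that relies only on `std::io::Read`, which the example uses to drive the reader; `read_part` and its `chunk_size` parameter are hypothetical names and not part of the crate:

```rust
use std::io::{self, Read};

/// Reads from `reader` until it reports 0 bytes and returns what was read.
/// `read_part` and `chunk_size` are illustrative names, not crate API.
fn read_part<R: Read>(reader: &mut R, chunk_size: usize) -> io::Result<String> {
    let mut buf = vec![0u8; chunk_size];
    let mut bytes = Vec::new();
    loop {
        let sz = reader.read(&mut buf)?;
        if sz == 0 {
            break; // end of the current part (or of the whole stream)
        }
        bytes.extend_from_slice(&buf[..sz]);
    }
    // Decode once at the end, so that a multi-byte UTF-8 character
    // split across two reads is not replaced by U+FFFD.
    Ok(String::from_utf8_lossy(&bytes).into_owned())
}
```

Collecting raw bytes and decoding them once is a deliberate difference from the loop above, which decodes each chunk separately and could therefore mangle a character that straddles two reads. With the reader from the example, each iteration would presumably become a call to this helper followed by `next_part()`.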

The separator can also be a more complex pattern.
For example, below is a Matcher able to split a stream at each Mac, Unix or Windows end of line:

use buf_read_splitter::{
       match_result::MatchResult,
       matcher::Matcher,
       };

struct AllEndOfLineMatcher {
   prev_char: u8,
}
impl AllEndOfLineMatcher {
   pub fn new() -> Self {
       Self { prev_char: 0 }
   }
}
impl Matcher for AllEndOfLineMatcher {
   // This function is called at each byte read
   //   `el_buf` contains the value of the byte
   //   `pos` contains the position matched
   fn sequel(&mut self, el_buf: u8, pos: usize) -> MatchResult {
       if pos == 0 {
           if el_buf == b'\r' || el_buf == b'\n' {
               self.prev_char = el_buf;
               MatchResult::NeedNext
           } else {
               MatchResult::Mismatch
           }
       } else if pos == 1 {
           if el_buf == b'\n' && self.prev_char == b'\r' {
               MatchResult::Match(0, 0) //We are on \r\n
           } else {
               MatchResult::Match(0, 1) //We have to ignore the last byte since it's not a part of the end of line pattern
           }
       } else {
           panic!("We can't reach this code since we just manage 2 positions")
       }
   }

   // This function is called at the end of the buffer, useful to manage partial cases
   fn sequel_eos(&mut self, pos: usize) -> MatchResult {
       if pos == 0 {
           MatchResult::Match(0, 0) //Here the last char is \r or \n, at position 0
       } else {
           panic!("We can't reach this code since we just manage 2 positions")
       }
   }
}
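
To see how this matcher answers byte by byte, the sketch below replays the same `sequel` logic against local stand-in types. The `MatchResult` enum and `Matcher` trait here only mirror the names used in the example and are not the crate's real definitions, and `trace` is a hypothetical helper that does not reproduce how the crate itself drives a matcher:

```rust
// Stand-ins for illustration only: they mirror the names used in the
// example, not the crate's actual definitions.
#[derive(Debug, PartialEq)]
enum MatchResult {
    Mismatch,
    NeedNext,
    Match(usize, usize),
}

trait Matcher {
    fn sequel(&mut self, el_buf: u8, pos: usize) -> MatchResult;
}

struct AllEndOfLineMatcher {
    prev_char: u8,
}

impl Matcher for AllEndOfLineMatcher {
    // Same decisions as the example, written as a single `match` on `pos`.
    fn sequel(&mut self, el_buf: u8, pos: usize) -> MatchResult {
        match pos {
            0 if el_buf == b'\r' || el_buf == b'\n' => {
                self.prev_char = el_buf;
                MatchResult::NeedNext
            }
            0 => MatchResult::Mismatch,
            1 if el_buf == b'\n' && self.prev_char == b'\r' => MatchResult::Match(0, 0),
            1 => MatchResult::Match(0, 1),
            _ => unreachable!("only positions 0 and 1 are handled"),
        }
    }
}

/// Hypothetical helper: feeds at most two bytes to a fresh matcher,
/// one position at a time, and collects each answer.
fn trace(bytes: &[u8]) -> Vec<MatchResult> {
    let mut matcher = AllEndOfLineMatcher { prev_char: 0 };
    bytes
        .iter()
        .enumerate()
        .map(|(pos, &b)| matcher.sequel(b, pos))
        .collect()
}
```

Under these assumptions, `trace(b"\r\n")` yields `NeedNext` then `Match(0, 0)`, `trace(b"\na")` yields `NeedNext` then `Match(0, 1)`, and `trace(b"a")` yields a single `Mismatch`.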

The reader can then be created like this:
let mut reader = BufReadSplitter::new( &mut input_reader, AllEndOfLineMatcher::new(), Options::default() );

The separator pattern can be changed on the fly by calling the matcher function:
reader.matcher(SimpleMatcher::new(b"<CHANGE SEP>"));

The size of a part can be limited.
For example, to limit reading to 100 bytes:
reader.set_limit_read(Some(100));
...and to remove the limit again:
reader.set_limit_read(None);
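
For readers who know the standard library, `Read::take` expresses a comparable idea of capping how many bytes a reader yields. The sketch below is only an analogy built on `std::io`; it says nothing about how `set_limit_read` is implemented, and `read_at_most` is a hypothetical helper name:

```rust
use std::io::{self, Read};

/// Reads at most `limit` bytes from `reader`.
/// `read_at_most` is an illustrative name, not crate API.
fn read_at_most<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // `take` wraps the reader so that it reports end of stream after `limit` bytes.
    reader.take(limit).read_to_end(&mut out)?;
    Ok(out)
}
```

One difference from the calls shown above: `take` consumes the reader it wraps and its limit is set when the adapter is created, whereas `set_limit_read` is called on an existing reader and can be reset with `None`.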

A call to .next_part() moves on to the next part whether or not the end of the current part has been reached, so any data not yet read in the current part is skipped.

For debugging purposes, you can enable the "log" feature in Cargo.toml:
[dependencies]
buf_read_splitter = { path = "../buf_read_splitter_v0.3/buf_read_splitter", features = ["log",] }

For more information:

Have a suggestion or found a bug? Feel free to file an issue:

You can also contact me:

Thanks for your interest!

License: MIT

Dependencies

~0–7MB
~41K SLoC