#reader #line #file-io #text-file #reading #input-file #split

pyadvreader

Split text file into text sequences, strings and (line) comments

5 stable releases

2.2.0 Jan 26, 2024
2.1.1 Jan 23, 2024
2.1.0 Jan 19, 2024
2.0.0 Nov 11, 2023
1.2.0 Sep 10, 2023

#994 in Rust patterns

MIT license

135KB
4K SLoC

advreader

This library provides an easy way to read input lines as byte slices for high efficiency. It works like lines from the standard library, except that each line is yielded as a byte slice (&[u8]) rather than a String. When you don't need UTF-8 validation, this is significantly faster than lines() and roughly as fast as writing the loop out by hand. Although the code itself is fairly trivial, I've had to write it in at least 4 tools recently, so I figured it was time to have a convenience crate for it.
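For context, the hand-written loop that a crate like this replaces might look roughly like the sketch below. It uses only the standard library. The helper name for_each_byte_line is made up for illustration and is not part of advreader's API. One buffer is reused for every line, so no per-line allocation happens.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of advreader): calls `f` once per line,
/// passing the line as a byte slice borrowed from a single reused buffer.
fn for_each_byte_line<R: BufRead>(
    mut reader: R,
    mut f: impl FnMut(&[u8]),
) -> io::Result<()> {
    let mut buf = Vec::new();
    loop {
        // Reuse the same allocation for every line.
        buf.clear();
        // `read_until` appends bytes up to and including the delimiter
        // and returns 0 once the input is exhausted.
        if reader.read_until(b'\n', &mut buf)? == 0 {
            return Ok(());
        }
        // Strip a trailing "\n" or "\r\n", if present.
        let mut line = &buf[..];
        if let Some(rest) = line.strip_suffix(b"\n") {
            line = rest;
            if let Some(rest) = line.strip_suffix(b"\r") {
                line = rest;
            }
        }
        // No UTF-8 validation happens here; the bytes are passed as-is.
        f(line);
    }
}
```

Any BufRead implementor can be passed in, for example a BufReader wrapping a File, or a plain &[u8] when testing.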

Installation

The crate is available on crates.io, so you can add it as a dependency in your Cargo.toml:

[dependencies]
advreader = "2.0.0"

Usage

Usage is simple: wherever you would typically call lines on a BufRead implementor, call byte_lines instead to get a structure that walks over the lines as &[u8] (and thus avoids allocations). There are two ways to use the API, and both are shown below:

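// Standard library imports used below. The crate's own trait that
// provides `byte_lines` must also be in scope (its import path is
// not shown here).
use std::fs::File;
use std::io::BufReader;

// our input file we're going to walk over lines of, and our reader
let file = File::open("./my-input.txt").expect("able to open file");
let reader = BufReader::new(file);
let mut lines = reader.byte_lines();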

// Option 1: Walk using a `while` loop.
//
// This is the most performant option, as it avoids an allocation by
// simply referencing bytes inside the reading structure. This means
// that there's no copying at all, until the developer chooses to.
while let Some(line) = lines.next() {
    // do something with the line
}

// Option 2: Use the `Iterator` trait.
//
// This is more idiomatic, but requires allocating each line into
// an owned `Vec` to avoid potential memory safety issues. Although
// there is an allocation here, the overhead should be negligible
// except in cases where performance is paramount.
//
// Use one option per reader: if Option 1 has already run to the end
// of the input, this loop has nothing left to read.
for line in lines.into_iter() {
    // do something with the line
}

This interface was introduced in the v2.x lineage of advreader. In v1.x the Iterator trait was implemented directly, but doing so relied on an unsafe contract for the sake of idiomatic syntax. That has since been fixed: all unsafe code has been removed, and IntoIterator implementations are provided for those who prefer the cleaner syntax.
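The reason an iterator needs owned lines is that Iterator::next has no way to tie the returned item to the borrow of the iterator itself, so an item cannot safely point into a buffer that is overwritten on the next call. The standard library makes the same trade-off in BufRead::split, which yields an owned Vec<u8> per line. The sketch below uses only std to show the allocating style. The helper name collect_owned_lines is hypothetical and says nothing about how advreader implements its iterator.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of advreader): collects every line as an
/// owned `Vec<u8>`, allocating once per line.
fn collect_owned_lines<R: BufRead>(reader: R) -> io::Result<Vec<Vec<u8>>> {
    // `split` removes the `\n` delimiter but leaves any `\r` in place.
    // Each item is an `io::Result<Vec<u8>>`, so collecting into a
    // `Result` stops at the first I/O error.
    reader.split(b'\n').collect()
}
```

Because each line owns its bytes, the lines can be stored, sorted, or sent to another thread, at the cost of one allocation per line.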

Dependencies

~3.5–9MB
~72K SLoC