#io #buffered #no-alloc #read-file

nightly no-std buffed

Traits & implementation of parsing buffered IO

1 unstable release

0.1.0 Feb 20, 2023


MIT/Apache

270KB
670 lines


buffed, a buffed buffered reader for Rust.

buffed provides traits and implementations akin to std::io::BufRead. Its traits and structs allow reading serialized data directly from a byte stream, without requiring any particular structure in the input while still taking advantage of buffering. They also make it easy to avoid copies and allocations.
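For UTF-8 text, avoiding copies largely means handing out a &str that borrows the buffer directly, even when the end of the buffer cuts a multi-byte character in half. A minimal std-only sketch of that step follows; str_prefix is a hypothetical helper for illustration, not buffed's FromBytes API.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper (not buffed's `FromBytes` API): borrows the longest
/// complete UTF-8 prefix of `bytes` without copying.
///
/// Returns `Ok(s)` where `s` points into `bytes`; any bytes after `s.len()`
/// (at most 3) begin a character split by the buffer boundary and need more
/// input. Returns `Err` if `bytes` contains genuinely invalid UTF-8.
fn str_prefix(bytes: &[u8]) -> Result<&str, Utf8Error> {
    match str::from_utf8(bytes) {
        Ok(s) => Ok(s),
        // `error_len() == None` means the input ended mid-character,
        // which is expected when a buffer boundary splits a character.
        Err(e) if e.error_len().is_none() => {
            // Everything before `valid_up_to()` is guaranteed valid,
            // so this second check cannot fail.
            Ok(str::from_utf8(&bytes[..e.valid_up_to()]).unwrap())
        }
        Err(e) => Err(e),
    }
}
```

The returned slice shares its address with the input buffer, so no bytes are copied; the caller only has to keep the incomplete tail around until more data arrives.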

NOTE: This crate was primarily made as a personal challenge. It has not been properly tested or audited, and you shouldn't use it for anything more serious than a hobby. Consider using sequoia-pgp's buffered-reader instead. We will still happily accept issues and merge pull requests, as we still want this crate to be a nice abstraction.

The root of the issue

This crate was first designed as a solution to the following problem: read a user-provided, UTF-8-encoded text file and extract some data from it. Do so blazingly fast™, without ever hogging memory, while staying robust against malicious input.

The requirement for speed implied the use of buffering, which already constrains what we can use in the standard library:

  • std::io::Read::read_to_string loads the whole file into memory, which rules it out given the second and last requirements (memory use and robustness).
  • std::io::BufRead::read_line actually has the same issue if we are to defend against malicious input: a giant file without newlines will be loaded entirely into memory.
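For the read_line case, one std-only mitigation is to cap how much a single call may read with Read::take. The sketch below assumes a hypothetical helper, read_line_capped, where max_len counts the trailing newline; it is not part of buffed, and it still allocates a String per line.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical helper: reads one line, storing at most `max_len` bytes
/// (the trailing newline included).
///
/// Returns `Ok(None)` at EOF and an `InvalidData` error if the line is longer
/// than `max_len`, so a newline-less input cannot grow the `String` unboundedly.
fn read_line_capped<R: BufRead>(reader: &mut R, max_len: u64) -> io::Result<Option<String>> {
    let mut line = String::new();
    // `take` makes the inner read_line see EOF after `max_len` bytes.
    let n = reader.by_ref().take(max_len).read_line(&mut line)?;
    if n == 0 {
        return Ok(None);
    }
    // Hit the cap without a newline and more data follows: the line is too long.
    if !line.ends_with('\n') && n as u64 == max_len && !reader.fill_buf()?.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "line too long"));
    }
    Ok(Some(line))
}
```

A final line of exactly max_len bytes without a newline is still accepted, because the check only fails when data remains after the cap.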

The standard library also doesn't allow controlling allocation with a reasonably high-level API (and we don't expect it to), even though such control could help make things faster.

buffed's API

buffed provides several traits:

  • BuffedRead, akin to std::io::BufRead, but gives more control over buffering and allows reading types implementing FromBytes without copying.
  • FromBytes (and the associated FromBytesError), encapsulating the parsing (and error reporting) logic for read data. The plan is for this trait to be implemented for most types you would ever need it for, like nom's parser outputs, serde-deserializable types, etc. If you'd like it implemented for something, please file an issue or PR! As long as it is behind a non-default optional feature, it should be easy to get it merged.
  • Buffer, a trait for BuffedReader's buffer. Other BuffedRead implementations can use it to take advantage of buffering techniques other than the one used by BuffedReader.

And some types:

  • BuffedReader, which is to BuffedRead what BufReader is to BufRead: a default implementation wrapping another type implementing std::io::Read. Notably, it uses a BoxBuffer, which can be swapped out for another implementation of Buffer as needed.
  • BoxBuffer, a "default" implementation of Buffer based on a simple Boxed byte slice (Box<[u8]>).
  • Error, a general-purpose error type used by BuffedRead.
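The point of a storage trait like Buffer is that the storage can be swapped without touching the reading logic. The following std-only sketch shows that idea; ByteStorage, BoxStorage, ArrayStorage and read_exact_into are hypothetical names, and the trait is not buffed's actual Buffer trait.

```rust
use std::io::{self, Read};

/// Hypothetical storage abstraction (not buffed's `Buffer` trait): a reader
/// only needs a mutable byte region that may or may not be able to grow.
trait ByteStorage {
    fn bytes(&self) -> &[u8];
    fn bytes_mut(&mut self) -> &mut [u8];
    /// Tries to make room for at least `len` bytes; false if it cannot.
    fn try_grow(&mut self, len: usize) -> bool;
}

/// Heap storage over a boxed byte slice: growing reallocates.
struct BoxStorage(Box<[u8]>);

impl ByteStorage for BoxStorage {
    fn bytes(&self) -> &[u8] { &self.0 }
    fn bytes_mut(&mut self) -> &mut [u8] { &mut self.0 }
    fn try_grow(&mut self, len: usize) -> bool {
        if len > self.0.len() {
            let mut v = std::mem::take(&mut self.0).into_vec();
            v.resize(len, 0);
            self.0 = v.into_boxed_slice();
        }
        true
    }
}

/// Fixed inline storage: never allocates, so it can only "grow" up to N.
struct ArrayStorage<const N: usize>([u8; N]);

impl<const N: usize> ByteStorage for ArrayStorage<N> {
    fn bytes(&self) -> &[u8] { &self.0 }
    fn bytes_mut(&mut self) -> &mut [u8] { &mut self.0 }
    fn try_grow(&mut self, len: usize) -> bool { len <= N }
}

/// Reads exactly `len` bytes from `src` into whichever storage is passed in.
fn read_exact_into<'s, S: ByteStorage, R: Read>(
    storage: &'s mut S,
    src: &mut R,
    len: usize,
) -> io::Result<&'s [u8]> {
    if !storage.try_grow(len) {
        return Err(io::Error::new(io::ErrorKind::Other, "storage cannot grow"));
    }
    src.read_exact(&mut storage.bytes_mut()[..len])?;
    Ok(&storage.bytes()[..len])
}
```

The same read_exact_into call works with either storage; only the growth policy differs, which is the kind of choice a swappable buffer type leaves to the user.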

Examples

Reading the whole content of a file using the BuffedRead API might be done like this:

use std::fs::File;

use buffed::{BuffedRead, BuffedReader, FromBytes};

fn main() {
    let file = File::open("tests/assets/capital-ru.txt").unwrap();
    let mut r = BuffedReader::new(file);
    loop {
        match r.fill_buf() {
            // An empty string means EOF was reached.
            Ok("") => return,
            Ok(data) => {
                // Size of the data just read, so it can be consumed below.
                let size = data.size();
                // Do something with data
                // ...
                // Mark the data as read so the next call returns new data.
                r.consume(size).unwrap();
            }
            // Oopsies: reading or parsing failed.
            Err(err) => panic!("{err}"),
        }
    }
}

Alternatively, the require_fill_buf(amount: usize) method of BuffedRead accepts a minimum amount of data to return (unless EOF is reached first), and require_fill_buf_no_alloc(amount: usize) does the same, but errors out if the buffer would need to be reallocated for that amount of data to fit.
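The difference between those two methods can be sketched without buffed. SketchReader and fill_at_least below are hypothetical; the allow_alloc flag stands in for choosing between require_fill_buf and require_fill_buf_no_alloc, and the growth and compaction strategy is an assumption, not buffed's implementation.

```rust
use std::io::{self, Read};

/// Hypothetical sketch of "fill until at least `amount` bytes" semantics.
struct SketchReader<R> {
    inner: R,
    buf: Box<[u8]>, // fixed-size storage, reallocated only when growing
    start: usize,   // index of the first unread byte
    end: usize,     // one past the last byte read from `inner`
}

impl<R: Read> SketchReader<R> {
    fn with_capacity(inner: R, capacity: usize) -> Self {
        SketchReader { inner, buf: vec![0u8; capacity].into_boxed_slice(), start: 0, end: 0 }
    }

    /// Marks `n` unread bytes as consumed.
    fn consume(&mut self, n: usize) {
        self.start = (self.start + n).min(self.end);
    }

    /// Returns at least `amount` unread bytes, or fewer if EOF comes first.
    /// With `allow_alloc == false`, a buffer smaller than `amount` is an
    /// error instead of being reallocated.
    fn fill_at_least(&mut self, amount: usize, allow_alloc: bool) -> io::Result<&[u8]> {
        let unread = self.end - self.start;
        if amount > self.buf.len() {
            if !allow_alloc {
                return Err(io::Error::new(io::ErrorKind::Other, "buffer too small"));
            }
            // Grow: allocate a bigger buffer and move unread bytes to its front.
            let mut bigger = vec![0u8; amount].into_boxed_slice();
            bigger[..unread].copy_from_slice(&self.buf[self.start..self.end]);
            self.buf = bigger;
            self.start = 0;
            self.end = unread;
        } else if self.start + amount > self.buf.len() {
            // Capacity is enough, but not after `start`: shift unread bytes to the front.
            self.buf.copy_within(self.start..self.end, 0);
            self.start = 0;
            self.end = unread;
        }
        // Here `start + amount <= buf.len()`, so `buf[end..]` is never empty in the loop.
        while self.end - self.start < amount {
            match self.inner.read(&mut self.buf[self.end..]) {
                Ok(0) => break, // EOF: return whatever is available
                Ok(n) => self.end += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            }
        }
        Ok(&self.buf[self.start..self.end])
    }
}
```

With a 4-byte buffer, asking for 8 bytes fails when allocation is disallowed and reallocates to 8 bytes otherwise; near EOF, fewer bytes than requested are returned.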

Alternatives

If you're looking for something similar to buffed, you may be interested in:

  • buffered-reader: A buffered reader implementation for the sequoia-pgp project. It does everything we want, including a way to parse objects using its concept of stacking readers. It's also a far more battle-tested and robust implementation! If you want to use buffed for anything more serious than a hobby, you should consider using this instead.
  • Implementing a tailored solution using the std::io types with a crate like serde or nom. This is not the easiest path, but it's probably the best one if none of the other options fits your case.

The following could also be of value, even though we wouldn't consider using them for the reasons noted:

  • io-enum: Achieves the same as buffed, without the extension of std's traits, but only for enums.
  • buf_redux: Replaces std::io's buffered types, but still only reads bytes, which makes handling UTF8 without copying hard and clumsy. No release since August 2019.
  • text_io: An unbuffered macro-based approach. We found it too fragile, and it hasn't seen much activity for a year.
  • ciborium-io: Simple Read & Write traits for #![no_std]. Largely incomplete, and hasn't seen activity since December 2021.
  • layered-io: Extends std's Read & Write traits in the same idea as buffed, but doesn't add a lot.
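The remark about byte-only readers applies to std's own BufRead as well. The std-only sketch below (count_chars and invalid are hypothetical names, not part of any crate listed here) counts the characters of a UTF-8 stream. Because std's BufReader only refills once its buffer has been fully consumed, a character split across two refills has to be copied into a small side buffer by hand.

```rust
use std::io::{self, BufRead};
use std::str;

fn invalid() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8")
}

/// Counts the characters of a UTF-8 stream using only std's `BufRead`.
fn count_chars<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    let mut carry = [0u8; 4]; // start of a character split across refills
    let mut carry_len = 0;
    loop {
        let data = reader.fill_buf()?;
        if data.is_empty() {
            // EOF: a leftover partial character means truncated input.
            return if carry_len == 0 { Ok(count) } else { Err(invalid()) };
        }
        let mut used = 0;
        // First complete the pending character, one byte at a time (the copy).
        while carry_len > 0 && used < data.len() {
            carry[carry_len] = data[used];
            carry_len += 1;
            used += 1;
            match str::from_utf8(&carry[..carry_len]) {
                Ok(_) => {
                    count += 1;
                    carry_len = 0;
                }
                // Still incomplete: keep going (a character is at most 4 bytes).
                Err(e) if e.error_len().is_none() && carry_len < 4 => {}
                Err(_) => return Err(invalid()),
            }
        }
        // Then decode the rest of the buffer in place, without copying.
        let rest = &data[used..];
        let valid = match str::from_utf8(rest) {
            Ok(s) => s,
            Err(e) if e.error_len().is_none() => {
                // Copy the incomplete tail (at most 3 bytes) aside.
                let tail = &rest[e.valid_up_to()..];
                carry[..tail.len()].copy_from_slice(tail);
                carry_len = tail.len();
                str::from_utf8(&rest[..e.valid_up_to()]).unwrap()
            }
            Err(_) => return Err(invalid()),
        };
        count += valid.chars().count();
        let consumed = data.len();
        reader.consume(consumed);
    }
}
```

Most of this function is bookkeeping for split characters; a reader that can be asked for "at least N more bytes" removes the need for the side buffer.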

No runtime deps

Features