1 unstable release: 0.1.0 (Feb 20, 2023)
buffed, a buffed buffered reader for Rust.
buffed provides traits and implementations akin to std::io::BufRead. Its traits and structs allow reading serialized data directly from a byte stream. It does so without requiring any structure in the input while still taking advantage of buffering. It also makes it easy to avoid copies and allocations.
NOTE: This crate was primarily made as a personal challenge. It has not been properly tested or audited, and you shouldn't use it for anything more serious than a hobby. Consider using sequoia-pgp's buffered-reader instead. We will still happily accept issues and merge pull requests! We still want this crate to be a nice abstraction.
The root of the issue
This crate was first designed as a solution to the following problem: read a user-provided, UTF-8-encoded text file and extract some data from it. Do so blazingly fast™, without ever hogging memory, while remaining robust against malicious input.
The requirement for speed implied the use of buffering, which already constrains what we can use in the standard library:
- std::io::Read::read_to_string loads the whole file into memory, which is obviously not an option to meet the second and last requirements.
- std::io::BufRead::read_line actually has the same issue if we are to defend against malicious input: a giant file without newlines will be loaded whole in memory.
The standard library also doesn't allow controlling allocation with a reasonably high-level API (and we don't expect it to), which could help make things faster.
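To make the read_line concern concrete, here is a minimal sketch using only the standard library. The helper name read_line_bounded is hypothetical (it is neither part of std nor of buffed); it caps each read with Read::take so that a newline-free input cannot grow the String without bound.

```rust
use std::io::{self, BufRead, Cursor, Read};

/// Reads at most `max_len` bytes of the next line into `line`.
/// Returns the number of bytes read; 0 means EOF (assuming `max_len > 0`).
/// Hypothetical helper for illustration, not part of std or buffed.
fn read_line_bounded<R: BufRead>(
    reader: &mut R,
    line: &mut String,
    max_len: u64,
) -> io::Result<usize> {
    line.clear();
    // `take` limits how many bytes `read_line` may pull in, so an input
    // without newlines can no longer make `line` grow indefinitely.
    reader.by_ref().take(max_len).read_line(line)
}

fn main() -> io::Result<()> {
    // One overly long line followed by a short one.
    let mut input = Cursor::new("aaaaaaaaaa\nbb\n");
    let mut line = String::new();

    // Every call yields at most 4 bytes, whatever the real line length is.
    while read_line_bounded(&mut input, &mut line, 4)? != 0 {
        println!("{line:?}");
    }
    Ok(())
}
```

This workaround has a cost: a cap that falls in the middle of a multi-byte UTF-8 character makes read_line fail with an InvalidData error, and the caller has to stitch truncated lines back together. That kind of friction is part of what motivated a dedicated API.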
buffed's API
buffed provides several traits:
- BuffedRead, akin to std::io::BufRead, but gives more control over buffering and allows reading types implementing FromBytes without copying.
- FromBytes (and the associated FromBytesError), encapsulating the parsing (and error reporting) logic for read data. The plan is for this trait to be implemented for most types you would ever need it for, like nom's parser outputs, serde-deserializable types, etc. If you'd like it implemented for something, please file an issue/PR! As long as it is behind a non-default optional feature, it should be easy to get it merged.
- Buffer, a trait for BuffedReader's buffer. It can be used by other BuffedRead implementations to take advantage of buffering techniques other than the one used by BuffedReader.
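As an illustration of what "reading without copying" can mean, the sketch below defines a stand-in trait. BorrowedParse and parse_from are hypothetical names chosen for this example and are not buffed's actual FromBytes definition; the point is only that a parsed value can borrow from the buffer instead of owning a copy.

```rust
use std::str::Utf8Error;

/// Hypothetical stand-in for a `FromBytes`-like trait (not buffed's API):
/// builds a value borrowing from `bytes` and reports how many bytes it covers.
trait BorrowedParse<'a>: Sized {
    type Error;
    fn parse_from(bytes: &'a [u8]) -> Result<(Self, usize), Self::Error>;
}

impl<'a> BorrowedParse<'a> for &'a str {
    type Error = Utf8Error;

    fn parse_from(bytes: &'a [u8]) -> Result<(Self, usize), Self::Error> {
        match std::str::from_utf8(bytes) {
            Ok(text) => Ok((text, bytes.len())),
            // `error_len() == None` means the input merely stops in the
            // middle of a character: return the valid prefix and let the
            // caller refill the buffer for the rest.
            Err(err) if err.error_len().is_none() => {
                let valid = err.valid_up_to();
                // Re-validating the prefix costs a second pass but avoids `unsafe`.
                let text = std::str::from_utf8(&bytes[..valid])?;
                Ok((text, valid))
            }
            // Genuinely invalid bytes are reported to the caller.
            Err(err) => Err(err),
        }
    }
}

fn main() {
    let buffer = "Москва".as_bytes();
    // Cut the buffer in the middle of the last two-byte character.
    let (text, used) = <&str as BorrowedParse>::parse_from(&buffer[..11]).unwrap();
    // `text` points into `buffer`: nothing was copied or allocated.
    assert_eq!(text.as_ptr(), buffer.as_ptr());
    println!("parsed {text:?} from {used} bytes");
}
```

Returning the number of bytes covered lets the reader consume exactly what was parsed and keep an incomplete trailing character in the buffer for the next fill.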
And some types:
- BuffedReader, which is to BuffedRead what BufReader is to BufRead. This is a default implementation wrapping another type implementing std::io::Read. Notably, it uses a BoxBuffer, which can be swapped out for another implementation of Buffer as needed.
- BoxBuffer, a "default" implementation of Buffer based on a simple Boxed byte slice (Box<[u8]>).
- Error, a general-purpose error type used by BuffedRead.
Examples
Reading the whole content of a file using the BuffedRead API might be done like:
```rust
use std::fs::File;
use buffed::{BuffedRead, BuffedReader, FromBytes};

fn main() {
    let file = File::open("tests/assets/capital-ru.txt").unwrap();
    let mut r = BuffedReader::new(file);
    loop {
        match r.fill_buf() {
            // EOF
            Ok("") => { return; }
            Ok(data) => {
                let size = data.size();
                // Do something with data
                // ...
                r.consume(size).unwrap();
            },
            // Oopsies
            Err(err) => panic!("{err}"),
        }
    }
}
```
Alternatively, the require_fill_buf(amount: usize) method of BuffedRead accepts a minimum amount of data to return (unless EOF was reached), and require_fill_buf_no_alloc(amount: usize) does the same, but errors out if the buffer needs to be reallocated for the amount of data to fit.
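The difference between the two flavours can be sketched with the standard library alone. SketchReader below is a hypothetical type written for this example, not buffed's implementation; it folds both behaviours into one require_fill method with an allow_realloc flag (an assumption made to keep the sketch short) and keeps its data in a Box<[u8]>.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical reader backed by a boxed byte slice; a sketch, not buffed's code.
struct SketchReader<R> {
    inner: R,
    buf: Box<[u8]>,
    start: usize, // index of the first unconsumed byte
    end: usize,   // one past the last filled byte
}

impl<R: Read> SketchReader<R> {
    fn with_capacity(capacity: usize, inner: R) -> Self {
        Self { inner, buf: vec![0; capacity].into_boxed_slice(), start: 0, end: 0 }
    }

    /// Returns at least `amount` bytes unless EOF comes first.
    /// With `allow_realloc == false`, fails instead of growing the buffer.
    fn require_fill(&mut self, amount: usize, allow_realloc: bool) -> io::Result<&[u8]> {
        if amount > self.buf.len() {
            if !allow_realloc {
                return Err(io::Error::new(io::ErrorKind::Other, "buffer too small"));
            }
            // Grow: move the unconsumed bytes into a larger allocation.
            let mut bigger = vec![0; amount].into_boxed_slice();
            bigger[..self.end - self.start].copy_from_slice(&self.buf[self.start..self.end]);
            self.end -= self.start;
            self.start = 0;
            self.buf = bigger;
        } else if self.start + amount > self.buf.len() {
            // Enough capacity overall, but not after `start`: shift to the front.
            self.buf.copy_within(self.start..self.end, 0);
            self.end -= self.start;
            self.start = 0;
        }
        // Read until enough bytes are buffered or the source is exhausted.
        while self.end - self.start < amount {
            match self.inner.read(&mut self.buf[self.end..])? {
                0 => break, // EOF
                n => self.end += n,
            }
        }
        Ok(&self.buf[self.start..self.end])
    }

    /// Marks `amount` buffered bytes as processed.
    fn consume(&mut self, amount: usize) {
        self.start = (self.start + amount).min(self.end);
    }
}

fn main() -> io::Result<()> {
    let mut r = SketchReader::with_capacity(4, Cursor::new("hello world"));
    // Four bytes fit in the 4-byte buffer: no reallocation needed.
    assert_eq!(r.require_fill(4, false)?, b"hell");
    r.consume(4);
    // Six bytes cannot fit: the "no alloc" flavour reports an error...
    assert!(r.require_fill(6, false).is_err());
    // ...while the allocating flavour grows the buffer first.
    assert_eq!(r.require_fill(6, true)?, b"o worl");
    Ok(())
}
```

The non-allocating flavour is useful when the buffer size is a hard memory budget: a request that does not fit becomes an error the caller can handle instead of an allocation driven by untrusted input.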
Alternatives
If you're looking for something similar to buffed, you may be interested in:
- buffered-reader: A buffered reader implementation for the sequoia-pgp project. It does everything we want to, including a way to parse objects using its concept of stacking readers. It's also a way more battle-tested and robust implementation! If you want to use buffed for anything more serious than a hobby, you should consider using this instead.
- Implementing a tailored solution using the std::io types with a crate like serde or nom. This is not the easiest path but it's probably the best if none other fits your case.
The following could also be of value, even though we wouldn't consider using them for the reasons noted:
- io-enum: Achieves the same as buffed, without the extension of std's traits, but only for enums.
- buf_redux: Replaces std::io's buffered types, but still only reads bytes, which makes handling UTF8 without copying hard and clumsy. No release since August 2019.
- text_io: An unbuffered macro-based approach. We found it too fragile, and it hasn't seen much activity for a year.
- ciborium-io: Simple Read & Write traits for #![no_std]. Largely incomplete, and hasn't seen activity since December 2021.
- layered-io: Extends std's Read & Write traits in the same spirit as buffed, but doesn't add a lot.