14 releases (5 breaking)
Version | Released |
---|---|
0.6.0 | Oct 22, 2024 |
0.5.0 | Oct 21, 2024 |
0.4.2 | Oct 9, 2024 |
0.3.2 | Sep 30, 2024 |
0.1.0 | Jan 17, 2024 |
#679 in Parser implementations
360 downloads per month
Used in autobib
270KB, 6K SLoC
WARNING
This crate is under active development and the public API may change substantially on every minor version change. The (de)serialization API is relatively stable, but some of the publicly-exposed internal state may change, particularly concerning the handling of errors. Until this is stabilized, use at your own risk!
Serde bibtex
A Rust library providing a serde interface for `.bib` file (de)serialization.
The implementation is minimally opinionated and feature-rich for convenient downstream consumption by other libraries or binaries.
For examples and thorough documentation of the features, visit the docs.
Deserializer
Here are the main features. See the deserializer docs for more detail.
Flexible
- Structured: read into Rust types with automatic `@string` macro expansion and other convenience features.
- Unstructured: do not expand macros or collect field values, so that the structure of the original BibTeX is preserved.
- Deserialize from bytes to defer UTF-8 conversion, or even pass through raw bytes.
- Error-tolerant `Iterator` API that allows skipping malformed entries.
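To make the idea of `@string` macro expansion concrete, here is a minimal std-only sketch. It is not the crate's implementation or API: the `Token` type and the `expand` function are hypothetical names introduced for illustration, and the case-insensitive lookup of macro names is an assumption based on usual BibTeX behaviour.

```rust
use std::collections::HashMap;

/// One piece of a BibTeX field value: either literal text or a reference
/// to a macro defined by an earlier `@string` entry.
/// (Hypothetical type for illustration; not the crate's representation.)
#[derive(Debug, Clone, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Variable(&'a str),
}

/// Expand a field value such as `jan # " 2024"` into a single string.
/// Macro names are looked up in lowercase (assumed to be case-insensitive).
/// Returns `None` if a referenced macro is undefined.
fn expand(tokens: &[Token<'_>], macros: &HashMap<String, String>) -> Option<String> {
    let mut out = String::new();
    for token in tokens {
        match token {
            Token::Text(s) => out.push_str(s),
            Token::Variable(name) => out.push_str(macros.get(&name.to_lowercase())?),
        }
    }
    Some(out)
}
```

In the "unstructured" mode described above, a consumer would presumably keep the token sequence as-is instead of calling something like `expand`, which is what preserves the original structure.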
Explicit and unambiguous syntax
- Aims for compatibility with, and is tested against, an independently implemented pest grammar.
- Aims for compatibility with biber, but without some of biber's undocumented idiosyncrasies or unfixable parsing bugs.
Fast
- Low overhead manual parser implementation (see benchmarks).
- Zero-copy deserialization.
- Selective capturing of contents (see benchmarks for speed differences).
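As a rough illustration of what zero-copy means here, the sketch below splits a single `name = {value}` field into two string slices that borrow from the input rather than allocating new strings. The function `split_field` is a hypothetical helper, not part of the crate, and it handles only this one simplified field shape.

```rust
/// Split `name = {value}` into slices that borrow from `input`.
/// No bytes are copied: both returned `&str`s point into the original buffer.
/// (Hypothetical helper for illustration; real BibTeX fields are more varied.)
fn split_field(input: &str) -> Option<(&str, &str)> {
    let (name, rest) = input.split_once('=')?;
    // Strip the outermost pair of braces around the value.
    let value = rest.trim().strip_prefix('{')?.strip_suffix('}')?;
    Some((name.trim(), value))
}
```

Because the returned slices share the lifetime of `input`, the borrow checker ensures the source buffer outlives the parsed data, which is the trade-off a fully borrowed target type accepts in exchange for avoiding allocation.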
Serializer
Here are the main features. See the serializer docs for more detail.
Flexible
- Flexibly serialize many types which are vaguely structured like BibTeX entries.
- Sufficiently general to generate any valid BibTeX bibliography (up to syntactic equivalence), including all entry types such as `@string` macros, and outputting unexpanded macros.
- Implementable `Formatter` trait which allows total customization of generated BibTeX.
Opinionated
- Default `Formatter` implementations serialize in a standardized format to guarantee unambiguous parsing even by other tools.
- Compact formatter when serializing for consumption by non-humans.
Robust
- Validate during serialization to guarantee generation of valid BibTeX.
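One example of a check that validation during serialization could involve is refusing to write a braced field value whose braces are unbalanced, since that would corrupt the rest of the file. The sketch below is an assumption about the kind of check meant, not the crate's actual validation logic; `braces_balanced` and `write_field` are hypothetical names.

```rust
use std::fmt::Write;

/// Return true if every `}` closes an earlier `{` and no `{` is left open.
fn braces_balanced(value: &str) -> bool {
    let mut depth: usize = 0;
    for c in value.chars() {
        match c {
            '{' => depth += 1,
            '}' => match depth.checked_sub(1) {
                Some(d) => depth = d,
                None => return false, // closing brace with nothing open
            },
            _ => {}
        }
    }
    depth == 0
}

/// Append `  name = {value},` to `out`, or return an error and leave `out`
/// untouched if the value would produce invalid BibTeX.
/// (Hypothetical helper; the crate's own checks may differ.)
fn write_field(out: &mut String, name: &str, value: &str) -> Result<(), String> {
    if !braces_balanced(value) {
        return Err(format!("unbalanced braces in field `{}`", name));
    }
    writeln!(out, "  {} = {{{}}},", name, value).map_err(|e| e.to_string())
}
```

Checking before writing means a failed field never leaves partial output behind.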
Comparison with other crates
typst/biblatex
We do not attempt to interpret the contents of the entries in the `.bib` file and instead defer interpretation for downstream consumption.
On the other hand, biblatex is intended to support typst, which requires interpreting the contents of the fields (for example, parsing of `$math$` in field values).
In this sense, we might consider our implementation closer to the `biblatex::RawBibliography` entrypoint, but with the substantial extra flexibility of reading into any type implementing an appropriate `Deserialize`.
charlesvdv/nom-bibtex
The functionality in this crate essentially supersedes nom-bibtex.
The only feature of `nom-bibtex` that we do not support is the capturing of comments not explicitly contained in a `@comment` entry.
typho/bibparser
The functionality in this crate essentially supersedes bibparser.
Benchmarks
The benchmark code can be found in `benches/compare.rs`.
The bibliography file used is `assets/tugboat.bib`, which is part of the testing data used by biber.
It is a 2.64 MB, 73,993-line `.bib` file.
- `ignore`: Deserialize using `serde::de::IgnoredAny` to parse the file but ignore the contents.
- `struct`: Deserialize using a struct with entries capturing every field present in `assets/tugboat.bib` (15 fields total), expanding macros and collapsing field values.
- `borrow`: Deserialize into a fully borrowed Rust type which captures all data in the file but does not expand macros or collapse field values.
- `biblatex`: Parse using `biblatex::RawBibliography::parse` (most similar to `borrow`).
- `copy`: Deserialize into an owned Rust type with macro expansion, field value collapsing, and case-insensitive comparison where appropriate.
- `nom-bibtex`: Parse using `nom-bibtex::Bibtex::parse` (most similar to `copy`).
The benchmarks were performed on an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz (2019 MacBook Pro).
The speedup factor is relative to `biblatex`.
benchmark | factor | runtime | throughput |
---|---|---|---|
ignore | 4.8x | [3.3923 ms 3.3987 ms 3.4058 ms] | 660 MB/s |
struct | 1.9x | [8.5496 ms 8.7481 ms 8.9924 ms] | 300 MB/s |
borrow | 1.3x | [12.932 ms 12.962 ms 12.992 ms] | 200 MB/s |
biblatex | 1.0x | [16.184 ms 16.224 ms 16.266 ms] | 160 MB/s |
copy | 0.75x | [21.455 ms 21.690 ms 21.935 ms] | 120 MB/s |
nom-bibtex | 0.23x | [71.607 ms 71.912 ms 72.343 ms] | 40 MB/s |
The bibparser crate is not included in this benchmark as it is unable to parse the input file.
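For readers who want to check the factor column, it appears to be the `biblatex` runtime divided by the runtime of each benchmark, using the middle value of each reported interval. The small helper below is only a sketch of that arithmetic; `speedup` is a hypothetical name and the use of the middle value is an assumption.

```rust
/// Speedup of a benchmark relative to a baseline: baseline runtime divided
/// by the benchmark's runtime, so values above 1.0 mean "faster than baseline".
fn speedup(baseline_ms: f64, runtime_ms: f64) -> f64 {
    baseline_ms / runtime_ms
}
```

For example, `speedup(16.224, 8.7481)` is roughly 1.85, matching the 1.9x reported for `struct`, and `speedup(16.224, 21.690)` is roughly 0.75, matching `copy`.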
Safety
This crate uses some `unsafe` for string conversions when we can guarantee for other reasons that a string slice is at a valid codepoint.
Dependencies
~0.3–1MB
~21K SLoC