14 releases
0.2.2 | Apr 13, 2024 |
---|---|
0.2.1 | Apr 7, 2024 |
0.2.0 | Mar 12, 2024 |
0.1.10 | Mar 11, 2024 |
0.1.1 | Oct 29, 2023 |
#308 in Parser implementations
368 downloads per month
68KB
1.5K
SLoC
rtf-parser
A safe Rust RTF parser & lexer library designed for speed and memory efficiency, with no external dependencies. The official documentation is available at docs.rs/rtf-parser.
Installation
This library can be installed using cargo with the CLI :
cargo add rtf-parser
Or adding rtf-parser = "<last-version>"
under [dependencies] in your Cargo.toml
.
Design
The library is split into 2 main components:
- The lexer
- The parser
The lexer scans the document and returns a Vec<Token>
which represent the RTF file in a code-understandable manner.
These tokens can then be passed to the parser to transcript it to a real document : RtfDocument
.
use rtf_parser::lexer::Lexer;
use rtf_parser::tokens::Token;
use rtf_parser::parser::Parser;
use rtf_parser::document::RtfDocument;
fn main() -> Result<(), Box<dyn Error>> {
let tokens: Vec<Token> = Lexer::scan("<rtf>")?;
let parser = Parser::new(tokens);
let doc: RtfDocument = parser.parse()?;
}
or in a more concise way :
use rtf_parser::document::RtfDocument;
fn main() -> Result<(), Box<dyn Error>> {
let doc: RtfDocument = RtfDocument::try_from("<rtf>")?;
}
The RtfDocument
struct implement the TryFrom
trait for :
&str
String
&mut std::fs::File
and a from_filepath
constructor that handle the i/o internally.
The error returned can be a LexerError
or a ParserError
depending on the phase wich failed.
An RtfDocument
is composed with :
- the header, containing among others the font table, the color table and the encoding.
- the body, which is a
Vec<StyledBlock>
A StyledBlock
contains all the information about the formatting of a specific block of text.
It contains a Painter
for the text style, a Paragraph
for the layout, and the text (String
).
The Painter
is defined below, and the rendering implementation depends on the user.
pub struct Painter {
pub font_ref: FontRef,
pub font_size: u16,
pub bold: bool,
pub italic: bool,
pub underline: bool,
pub superscript: bool,
pub subscript: bool,
pub smallcaps: bool,
pub strike: bool,
}
The layout information are exposed in the paragraph
property :
pub struct Paragraph {
pub alignment: Alignment,
pub spacing: Spacing,
pub indent: Indentation,
pub tab_width: i32,
}
It defined the way a block is aligned, what spacing it uses, etc...
You also can extract the text without any formatting information, with the to_text()
method of the RtfDocument
struct.
fn main() -> Result<(), Box<dyn Error>> {
let rtf = r#"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Voici du texte en {\b gras}.\par}"#;
let tokens = Lexer::scan(rtf)?;
let document = Parser::new(tokens)?;
let text = document.to_text();
assert_eq!(text, "Voici du texte en gras.");
}
Examples
A complete example of rtf parsing is presented below :
use rtf_parser::lexer::Lexer;
use rtf_parser::parser::Parser;
fn main() -> Result<(), Box<dyn Error>> {
let rtf_text = r#"{ \rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Voici du texte en {\b gras}.\par }"#;
let tokens = Lexer::scan(rtf_text)?;
let doc = Parser::new(tokens).parse()?;
assert_eq!(
doc.header,
RtfHeader {
character_set: Ansi,
color_table: ColorTable::Default(),
font_table: FontTable::from([
(0, Font { name: "Helvetica", character_set: 0, font_family: Swiss })
])
}
);
assert_eq!(
doc.body,
[
StyleBlock {
painter: Painter { font_ref: 0, font_size: 0, bold: false, italic: false, underline: false },
paragraph: Paragraph {
alignment: LeftAligned,
spacing: Spacing { before: 0, after: 0, between_line: Auto, line_multiplier: 0, },
indent: Indentation { left: 0, right: 0, first_line: 0, },
tab_width: 0,
},
text: "Voici du texte en ",
},
StyleBlock {
painter: Painter { font_ref: 0, font_size: 0, bold: true, italic: false, underline: false },
paragraph: Paragraph {
alignment: LeftAligned,
spacing: Spacing { before: 0, after: 0, between_line: Auto, line_multiplier: 0, },
indent: Indentation { left: 0, right: 0, first_line: 0, },
tab_width: 0,
},
text: "gras",
},
StyleBlock {
painter: Painter { font_ref: 0, font_size: 0, bold: false, italic: false, underline: false },
paragraph: Paragraph {
alignment: LeftAligned,
spacing: Spacing { before: 0, after: 0, between_line: Auto, line_multiplier: 0, },
indent: Indentation { left: 0, right: 0, first_line: 0, },
tab_width: 0,
},
text: ".",
},
]
);
return Ok(());
}
Benchmark
For now, there is no comparable crates to rtf-parser
.
However, the rtf-grimoire
crate provide a similar Lexer. Here is a quick benchmark of the lexing and parsing of a 500kB rtf docuement.
Crate | Version | Duration |
---|---|---|
rtf-parser |
v0.2.2 | 30 ms |
rtf-grimoire (only lexing) |
v0.2.1 | 123 ms |
This benchmark has been made on an Intel MacBook Pro.
For the rtf-parser
, most of the compute time (65 %) is spent by the lexing process. There is still lot of room for improvement.
Dependencies
~1–1.4MB
~35K SLoC