6 releases (stable)

2.4.0 May 31, 2020
2.3.0 Oct 29, 2019
2.2.0 Jan 10, 2019
2.1.0 Dec 27, 2018
0.1.0 Dec 21, 2018

LGPL-3.0-or-later

52KB
1K SLoC

August

August is a Rust crate and program for converting HTML to plain text. It is specifically intended for rendering HTML emails as text; however, it can also be used for other purposes, such as converting HTML into text for full-text indexing or other processing.

Usage

Add this to your Cargo.toml:

[dependencies]
august = "^2.4"

and this to your code:

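use august;

fn main() {
    let input = "<p>Hello</p><i>Here's some HTML!</i>";
    // Styled output, wrapped at 79 columns.
    println!("{}", august::convert(input, 79));
    println!("---");
    // Unstyled output: same text without inline style markers.
    println!("{}", august::convert_unstyled(input, 79));
}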

The output looks like this:

Hello

/Here's some HTML!/
---
Hello

Here's some HTML!
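
For the full-text indexing use case mentioned in the introduction, the unstyled output is a natural starting point, since it carries no emphasis markers such as the slashes around italic text. The following is only a sketch: index_terms is a hypothetical helper, not part of august, and it takes the converter as a closure on the assumption that august::convert_unstyled accepts (&str, usize) and returns a String, as the example above suggests.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of august): turn HTML into a sorted set of
/// lowercase terms that could be fed to a full-text index.
///
/// `to_text` is the HTML-to-text step. In a real program this would be
/// something like `august::convert_unstyled`, ASSUMING it accepts
/// `(&str, usize)` and returns a `String`.
fn index_terms<F>(html: &str, to_text: F) -> BTreeSet<String>
where
    F: Fn(&str, usize) -> String,
{
    // The width is assumed to influence only line wrapping, which is
    // irrelevant once the text is split into words.
    let text = to_text(html, 79);

    // Split on anything that is not a letter or digit, drop empty pieces,
    // and normalise case so "HTML" and "html" index as the same term.
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| word.to_lowercase())
        .collect()
}
```

Passing the converter in as a closure keeps the helper independent of the crate, so it can be exercised with a stub in tests and wired to august::convert_unstyled in real code. The tokenization is deliberately naive (it splits "Here's" into "here" and "s"); a real indexer would likely use a proper tokenizer.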

Command line program

The crate comes with a small command-line program, august, that reads HTML from stdin and prints text to stdout. If the term-size feature is enabled, it uses the terminal width as the default width; otherwise it uses 79. You can override this by passing -w WIDTH as an argument.
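
The width selection described above can be sketched as a small function. This is not august's actual source: resolve_width and its terminal_width parameter are hypothetical names, with terminal_width standing in for whatever the term-size feature would report.

```rust
/// Width used when nothing else is known (the value given in this README).
const FALLBACK_WIDTH: usize = 79;

/// Hypothetical sketch of the width choice; NOT august's real implementation.
///
/// `args` are the command-line arguments without the program name.
/// `terminal_width` is `None` when the terminal width is unavailable, for
/// example because the `term-size` feature is disabled.
fn resolve_width(args: &[String], terminal_width: Option<usize>) -> Result<usize, String> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if arg == "-w" {
            // An explicit -w WIDTH always wins over any default.
            let value = iter.next().ok_or("-w requires a WIDTH argument")?;
            return value
                .parse::<usize>()
                .map_err(|e| format!("invalid width {:?}: {}", value, e));
        }
    }
    // No override given: prefer the terminal width, then fall back to 79.
    Ok(terminal_width.unwrap_or(FALLBACK_WIDTH))
}
```

Keeping the terminal width as an Option parameter means the precedence (explicit flag, then terminal, then 79) can be checked without a real terminal attached.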

Known issues

  1. There's no CSS support currently. Some support will probably happen sometime, but it's still unclear what is worth implementing.

Changes

2.4.0

  • Added unstyled mode

2.3.0

  • Switch to more stream-based functions.
  • Update cargo config to use semver versions to prevent broken 0.x dependencies.

2.2.0

  • Add more documentation.
  • Use terminal width as the default width when run from a terminal.
  • Disable term-size by default to reduce static linking size.
  • Reduce memory usage by about 30% for large files.
  • Reduce use of regexes.

2.1.0

  • Add support for more inline elements: code, dfn, kbd, mark, q, samp, var, del, input, select.
  • Add support for the pre element.
  • Show unsupported inline elements inline instead of block.

2.0

Initial rewrite from Python to Rust (https://alantrick.ca/writings/programming/python_to_rust).
