5 releases

0.2.3	Apr 22, 2023
0.2.2	Nov 3, 2019
0.2.1	Jul 9, 2017
0.2.0	Jul 8, 2017
0.1.0	Jul 7, 2017

MIT license

Utility for extracting data from HTML tables.

This library allows you to parse tables from HTML documents and iterate over their rows. There are three entry points:

Table::find_first finds the first table.
Table::find_by_id finds a table by its HTML id.
Table::find_by_headers finds a table that has certain headers.

Each of these returns an Option<Table>, since the HTML might not contain a matching table. Once you have a table, you can iterate over its rows and read the cells of each Row, either by header name or in order.

Examples

Here is a simple example that uses Table::find_first to print the cells in each row of a table:

let html = r#"
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_first(html).unwrap();
for row in &table {
    println!(
        "{} is {} years old",
        row.get("Name").unwrap_or("<name missing>"),
        row.get("Age").unwrap_or("<age missing>")
    )
}
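
The example above treats every cell as text: row.get yields an optional string slice, which is why unwrap_or can supply a string default. For a numeric column such as Age, the text still has to be parsed. Here is a minimal sketch of one way to do that using only the standard library. The helper name parse_age is hypothetical and not part of table_extract:

```rust
/// Hypothetical helper (not part of table_extract): converts an optional
/// cell into a number. Returns None if the cell is missing or if its text
/// is not a valid unsigned integer after trimming whitespace.
fn parse_age(cell: Option<&str>) -> Option<u32> {
    cell.and_then(|text| text.trim().parse::<u32>().ok())
}
```

Inside the loop it could be called as parse_age(row.get("Age")), so that a missing cell and an unparseable cell are both reported as None instead of causing a panic.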

If the document has multiple tables, we can use Table::find_by_headers to identify the one we want:

let html = r#"
    <table></table>
    <table>
        <tr><th>Name</th><th>Age</th></tr>
        <tr><td>John</td><td>20</td></tr>
    </table>
"#;
let table = table_extract::Table::find_by_headers(html, &["Age"]).unwrap();
for row in &table {
    for cell in row {
        println!("Table cell: {}", cell);
    }
}

TableExtract

TableExtract is a Rust library for extracting data from HTML tables. It is inspired by Perl's HTML::TableExtract.

Check out the crate documentation for more information.

Usage

TableExtract is on crates.io. To use it, just add this to your Cargo.toml:

[dependencies]
table-extract = "0.2"

Contributing

Contributions are welcome! There are two things to keep in mind:

This project uses the stable Rust toolchain from rustup.
This project uses cargo fmt to keep the code tidy.

License

TableExtract is available under the MIT License; see LICENSE for details.
