dom_smoothie

A Rust crate for extracting relevant content from web pages

2 releases

new 0.1.1 Dec 18, 2024
0.1.0 Dec 17, 2024

MIT license

DOM_SMOOTHIE

A Rust crate for extracting relevant content from web pages.

dom_smoothie closely follows the implementation of readability.js, bringing its functionality to Rust.
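
To give a rough feel for the kind of heuristic a readability-style extractor relies on, here is a minimal, standard-library-only sketch that scores blocks of text by length and comma count. It is an illustration only: the function name `score_paragraph`, the 25-character cutoff, and the weights are assumptions chosen for this example, not code taken from dom_smoothie or readability.js.

```rust
/// Hypothetical helper (not part of dom_smoothie): gives a block of text a
/// score that favors longer, comma-rich prose over short fragments.
fn score_paragraph(text: &str) -> u32 {
    let text = text.trim();
    let len = text.chars().count();

    // Very short fragments (menu items, buttons) are unlikely to be article text.
    if len < 25 {
        return 0;
    }

    let commas = text.matches(',').count() as u32;

    // One base point, one point per comma, and up to 3 points for length.
    1 + commas + ((len / 100) as u32).min(3)
}

fn main() {
    let candidates = [
        "Home | About | Contact",
        "Readable articles tend to contain long sentences, with commas, clauses, \
         and enough text to stand apart from navigation links.",
    ];

    // Pick the candidate with the highest score.
    let best = candidates
        .iter()
        .max_by_key(|t| score_paragraph(t))
        .unwrap();

    println!("Best candidate: {best}");
}
```

A real extractor works on DOM nodes rather than plain strings and combines many more signals, so this sketch should be read as a simplified picture of the idea rather than a description of the crate's internals.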

Examples

Basic Example

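use std::error::Error;

use dom_smoothie::Readability;

fn main() -> Result<(), Box<dyn Error>> {
    // Preserve the `caption` class in the extracted content;
    // all other settings keep their default values.
    let cfg = dom_smoothie::Config {
        classes_to_preserve: vec!["caption".into()],
        ..Default::default()
    };

    // Embed the test page at compile time (path is relative to this source file).
    let html = include_str!("../test-pages/ok/001/source.html");

    // The second argument is the document URL (a placeholder here);
    // the third is the optional configuration.
    let mut readability = Readability::new(html, Some("http://fakehost/test/"), Some(cfg))?;
    let article = readability.parse()?;

    println!("Title: {}", &article.title);
    println!("Content:\n {}", &article.content);

    Ok(())
}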

License

Licensed under MIT (LICENSE or http://opensource.org/licenses/MIT).

Contribution

Any contribution intentionally submitted for inclusion in this project will be licensed under the MIT license, without any additional terms or conditions.
