10 releases

0.0.11-alpha.1 Jun 14, 2023
0.0.10 Jun 3, 2023
0.0.9 May 16, 2023



123 downloads per month
Used in spider

MIT license

10KB
121 lines

jsdom

A fast JavaScript DOM parser for Rust, built for web scraping.

cargo add jsdom

```rust
use std::collections::HashSet;
use jsdom::extract::extract_links;

const SCRIPT: &str = r###"
var ele = document.createElement('a');
ele.href = 'https://a11ywatch.com';
"###;

#[test]
fn parse_links() {
    // Collect the links assigned to elements that the script creates.
    let links: HashSet<String> = extract_links(SCRIPT);

    assert!(links.contains("https://a11ywatch.com"));
}
```
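To make the expected result of the test above concrete, here is a deliberately naive, std-only sketch that pulls string literals assigned to an `.href` property out of a script. It is not how `jsdom` works (the crate parses JavaScript rather than scanning text), and `naive_extract_links` is a hypothetical name; it only handles one `.href = '...'` assignment per line.

```rust
use std::collections::HashSet;

/// Hypothetical helper: collects string literals assigned to an `.href`
/// property, e.g. `ele.href = 'https://a11ywatch.com';`.
/// This is a plain text scan, not a JavaScript parser.
fn naive_extract_links(script: &str) -> HashSet<String> {
    let mut links = HashSet::new();
    for line in script.lines() {
        // Find the `.href` property access on this line, if any.
        let Some(pos) = line.find(".href") else { continue };
        let rest = line[pos + ".href".len()..].trim_start();
        // Require an assignment (`=`), skipping comparisons like `==`.
        let Some(rest) = rest.strip_prefix('=') else { continue };
        if rest.starts_with('=') {
            continue;
        }
        let rest = rest.trim_start();
        // Accept either a single- or double-quoted string literal.
        let Some(quote) = rest.chars().next().filter(|c| *c == '\'' || *c == '"') else {
            continue;
        };
        let body = &rest[1..];
        if let Some(end) = body.find(quote) {
            links.insert(body[..end].to_string());
        }
    }
    links
}

fn main() {
    let script = r#"
var ele = document.createElement('a');
ele.href = 'https://a11ywatch.com';
"#;
    let links = naive_extract_links(script);
    assert!(links.contains("https://a11ywatch.com"));
    println!("{:?}", links);
}
```

A text scan like this breaks as soon as the URL is built from variables or the element is created inside a nested expression, which is the gap a real parser closes.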

Features

This package rolls out the features most important for web scraping first.

  1. hashbrown: enable the hashbrown crate.
  2. tokio: enable Tokio streaming utilities.
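Both entries are Cargo feature flags; with Cargo 1.62 or later they can be turned on when adding the dependency, for example `cargo add jsdom --features hashbrown,tokio`. The sketch below assumes the `hashbrown` flag swaps the hash collections used for results (a common pattern, not confirmed by this page) and shows how such a switch is typically wired with `cfg` attributes. `collect_unique` is a hypothetical helper, and when the block is compiled without the feature it falls back to `std::collections::HashSet`.

```rust
// Hypothetical sketch: choose the set type behind a `hashbrown` Cargo feature.
// Without the feature enabled, std's HashSet is used.
#[cfg(feature = "hashbrown")]
use hashbrown::HashSet;
#[cfg(not(feature = "hashbrown"))]
use std::collections::HashSet;

/// Collects unique links into whichever `HashSet` the feature selected.
fn collect_unique(links: &[&str]) -> HashSet<String> {
    links.iter().map(|s| s.to_string()).collect()
}

fn main() {
    let set = collect_unique(&["https://a11ywatch.com", "https://a11ywatch.com"]);
    // Duplicates collapse, so only one link remains.
    assert_eq!(set.len(), 1);
    println!("{} unique link(s)", set.len());
}
```

Because both set types implement `FromIterator`, the calling code stays identical whichever feature is active.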

Stage 0.1

The introductory stage handles elements created in statements and expressions.
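The distinction between statements and expressions can be illustrated over a toy syntax tree: an element may be bound by a `var` statement and later used through its variable, or created inline inside an expression such as `document.createElement('a').href = '...'`. The `Stmt`/`Expr` types and `links_from` below are hypothetical and heavily simplified; they are not `jsdom`'s internal types, only a sketch of tracking both cases.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, heavily simplified JavaScript AST (not jsdom's types).
enum Expr {
    /// `document.createElement('<tag>')`
    CreateElement(String),
    /// `<object>.<prop> = <value>`
    Assign { object: Box<Expr>, prop: String, value: Box<Expr> },
    /// A variable reference such as `ele`.
    Ident(String),
    /// A string literal.
    Str(String),
}

enum Stmt {
    /// `var <name> = <init>;`
    VarDecl { name: String, init: Expr },
    /// An expression used as a statement, e.g. `ele.href = '...';`
    Expr(Expr),
}

/// True if `expr` evaluates to an `<a>` element, either created inline
/// or read from a variable previously bound to one.
fn is_anchor(expr: &Expr, elements: &HashMap<String, String>) -> bool {
    match expr {
        Expr::CreateElement(tag) => tag == "a",
        Expr::Ident(name) => elements.get(name).map_or(false, |t| t == "a"),
        _ => false,
    }
}

/// Walks statements, remembering which variables hold created elements,
/// and collects string values assigned to the `href` of `<a>` elements.
fn links_from(stmts: &[Stmt]) -> HashSet<String> {
    // Variable name -> tag name of the element it is bound to.
    let mut elements: HashMap<String, String> = HashMap::new();
    let mut links = HashSet::new();
    for stmt in stmts {
        match stmt {
            // Element created in a statement: remember the binding.
            Stmt::VarDecl { name, init: Expr::CreateElement(tag) } => {
                elements.insert(name.clone(), tag.clone());
            }
            // Any other rebinding forgets a previously tracked element.
            Stmt::VarDecl { name, .. } => {
                elements.remove(name);
            }
            // Assignment expression: the target may be a variable or an inline element.
            Stmt::Expr(Expr::Assign { object, prop, value }) => {
                if prop == "href" && is_anchor(object, &elements) {
                    if let Expr::Str(url) = &**value {
                        links.insert(url.clone());
                    }
                }
            }
            Stmt::Expr(_) => {}
        }
    }
    links
}

fn main() {
    // var ele = document.createElement('a');
    // ele.href = 'https://a11ywatch.com';
    // document.createElement('a').href = 'https://example.com';
    let program = vec![
        Stmt::VarDecl { name: "ele".into(), init: Expr::CreateElement("a".into()) },
        Stmt::Expr(Expr::Assign {
            object: Box::new(Expr::Ident("ele".into())),
            prop: "href".into(),
            value: Box::new(Expr::Str("https://a11ywatch.com".into())),
        }),
        Stmt::Expr(Expr::Assign {
            object: Box::new(Expr::CreateElement("a".into())),
            prop: "href".into(),
            value: Box::new(Expr::Str("https://example.com".into())),
        }),
    ];
    let links = links_from(&program);
    assert!(links.contains("https://a11ywatch.com"));
    assert!(links.contains("https://example.com"));
    println!("{:?}", links);
}
```

Keeping a map from variable names to created elements is what lets the second statement be resolved; the inline case needs no bookkeeping because the element is visible directly in the expression.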

Dependencies

~1.7–8.5MB
~64K SLoC