#web-crawler #spider

spider_utils

Utilities for the Spider web crawler

384 stable releases

new 2.26.0 Jan 11, 2025
2.23.3 Dec 31, 2024
2.13.84 Nov 30, 2024
0.2.3 Aug 20, 2024
0.1.3 Jul 24, 2024

#2091 in Web programming


7,321 downloads per month

MIT license

695KB
14K SLoC

spider_utils

Utilities to help you get the most out of spider.

CSS Scraping

use spider_utils::{QueryCSSMap, QueryCSSSelectSet, build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    // Map the output key "list" to the set of CSS selectors whose matches it collects.
    let map = QueryCSSMap::from([(
        "list",
        QueryCSSSelectSet::from([".list", ".sub-list"]),
    )]);
    // Build the selectors from the map and run them over the HTML document.
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;

    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}
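
The printed result above is a map from each selector-set name to the strings it matched. The sketch below shows one way to post-process a value of that shape. It is an illustration under assumptions: std::collections::HashMap stands in for the map type returned by css_query_select_map_streamed, and the helpers first_match, total_matches, and sample_data are hypothetical names, not part of spider_utils.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of spider_utils): returns the first
/// extracted string stored under `key`, if there is one.
fn first_match<'a>(data: &'a HashMap<String, Vec<String>>, key: &str) -> Option<&'a str> {
    data.get(key).and_then(|values| values.first()).map(String::as_str)
}

/// Hypothetical helper: counts the extracted strings across all keys.
fn total_matches(data: &HashMap<String, Vec<String>>) -> usize {
    data.values().map(Vec::len).sum()
}

/// Builds a map shaped like the output shown in the example above.
/// Assumption: std's HashMap is used here in place of the crate's map type.
fn sample_data() -> HashMap<String, Vec<String>> {
    HashMap::from([(
        "list".to_string(),
        vec!["First".to_string(), "Second".to_string()],
    )])
}

fn main() {
    let data = sample_data();
    println!("{:?}", first_match(&data, "list"));
    println!("{}", total_matches(&data));
}
```

Looking up a key that was never registered in the selector map returns None rather than panicking, which keeps downstream code straightforward when a page does not contain the expected elements.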

Features

Enable the indexset feature flag to get ordered CSS scraping extraction.
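
To illustrate what ordered extraction means in general: a plain hash set makes no guarantee about iteration order, while an insertion-ordered set yields items in the order they were first added. The following std-only sketch shows that idea. OrderedSet and dedup_in_order are hypothetical names for illustration only, and this is not claimed to be the data structure spider_utils uses internally.

```rust
use std::collections::HashSet;

/// Hypothetical insertion-ordered set, for illustration only.
#[derive(Default)]
struct OrderedSet {
    seen: HashSet<String>, // fast membership checks
    items: Vec<String>,    // remembers first-insertion order
}

impl OrderedSet {
    /// Inserts `value`; returns true if it was not already present.
    fn insert(&mut self, value: &str) -> bool {
        if self.seen.insert(value.to_string()) {
            self.items.push(value.to_string());
            true
        } else {
            false
        }
    }

    /// Iterates over the items in the order they were first inserted.
    fn iter(&self) -> impl Iterator<Item = &str> + '_ {
        self.items.iter().map(String::as_str)
    }
}

/// Removes duplicates from `selectors` while keeping first-seen order.
fn dedup_in_order(selectors: &[&str]) -> Vec<String> {
    let mut set = OrderedSet::default();
    for selector in selectors {
        set.insert(selector);
    }
    set.iter().map(str::to_string).collect()
}

fn main() {
    let ordered = dedup_in_order(&[".list", ".sub-list", ".list"]);
    println!("{:?}", ordered);
}
```

With a plain HashSet the two selectors could come back in either order; the ordered variant always yields .list before .sub-list here because that is the order in which they were inserted.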

Dependencies

~19–34MB
~600K SLoC