#web-crawler #spider

spider_utils

Utilities for the Spider web crawler

206 stable releases

new 2.13.13 Nov 21, 2024
2.13.11 Nov 20, 2024
2.11.0 Oct 31, 2024
2.6.21 Sep 30, 2024
0.1.3 Jul 24, 2024

#2129 in Web programming


10,712 downloads per month

MIT license

695KB
14K SLoC

spider_utils

Utilities to help you get the most out of spider.

CSS Scraping

// Only the spider_utils items are used below; the map and set types come from its aliases.
use spider_utils::{QueryCSSMap, QueryCSSSelectSet, build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    // Map a result name ("list") to the set of CSS selectors collected under it.
    let map = QueryCSSMap::from([(
        "list",
        QueryCSSSelectSet::from([".list", ".sub-list"]),
    )]);
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;

    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}
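
Reading the result afterwards can be sketched as follows. This is a minimal, standard-library-only illustration that assumes the extracted data behaves like a map from each name to a list of strings, as the printed output above suggests. The `Extracted` alias and the `first_match` helper are hypothetical names and are not part of spider_utils.

```rust
use std::collections::HashMap;

/// Stand-in for the extraction result. The shape is an assumption based on
/// the printed output {"list": ["First", "Second"]}.
type Extracted = HashMap<String, Vec<String>>;

/// Hypothetical helper: return the first value extracted under `key`, if any.
fn first_match<'a>(data: &'a Extracted, key: &str) -> Option<&'a str> {
    data.get(key)
        .and_then(|values| values.first())
        .map(String::as_str)
}

fn main() {
    // Build a value shaped like the example output instead of crawling.
    let mut data = Extracted::new();
    data.insert(
        "list".to_string(),
        vec!["First".to_string(), "Second".to_string()],
    );

    println!("{:?}", first_match(&data, "list"));
    println!("{:?}", first_match(&data, "missing"));
}
```

Returning an `Option` keeps the "no match" case explicit, so a selector that matched nothing does not need special handling at the call site.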

Features

Enable the indexset feature flag to keep the CSS scraping extraction in a defined order.
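
To show why an ordered set matters here, the sketch below contrasts it with a plain hash set, whose iteration order is unspecified. It uses only the standard library. `OrderedSet` and `ordered_selectors` are hypothetical names written for this illustration, not the types spider_utils uses internally. The Cargo manifest line in the first comment uses standard Cargo feature syntax with the flag name given above.

```rust
// Assumed manifest entry for enabling the flag:
// spider_utils = { version = "2", features = ["indexset"] }

use std::collections::HashSet;

/// Hypothetical insertion-ordered set, for illustration only.
#[derive(Default)]
struct OrderedSet {
    seen: HashSet<String>, // fast duplicate check
    items: Vec<String>,    // remembers first-insertion order
}

impl OrderedSet {
    /// Insert a value; returns false if it was already present.
    fn insert(&mut self, value: &str) -> bool {
        if self.seen.insert(value.to_string()) {
            self.items.push(value.to_string());
            true
        } else {
            false
        }
    }

    /// Values in the order they were first inserted.
    fn items(&self) -> &[String] {
        &self.items
    }
}

/// Deduplicate selectors while keeping their original order.
fn ordered_selectors(selectors: &[&str]) -> Vec<String> {
    let mut set = OrderedSet::default();
    for selector in selectors {
        set.insert(selector);
    }
    set.items().to_vec()
}

fn main() {
    let ordered = ordered_selectors(&[".list", ".sub-list", ".list"]);
    println!("{:?}", ordered);
    // prints [".list", ".sub-list"]
}
```

With a plain `HashSet`, the same two selectors could be visited in either order from run to run, which is the kind of variation an ordered set removes.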

Dependencies

~20–35MB
~620K SLoC