#web-crawler #spider

spider_utils

Utilities for the Spider web crawler.

327 stable releases

2.21.22 Dec 14, 2024
2.21.21 Dec 13, 2024
2.13.84 Nov 30, 2024
2.11.0 Oct 31, 2024
0.1.3 Jul 24, 2024



16,025 downloads per month

MIT license

705KB
14K SLoC


Utilities to help you get the most out of spider.

CSS Scraping

use spider_utils::{QueryCSSMap, QueryCSSSelectSet, build_selectors, css_query_select_map_streamed};

async fn css_query_selector_extract() {
    // Map an output key ("list") to the set of CSS selectors
    // whose matched text is collected under that key.
    let map = QueryCSSMap::from([(
        "list",
        QueryCSSSelectSet::from([".list", ".sub-list"]),
    )]);

    // Build the selectors from the map, then run them against the HTML.
    let data = css_query_select_map_streamed(
        r#"<html>
            <body>
                <ul class="list"><li>First</li></ul>
                <ul class="sub-list"><li>Second</li></ul>
            </body>
        </html>"#,
        &build_selectors(map),
    )
    .await;

    // Text matched by either selector is grouped under "list".
    println!("{:?}", data);
    // {"list": ["First", "Second"]}
}

Features

Enable the indexset feature flag to preserve ordering in CSS scraping extraction.
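How the indexset feature is implemented inside spider_utils is not shown here. As a hedged, std-only illustration of what "ordered" extraction means, the sketch deduplicates selectors while keeping the order in which they were declared. This is the guarantee an insertion-ordered set (assumed here to be something like indexmap's IndexSet) gives, whereas a plain HashSet iterates in an unspecified order. The helper ordered_selectors is hypothetical and not part of the spider_utils API.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of spider_utils): deduplicate selectors
/// while keeping the order they were declared in, mimicking the guarantee
/// of an insertion-ordered set.
fn ordered_selectors<'a>(declared: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut ordered = Vec::new();
    for &sel in declared {
        // `insert` returns false if this selector was already kept.
        if seen.insert(sel) {
            ordered.push(sel);
        }
    }
    ordered
}

fn main() {
    let declared = [".list", ".sub-list", ".list"];
    let ordered = ordered_selectors(&declared);
    // Declaration order survives and the duplicate ".list" is dropped.
    println!("{:?}", ordered); // [".list", ".sub-list"]
}
```

Pairing a HashSet (for O(1) duplicate checks) with a Vec (for order) is one simple way to get both properties; a dedicated ordered set does the same in a single structure.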

Dependencies

~19–34MB
~597K SLoC