wappu

Wappu is a fast and flexible web scraping library for Rust, designed to efficiently navigate and extract data from websites. Perfect for data mining, content aggregation, and web automation tasks.

7 releases

0.3.0 Mar 31, 2024
0.2.4 Mar 5, 2024
0.2.2 Feb 29, 2024
0.1.0 Feb 25, 2024

Apache-2.0

42KB
1K SLoC

Wappu: A Rust Web Scraping Library

Wappu is a web scraping library written in Rust, designed for ease of use and performance. It combines an HTTP client with HTML parsing, so you can fetch and parse web content through a single crate.

Features

  • Asynchronous HTTP Requests: Fetch web pages asynchronously with a simple-to-use HTTP client.
  • HTML Parsing: Easily parse and query HTML documents to extract relevant data.
  • Flexible Selectors: Use CSS-like selectors to pinpoint and extract elements from parsed HTML.
  • Error Handling: Robust error handling for both network requests and HTML parsing.
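
To illustrate the last point, here is a minimal, std-only sketch of how a scraper can keep network failures and parsing failures apart in one error type. Everything in it is an assumption made for illustration: ScrapeError, fetch, extract_heading, and scrape are hypothetical names, fetch is a stub that performs no real request, and none of this is Wappu's actual API or error type.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type; Wappu's real error types may look different.
#[derive(Debug)]
enum ScrapeError {
    /// The request could not be made or completed.
    Network(String),
    /// The response arrived but the expected content was missing.
    Parse(String),
}

impl fmt::Display for ScrapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ScrapeError::Network(msg) => write!(f, "network error: {}", msg),
            ScrapeError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl Error for ScrapeError {}

/// Stub standing in for an HTTP fetch: it only checks the URL scheme
/// and returns a fixed page instead of contacting a server.
fn fetch(url: &str) -> Result<String, ScrapeError> {
    if url.starts_with("http://") || url.starts_with("https://") {
        Ok("<html><body><h1>Example Domain</h1></body></html>".to_string())
    } else {
        Err(ScrapeError::Network(format!("unsupported URL: {}", url)))
    }
}

/// Stub standing in for the parsing step: finds a bare <h1>...</h1> pair.
fn extract_heading(html: &str) -> Result<String, ScrapeError> {
    fn missing() -> ScrapeError {
        ScrapeError::Parse("no <h1> element found".to_string())
    }
    let start = html.find("<h1>").ok_or_else(missing)? + "<h1>".len();
    let end = start + html[start..].find("</h1>").ok_or_else(missing)?;
    Ok(html[start..end].to_string())
}

/// Chains both steps; `?` forwards whichever error occurs first.
fn scrape(url: &str) -> Result<String, ScrapeError> {
    let html = fetch(url)?;
    extract_heading(&html)
}

fn main() {
    match scrape("http://example.com") {
        Ok(heading) => println!("Heading: {}", heading),
        Err(err) => eprintln!("Scrape failed: {}", err),
    }
}
```

Keeping the two failure kinds as separate variants lets a caller retry on a network error while treating a parse error as a sign that the page layout changed.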

Getting Started

Prerequisites

Ensure you have Rust and Cargo installed on your system. Wappu requires Rust version 1.40 or newer.

Installation

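Add Wappu to your project's Cargo.toml. The quick example below runs on the Tokio runtime (it uses #[tokio::main]), so add tokio as well:

[dependencies]
wappu = "0.2.4"
reqwest = "0.11.24"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }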

Quick Example

Here's a quick example that fetches example.com and prints the text of its h1 heading:

use wappu::{WappuClient, HtmlParser, Selector};

#[tokio::main]
async fn main() {
    // Fetch the page; the program panics with this message if the request fails.
    let client = WappuClient::new();
    let html_content = client.get("http://example.com", None).await.expect("Failed to fetch content");

    // Parse the response body into a queryable document.
    let parsed_html = HtmlParser::new().parse_html(&html_content.text());

    // Build a selector for <h1> elements and collect the matching text.
    let mut selector = Selector::new();
    let title_selector = selector.from_tag_name("h1");
    let title_selection = title_selector.select(&parsed_html);
    let title_text = title_selection.text();

    println!("Title: {}", title_text);
}
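
To make the selection step more concrete, the following std-only sketch shows roughly what "select by tag name, then take the text" produces for a page like example.com. It is a deliberately naive illustration and not how Wappu parses HTML: first_tag_text is a hypothetical helper, the HTML string is hard-coded, and the function ignores nesting, comments, and attribute values that contain a '>' character.

```rust
/// Returns the trimmed text between the first `<tag ...>` and the next `</tag>`.
/// Hypothetical helper for illustration only; not part of Wappu.
fn first_tag_text(html: &str, tag: &str) -> Option<String> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);

    // Find the start of the opening tag and skip past the tag name.
    let open_start = html.find(&open)?;
    let after_name = &html[open_start + open.len()..];

    // Skip any attributes up to and including the '>' that ends the opening tag.
    let content_start = after_name.find('>')? + 1;
    let content = &after_name[content_start..];

    // Everything up to the closing tag is treated as the element's text.
    let content_end = content.find(&close)?;
    Some(content[..content_end].trim().to_string())
}

fn main() {
    // Hard-coded stand-in for a fetched page.
    let html = "<html><body><h1 class=\"title\">Example Domain</h1></body></html>";

    match first_tag_text(html, "h1") {
        Some(text) => println!("Title: {}", text),
        None => println!("No <h1> element found"),
    }
}
```

A real HTML parser builds a document tree and handles malformed markup, which is why a library is preferable to string searching for anything beyond a toy input.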

Documentation

For detailed documentation, including the API reference and advanced usage, see the Wappu Documentation. (Not yet available.)

Contributing

Contributions are welcome! Please see our Contributing Guide for more details.

License

Wappu is licensed under the Apache License 2.0 - see the LICENSE file for details.

Acknowledgments

  • Thanks to the Rust community for the invaluable resources and support.
  • Special thanks to httpbin for providing an HTTP request and response service, which makes it easier to test HTTP client functionality.
