7 releases
Version | Released |
---|---|
0.3.0 | Mar 31, 2024 |
0.2.4 | Mar 5, 2024 |
0.2.2 | Feb 29, 2024 |
0.1.0 | Feb 25, 2024 |
#320 in HTTP client
42KB, 1K SLoC
Wappu: A Rust Web Scraping Library
Wappu is a web scraping library written in Rust, designed for ease of use and performance. It combines an HTTP client with HTML parsing, so you can fetch and parse web content through a single crate.
Features
- Asynchronous HTTP Requests: Fetch web pages asynchronously with a simple-to-use HTTP client.
- HTML Parsing: Easily parse and query HTML documents to extract relevant data.
- Flexible Selectors: Use CSS-like selectors to pinpoint and extract elements from parsed HTML.
- Error Handling: Robust error handling for both network requests and HTML parsing.
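To illustrate the error-handling feature above, here is a minimal sketch of one way a scraper can report network and parsing failures through a single error type. It uses only the standard library and is not Wappu's actual API: `ScrapeError`, `check_response`, `require_content`, and `process` are hypothetical names chosen for this example.

```rust
use std::fmt;

/// Hypothetical error type (not part of Wappu) covering both failure modes.
#[derive(Debug, PartialEq)]
enum ScrapeError {
    /// The request failed or returned a non-success status.
    Network(String),
    /// The response body could not be used as a document.
    Parse(String),
}

impl fmt::Display for ScrapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ScrapeError::Network(msg) => write!(f, "network error: {}", msg),
            ScrapeError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl std::error::Error for ScrapeError {}

/// Stand-in for the fetch step: accepts only 2xx status codes.
fn check_response(status: u16, body: &str) -> Result<&str, ScrapeError> {
    if (200..300).contains(&status) {
        Ok(body)
    } else {
        Err(ScrapeError::Network(format!("unexpected status {}", status)))
    }
}

/// Stand-in for the parse step: rejects an empty document.
fn require_content(body: &str) -> Result<&str, ScrapeError> {
    let trimmed = body.trim();
    if trimmed.is_empty() {
        Err(ScrapeError::Parse("empty document".to_string()))
    } else {
        Ok(trimmed)
    }
}

/// Chains both steps; `?` returns early with whichever error occurs first.
fn process(status: u16, body: &str) -> Result<String, ScrapeError> {
    let body = check_response(status, body)?;
    let content = require_content(body)?;
    Ok(content.to_string())
}
```

A caller can then `match` on the result of `process(status, body)` and handle `ScrapeError::Network` and `ScrapeError::Parse` separately, or print either one through its `Display` implementation.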
Getting Started
Prerequisites
Ensure you have Rust and Cargo installed on your system. Wappu requires Rust version 1.40 or newer.
Installation
Add Wappu to your project's Cargo.toml:
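[dependencies]
wappu = "0.2.4"
reqwest = "0.11.24"
# Needed for the #[tokio::main] attribute used in the Quick Example
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }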
Quick Example
Here's a quick example that fetches example.com and prints the text of its h1 heading:
use wappu::{WappuClient, HtmlParser, Selector};

#[tokio::main]
async fn main() {
    // Fetch the page; panic with a clear message if the request fails.
    let client = WappuClient::new();
    let html_content = client
        .get("http://example.com", None)
        .await
        .expect("Failed to fetch content");

    // Parse the response body into a queryable document.
    let parsed_html = HtmlParser::new().parse_html(&html_content.text());

    // Select the h1 element and extract its text.
    let mut selector = Selector::new();
    let title_selector = selector.from_tag_name("h1");
    let title_selection = title_selector.select(&parsed_html);
    let title_text = title_selection.text();

    println!("Title: {}", title_text);
}
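To make the selection step concrete, the following standard-library sketch shows what "select by tag name and take the text" amounts to in the simplest case. It is a deliberately naive illustration, not how Wappu or any real HTML parser works: `first_tag_text` is a hypothetical helper that ignores nesting, is case-sensitive, and assumes attribute values contain no `>` character.

```rust
/// Returns the trimmed text between the first `<tag ...>` and the next `</tag>`.
/// Hypothetical helper for illustration only; not a real HTML parser.
fn first_tag_text<'a>(html: &'a str, tag: &str) -> Option<&'a str> {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let mut search_from = 0;
    loop {
        // Locate the next candidate opening tag, or give up if there is none.
        let start = search_from + html[search_from..].find(&open)?;
        let after_name = start + open.len();
        match html[after_name..].chars().next() {
            // The tag name must end here, so "<h" does not match "<html".
            Some(c) if c == '>' || c.is_whitespace() => {
                let content_start = after_name + html[after_name..].find('>')? + 1;
                let content_end = content_start + html[content_start..].find(&close)?;
                return Some(html[content_start..content_end].trim());
            }
            // A longer tag name matched the prefix; keep searching after it.
            Some(_) => search_from = after_name,
            None => return None,
        }
    }
}
```

For example, `first_tag_text(page, "h1")` yields `Some(..)` with the heading text when an `h1` element is present and `None` otherwise. Real pages call for a proper parser, which is the role the library's `HtmlParser` and `Selector` types play in the example above.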
Documentation
For detailed documentation, including the API reference and advanced usage, see the Wappu Documentation (not yet available).
Contributing
Contributions are welcome! Please see our Contributing Guide for more details.
License
Wappu is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- Thanks to the Rust community for the invaluable resources and support.
- Special thanks to httpbin for providing an HTTP request and response service, which makes it easier to test HTTP client functionality.
Dependencies
~7–19MB
~282K SLoC