3 releases

0.1.2 Jan 29, 2025
0.1.1 Jan 25, 2025
0.1.0 Jan 25, 2025

Web Crawler Library

This Rust library is a simple web crawler that checks the validity of routes on a given website. It reads a list of routes from a file, constructs full URLs by appending the routes to a base URL, and sends HTTP GET requests to check whether the routes are valid.

Features

  • Reads a list of routes from a text file.
  • Constructs URLs by combining the base URL and the routes.
  • Makes HTTP GET requests to check if the routes are valid.
  • Tracks visited URLs to avoid re-crawling.
  • Returns the number of valid routes.
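
The steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: the names `join_url` and `count_valid_routes` are hypothetical, the routes text is assumed to hold one route per line, and the HTTP GET check is replaced by a caller-supplied closure so the example runs without network access.

```rust
use std::collections::HashSet;

/// Joins a base URL and a route with exactly one `/` between them.
/// (Hypothetical helper; the crate may combine them differently.)
fn join_url(base_url: &str, route: &str) -> String {
    format!(
        "{}/{}",
        base_url.trim_end_matches('/'),
        route.trim_start_matches('/')
    )
}

/// Counts the routes that `is_valid` accepts, probing each distinct URL once.
///
/// `routes` is the text of the routes file (assumed: one route per line).
/// `is_valid` stands in for the HTTP GET check described above.
fn count_valid_routes<F>(base_url: &str, routes: &str, mut is_valid: F) -> usize
where
    F: FnMut(&str) -> bool,
{
    let mut visited: HashSet<String> = HashSet::new();
    let mut valid = 0;

    for route in routes.lines().map(str::trim).filter(|r| !r.is_empty()) {
        let url = join_url(base_url, route);
        // `insert` returns false when the URL was already seen, so skip it.
        if !visited.insert(url.clone()) {
            continue;
        }
        if is_valid(&url) {
            valid += 1;
        }
    }
    valid
}

fn main() {
    let routes = "/about\n/contact\n/about\n/missing\n";
    // Stub check: treat every URL except the one ending in "/missing" as valid.
    let valid = count_valid_routes("http://example.com/", routes, |url| {
        !url.ends_with("/missing")
    });
    println!("Number of valid routes: {}", valid);
}
```

With the stub check above, the duplicate `/about` line is probed only once and the reported count is 2. In a real crawler the closure would issue the GET request and decide validity from the response.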

Installation

1. Add webcrawler to your Cargo.toml

If you're using this library in your own Rust project, add the following to your Cargo.toml under [dependencies]:

[dependencies]
webcrawler = { version = "0.1.1" }
reqwest = { version = "0.11", features = ["blocking"] }
futures-io = "0.3.30"
url = "2.2"

Usage

1. Import the Web Crawler

In your main.rs or any other Rust file, import the web crawler library:

use webcrawler::WebCrawler;

2. Example usage

Here is an example of how to use the WebCrawler to check the validity of routes:

use webcrawler::WebCrawler;

fn main() {
    let base_url = "http://example.com";  // Replace with your base URL
    let file_path = "src/routes.txt";     // Replace with the path to your routes file

    let mut crawler = WebCrawler::new();

    match crawler.check_valid_routes(base_url, file_path) {
        Ok(valid_routes) => {
            println!("Number of valid routes: {}", valid_routes);
        }
        Err(e) => {
            // Report failures on stderr so they stay separate from normal output
            eprintln!("Error: {}", e);
        }
    }
}
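
The example reads its routes from a text file, but this README does not describe that file's format. The sketch below assumes one route per line, which is a guess rather than documented behavior. The helper name `write_routes_file` and the sample routes are hypothetical, and the file is written to the system temp directory instead of `src/routes.txt` so nothing in your project is overwritten.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes one route per line to `path`.
/// (Assumed format; check the crate source before relying on it.)
fn write_routes_file(path: &Path, routes: &[&str]) -> io::Result<()> {
    let mut contents = routes.join("\n");
    contents.push('\n');
    fs::write(path, contents)
}

fn main() -> io::Result<()> {
    // Hypothetical location in the temp directory.
    let path: PathBuf = std::env::temp_dir().join("routes.txt");
    write_routes_file(&path, &["/", "/about", "/contact"])?;

    // Read the file back to show what the crawler would be given.
    print!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```

Passing the resulting path as `file_path` would then exercise the crawler against those three routes, provided the one-route-per-line assumption holds.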

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributions

Contributions are welcome! If you have any improvements, bug fixes, or feature suggestions, feel free to open an issue or submit a pull request.

Crates.io

You can find this crate and the latest version on crates.io.
