202 stable releases
Version | Released |
---|---|
1.31.7 (new) | Jun 4, 2023 |
1.27.2 | Apr 23, 2023 |
1.26.7 | Mar 22, 2023 |
1.17.2 | Nov 21, 2022 |
1.10.7 | Jul 27, 2022 |
#328 in Web programming
1,316 downloads per month
180KB, 3.5K SLoC
Spider CLI
A fast command line spider or crawler.
Dependencies
On Linux
- OpenSSL 1.0.1, 1.0.2, 1.1.0, or 1.1.1
Usage
The CLI is a binary, so do not add it to your Cargo.toml file. Install it with cargo instead:
cargo install spider_cli
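After installation the binary has to be reachable on your PATH. Cargo places installed binaries in its bin directory (by default ~/.cargo/bin). The following is a minimal sketch of a check; the helper name check_spider is hypothetical and not part of spider_cli.

```shell
# Hypothetical helper (not part of spider_cli): report whether the
# `spider` binary is reachable on PATH.
check_spider() {
  if command -v spider >/dev/null 2>&1; then
    echo "spider found at $(command -v spider)"
  else
    # cargo installs binaries into $CARGO_HOME/bin, which defaults to ~/.cargo/bin
    echo "spider not found; add ${CARGO_HOME:-$HOME/.cargo}/bin to PATH"
    return 1
  fi
}

check_spider || true
```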
CLI
The following can also be run from the command line to start the crawler.
If you need logging, pass in the -v flag.
spider -v --domain https://choosealicense.com crawl
Crawl and output all links visited to a file.
spider --domain https://choosealicense.com crawl -o > spider_choosealicense.json
spider_cli 1.31.7
madeindjs <contact@rousseau-alexandre.fr>, j-mendez <jeff@a11ywatch.com>
Multithreaded web crawler written in Rust.

USAGE:
    spider [OPTIONS] --domain <DOMAIN> [SUBCOMMAND]

OPTIONS:
    -b, --blacklist-url <BLACKLIST_URL>
            Comma seperated string list of pages to not crawl or regex with feature enabled
    -c, --concurrency <CONCURRENCY>
            How many request can be run simultaneously
    -d, --domain <DOMAIN>
            Domain to crawl
    -D, --delay <DELAY>
            Polite crawling delay in milli seconds
    -h, --help
            Print help information
    -r, --respect-robots-txt
            Respect robots.txt file
    -u, --user-agent <USER_AGENT>
            User-Agent
    -v, --verbose
            Print page visited on standard output
    -V, --version
            Print version information

SUBCOMMANDS:
    crawl     crawl the website extracting links
    help      Print this message or the help of the given subcommand(s)
    scrape    scrape the website extracting html and links
All features are available except the Website struct's on_link_find_callback configuration option.
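The options listed in the help output can be combined in a single invocation. The sketch below assembles a "polite" crawl command using only flags that appear in that help text (--respect-robots-txt, --delay, --concurrency). The function name polite_crawl_cmd and the default values of 250 ms and 4 concurrent requests are illustrative assumptions, not recommendations from the project.

```shell
# Hypothetical wrapper (not part of spider_cli): build a polite crawl command.
#   $1 = domain to crawl
#   $2 = delay between requests in milliseconds (assumed default: 250)
#   $3 = maximum simultaneous requests (assumed default: 4)
polite_crawl_cmd() {
  printf 'spider --domain %s --respect-robots-txt --delay %s --concurrency %s crawl\n' \
    "$1" "${2:-250}" "${3:-4}"
}

# Print the command for review instead of running it directly.
polite_crawl_cmd https://choosealicense.com
```

The function only prints the command so it can be inspected first; once it looks right, copy the printed line into your shell to start the crawl.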
Dependencies
~16–27MB, ~588K SLoC