1 unstable release
0.1.0 (Dec 22, 2023)
#17 in #sitemap
Used in sws-lua
30KB
781 lines
Sitemap Web Scraper
Sitemap Web Scraper (sws) is a tool for simple, flexible, yet performant web page scraping. It consists of a CLI written in Rust that crawls web pages and executes a LuaJIT script to scrape them, outputting results to a CSV file.
sws crawl --script examples/fandom_mmh7.lua -o result.csv
Check out the doc for more details.
lib.rs:
Web crawler with pluggable scraping logic.

The main function crawl_site crawls and scrapes web pages. It is configured through a CrawlerConfig and a Scrapable implementation. The latter defines the Seed used for crawling, as well as the scraping logic. Note that robots.txt seeds are supported and exposed through texting_robots::Robot in the CrawlingContext and ScrapingContext.
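For library users, this boils down to implementing Scrapable and handing it to crawl_site. Below is a minimal sketch of that flow; the trait's method names and signatures, the Seed::RobotsTxt variant, the async entry point, and the example URL are assumptions inferred from the description above rather than the crate's verbatim API, so consult the sws-crawler docs for the exact definitions.

```rust
use sws_crawler::{
    crawl_site, CrawlerConfig, CrawlingContext, Scrapable, ScrapingContext, Seed,
};

// Hypothetical scraper; the trait shape below is assumed from the
// description above, not copied from the crate.
struct TitleScraper;

impl Scrapable for TitleScraper {
    type Config = ();

    fn new(_config: &Self::Config) -> anyhow::Result<Self> {
        Ok(TitleScraper)
    }

    // Seed the crawl from a robots.txt file (hypothetical URL); such seeds
    // are exposed through texting_robots::Robot in the crawling and
    // scraping contexts.
    fn seed(&self) -> Seed {
        Seed::RobotsTxt("https://example.com/robots.txt".into())
    }

    // Crawling policy: accept every discovered URL.
    fn accept(&self, _url: &str, _ctx: CrawlingContext) -> bool {
        true
    }

    // Scraping logic: report pages that contain a <title> tag.
    fn scrap(&mut self, page: String, _ctx: ScrapingContext) -> anyhow::Result<()> {
        if page.contains("<title>") {
            println!("scraped a titled page ({} bytes)", page.len());
        }
        Ok(())
    }
}

#[tokio::main] // assumes crawl_site is async
async fn main() -> anyhow::Result<()> {
    let config = CrawlerConfig::default();
    crawl_site::<TitleScraper>(&config).await
}
```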
Dependencies: ~11–27MB, ~385K SLoC