357 stable releases

new 1.93.9 Apr 25, 2024
1.89.5 Mar 30, 2024
1.80.37 Dec 31, 2023
1.50.17 Nov 30, 2023
1.26.7 Mar 22, 2023

MIT license

Spider Worker

A spider worker that decentralizes the heavy lifting of a crawl.

Dependencies

This project depends on the spider crate.

Usage

By default, the worker starts on port 3030 and the scraper, which gathers HTML, starts on port 3031.

SPIDER_WORKER_PORT=3030 SPIDER_WORKER_SCRAPER_PORT=3031 cargo run

Feature Flags

  1. scrape - Run the instance with this flag when the HTML is needed. The client must enable the matching spider feature flag to start. With this flag the instance starts on port 3031 instead of 3030.
  2. full_resources - Start the basic worker that gathers links together with the scraper.
  3. tls - Enable TLS support. Set the env variable SPIDER_WORKER_CERT_PATH to the .pem file and SPIDER_WORKER_KEY_PATH to your .rsa file. They default to /cert.pem and /key.rsa.
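As a rough illustration of the tls defaults described above, the following sketch resolves the certificate and key paths from the two environment variables and falls back to /cert.pem and /key.rsa. The `TlsPaths` struct and both function names are hypothetical and not taken from the crate; the worker's actual lookup may differ.

```rust
use std::path::PathBuf;

// Defaults as stated in the README.
const DEFAULT_CERT_PATH: &str = "/cert.pem";
const DEFAULT_KEY_PATH: &str = "/key.rsa";

/// Hypothetical holder for the two resolved paths (not part of the crate).
#[derive(Debug, PartialEq)]
struct TlsPaths {
    cert: PathBuf,
    key: PathBuf,
}

/// Use the provided values when present, otherwise the documented defaults.
fn resolve_tls_paths(cert: Option<String>, key: Option<String>) -> TlsPaths {
    TlsPaths {
        cert: PathBuf::from(cert.unwrap_or_else(|| DEFAULT_CERT_PATH.to_string())),
        key: PathBuf::from(key.unwrap_or_else(|| DEFAULT_KEY_PATH.to_string())),
    }
}

/// Read the paths from the environment variables named in the README.
fn tls_paths_from_env() -> TlsPaths {
    resolve_tls_paths(
        std::env::var("SPIDER_WORKER_CERT_PATH").ok(),
        std::env::var("SPIDER_WORKER_KEY_PATH").ok(),
    )
}
```

Keeping the fallback logic in `resolve_tls_paths` separate from the environment lookup makes it possible to check the defaults without touching process state.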

Ports

By default the instance runs on port 3030; use SPIDER_WORKER_PORT to change it. When enabled, the scraper runs on port 3031; use SPIDER_WORKER_SCRAPER_PORT to change it.
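The port selection above can be sketched as follows. This is a minimal illustration, not the crate's code: `resolve_port` and `ports_from_env` are hypothetical helpers, and falling back to the default when a value is not a valid port number is an assumption about behavior the README does not specify.

```rust
use std::env;

// Default ports as stated in the README.
const DEFAULT_WORKER_PORT: u16 = 3030;
const DEFAULT_SCRAPER_PORT: u16 = 3031;

/// Parse an optional raw value into a port number.
/// Assumption: a missing or unparsable value falls back to `default`.
fn resolve_port(raw: Option<&str>, default: u16) -> u16 {
    raw.and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(default)
}

/// Read (worker, scraper) ports from the environment variables
/// named in the README.
fn ports_from_env() -> (u16, u16) {
    let worker = env::var("SPIDER_WORKER_PORT").ok();
    let scraper = env::var("SPIDER_WORKER_SCRAPER_PORT").ok();
    (
        resolve_port(worker.as_deref(), DEFAULT_WORKER_PORT),
        resolve_port(scraper.as_deref(), DEFAULT_SCRAPER_PORT),
    )
}
```

With neither variable set, `ports_from_env()` yields the documented defaults of 3030 and 3031.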
