#product-os #web-crawler #browser #headless #automation #url #ecosystem

product-os-crawler

Product OS : Crawler is a browser-based crawler that uses Product OS : Browser to perform advanced URL crawling through headless browsing and automation

6 releases

0.0.6 Sep 1, 2023
0.0.5 Aug 28, 2023

#1154 in Web programming

AGPL-3.0-only

95KB
1K SLoC

Product OS : Crawler

Product OS : Crawler is a browser-based crawler that uses Product OS : Browser to perform advanced URL crawling through headless browsing and automation.

What is Product OS?

Product OS is a collection of packages that provide different tools and features which work together to make it easier to build products in the Rust ecosystem.

Installation

Use Cargo, the Rust package manager, to add Product OS : Crawler as a dependency.

cargo add product-os-crawler

Alternatively, add Product OS : Crawler to the [dependencies] section of your Cargo.toml.

product-os-crawler = { version = "0.0.6", features = [], default-features = true, optional = false }

Features

Product OS : Crawler builds on existing Rust libraries to crawl URLs and carry out instructions. Its features include:

  • Basic crawling capabilities with back-off rules
  • Revisit logic with full configuration for tuning
  • A fully configurable scoring system to determine the value of content
  • Hand-off to indexing and storage services through a processor
// Feature samples TODO
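The feature list mentions back-off rules and revisit logic but does not yet show the crate's API. As a rough illustration only, the following standard-library sketch shows one common way such rules can work: exponential back-off after consecutive failures, capped at a maximum, and a fixed revisit interval after a success. `CrawlPolicy`, `UrlState`, `due_urls`, and all field names are hypothetical and are not part of Product OS : Crawler; times are plain seconds on a logical clock to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical tuning knobs (NOT the crate's actual configuration type).
#[derive(Debug, Clone)]
pub struct CrawlPolicy {
    /// Delay applied after the first consecutive failure.
    pub base_backoff: Duration,
    /// Upper bound on any back-off delay.
    pub max_backoff: Duration,
    /// How long to wait before revisiting a successfully crawled URL.
    pub revisit_interval: Duration,
}

/// Hypothetical per-URL bookkeeping: consecutive failures and next due time.
#[derive(Debug, Default, Clone)]
pub struct UrlState {
    pub failures: u32,
    pub next_due_secs: u64,
}

impl CrawlPolicy {
    /// Exponential back-off: base * 2^(failures - 1), capped at max_backoff.
    pub fn backoff(&self, failures: u32) -> Duration {
        if failures == 0 {
            return Duration::ZERO;
        }
        // Cap the exponent so the shift cannot overflow a u32.
        let factor = 1u32 << (failures - 1).min(16);
        self.base_backoff
            .checked_mul(factor)
            .unwrap_or(self.max_backoff)
            .min(self.max_backoff)
    }

    /// Record a fetch outcome at logical time `now_secs` and schedule the next visit.
    pub fn record(&self, state: &mut UrlState, now_secs: u64, success: bool) {
        let delay = if success {
            // A success resets the failure streak and schedules a normal revisit.
            state.failures = 0;
            self.revisit_interval
        } else {
            state.failures += 1;
            self.backoff(state.failures)
        };
        state.next_due_secs = now_secs + delay.as_secs();
    }
}

/// URLs whose next visit time has been reached, sorted for deterministic order.
pub fn due_urls(states: &HashMap<String, UrlState>, now_secs: u64) -> Vec<String> {
    let mut due: Vec<String> = states
        .iter()
        .filter(|(_, s)| s.next_due_secs <= now_secs)
        .map(|(url, _)| url.clone())
        .collect();
    due.sort();
    due
}
```

For example, with a 2-second base and a 60-second cap, the third consecutive failure waits 8 seconds and the tenth is capped at 60 seconds.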

Usage

// Examples TODO
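No official usage examples are published yet. The sketch below is a hypothetical, standard-library-only illustration of how a configurable scoring system and a processor hand-off could fit together; it does not use the crate's API. `CrawledPage`, `ScoringRules`, `Processor`, `InMemoryIndex`, and `hand_off` are invented names, and keyword counting with a per-depth penalty is just one assumed scoring rule.

```rust
use std::collections::HashMap;

/// A crawled page as a hypothetical processor might receive it.
#[derive(Debug, Clone)]
pub struct CrawledPage {
    pub url: String,
    pub body: String,
    pub depth: u32,
}

/// Hypothetical configurable scoring: keyword weights plus a per-depth penalty.
/// Keywords are expected in lowercase because the body is lowercased before matching.
pub struct ScoringRules {
    pub keyword_weights: HashMap<String, f64>,
    pub depth_penalty: f64,
}

impl ScoringRules {
    pub fn score(&self, page: &CrawledPage) -> f64 {
        let text = page.body.to_lowercase();
        // Sum of (non-overlapping occurrences of each keyword * its weight).
        let keyword_score: f64 = self
            .keyword_weights
            .iter()
            .map(|(kw, &weight)| text.matches(kw.as_str()).count() as f64 * weight)
            .sum();
        keyword_score - self.depth_penalty * page.depth as f64
    }
}

/// Hypothetical hand-off point to indexing or storage services.
pub trait Processor {
    fn process(&mut self, page: &CrawledPage, score: f64);
}

/// Stand-in for an indexing service: keeps pages that meet a minimum score.
pub struct InMemoryIndex {
    pub min_score: f64,
    pub stored: Vec<(String, f64)>,
}

impl Processor for InMemoryIndex {
    fn process(&mut self, page: &CrawledPage, score: f64) {
        if score >= self.min_score {
            self.stored.push((page.url.clone(), score));
        }
    }
}

/// Score each page and pass it, with its score, to the processor.
pub fn hand_off<P: Processor>(pages: &[CrawledPage], rules: &ScoringRules, processor: &mut P) {
    for page in pages {
        let score = rules.score(page);
        processor.process(page, score);
    }
}
```

Using a trait keeps the crawling logic independent of where pages end up: replacing the in-memory stand-in with a real indexing or storage client only requires another `Processor` implementation.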

Contributing

Contributions are not currently being accepted, but will be once the project is published on a public repository.

License

GNU AGPLv3

Dependencies

~40–55MB
~1.5M SLoC