9 releases
Version | Released |
---|---|
0.4.3 | Nov 9, 2024 |
0.4.1 | Oct 14, 2024 |
0.3.5 | Aug 18, 2024 |
0.3.3 | Nov 27, 2023 |
0.1.7 | May 18, 2023 |
#708 in Asynchronous
94KB, 2K SLoC
capp-rs
Common things I use to build Rust CLI tools for web crawlers.
CAPP - "Comprehensive Asynchronous Parallel Processing" or just "Crawler APP"
capp is a Rust library designed to provide powerful and flexible tools for building efficient web crawlers and other asynchronous, parallel processing applications. It offers a robust framework for managing concurrent tasks, handling network requests, and processing large amounts of data in a scalable manner.
Features
- Asynchronous Task Management: Utilize tokio-based asynchronous processing for efficient, non-blocking execution of tasks.
- Flexible Task Queue: Implement various backend storage options for task queues, including in-memory and Redis-based solutions.
- Round-Robin Task Distribution: Ensure fair distribution of tasks across different domains or categories.
- Configurable Workers: Set up and manage multiple worker instances to process tasks concurrently.
- Error Handling and Retry Mechanisms: Robust error handling with configurable retry policies for failed tasks.
- Dead Letter Queue (DLQ): Automatically move problematic tasks to a separate queue for later analysis or reprocessing.
- Health Checks: Built-in health check functionality to ensure the stability of your crawling or processing system.
- Extensible Architecture: Easily extend the library with custom task types, processing logic, and storage backends.
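To make the round-robin idea from the feature list concrete, here is a minimal, standard-library-only sketch. It is not capp's actual API: the `RoundRobinQueue` type and its `push`/`pop` methods are hypothetical names chosen for illustration. Each key (for example, a domain) gets its own FIFO, and keys are served in rotation so that one busy domain cannot starve the others.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical round-robin queue (illustration only, not capp's API):
/// one FIFO per key, with keys served in rotation.
#[derive(Debug)]
pub struct RoundRobinQueue<T> {
    /// Pending tasks grouped by key (e.g. a domain name).
    queues: HashMap<String, VecDeque<T>>,
    /// Keys that currently have pending tasks, in the order they will be served.
    rotation: VecDeque<String>,
}

impl<T> RoundRobinQueue<T> {
    pub fn new() -> Self {
        Self {
            queues: HashMap::new(),
            rotation: VecDeque::new(),
        }
    }

    /// Enqueue a task under `key`; a key enters the rotation when its
    /// queue goes from empty to non-empty.
    pub fn push(&mut self, key: &str, task: T) {
        let queue = self.queues.entry(key.to_string()).or_default();
        if queue.is_empty() {
            self.rotation.push_back(key.to_string());
        }
        queue.push_back(task);
    }

    /// Take the next task from the key at the front of the rotation, then
    /// move that key to the back (or drop it if it has no tasks left).
    pub fn pop(&mut self) -> Option<T> {
        let key = self.rotation.pop_front()?;
        let queue = self.queues.get_mut(&key)?;
        let task = queue.pop_front();
        if queue.is_empty() {
            self.queues.remove(&key);
        } else {
            self.rotation.push_back(key);
        }
        task
    }
}
```

With three tasks queued for one domain and one task for another, `pop` alternates between the two domains until the smaller queue is drained, then continues with the remaining one.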
Use Cases
While capp is primarily designed for building web crawlers, its architecture makes it suitable for a variety of parallel processing tasks, including:
- Web scraping and data extraction
- Distributed task processing
- Batch job management
- Asynchronous API clients
- Large-scale data processing pipelines
Getting Started
To use capp in your project, add it to your Cargo.toml:
[dependencies]
capp = "0.4"
For working code, check the examples in the repository.
Modules
- config: Configuration management for your application.
- healthcheck: Functions for performing health checks on your system.
- http: Utilities for making HTTP requests and handling responses.
- manager: Task and worker management structures.
- queue: Task queue implementations and traits.
- task: Definitions and utilities for working with tasks.
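The sketch below shows how a queue trait, a task type, and a retry policy with a dead letter queue might fit together. It is an assumption-laden illustration, not the crate's real interface: `Task`, `TaskQueue`, `InMemoryQueue`, and `run_to_completion` are hypothetical names, and the code is synchronous and std-only to stay self-contained, whereas capp itself is tokio-based.

```rust
use std::collections::VecDeque;

/// Hypothetical task record; `retries` counts failed attempts so far.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub payload: String,
    pub retries: u32,
}

/// Hypothetical storage trait. A Redis-backed queue could implement the
/// same three methods without changing the processing loop below.
pub trait TaskQueue {
    fn push(&mut self, task: Task);
    fn pop(&mut self) -> Option<Task>;
    fn send_to_dlq(&mut self, task: Task);
}

/// In-memory backend: a FIFO of pending tasks plus a dead letter list.
#[derive(Debug, Default)]
pub struct InMemoryQueue {
    pub pending: VecDeque<Task>,
    pub dead: Vec<Task>,
}

impl TaskQueue for InMemoryQueue {
    fn push(&mut self, task: Task) {
        self.pending.push_back(task);
    }

    fn pop(&mut self) -> Option<Task> {
        self.pending.pop_front()
    }

    fn send_to_dlq(&mut self, task: Task) {
        self.dead.push(task);
    }
}

/// Drain the queue. A failed task is re-queued until it has been retried
/// `max_retries` times; after that it is moved to the dead letter queue.
/// Returns the number of tasks that succeeded.
pub fn run_to_completion<Q, F>(queue: &mut Q, max_retries: u32, mut handler: F) -> usize
where
    Q: TaskQueue,
    F: FnMut(&Task) -> Result<(), String>,
{
    let mut succeeded = 0;
    while let Some(mut task) = queue.pop() {
        match handler(&task) {
            Ok(()) => succeeded += 1,
            Err(_) if task.retries < max_retries => {
                task.retries += 1;
                queue.push(task);
            }
            Err(_) => queue.send_to_dlq(task),
        }
    }
    succeeded
}
```

Keeping the retry loop generic over `TaskQueue` means the retry and dead letter logic is written once and reused with any backend. With `max_retries` set to 2, a task that always fails is attempted three times in total (one initial attempt plus two retries) before it lands in the dead letter list.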
Dependencies
~10–24MB, ~344K SLoC