27 releases
Uses the Rust 2024 edition

| Version | Date |
|---|---|
| 0.15.3 | Jun 9, 2025 |
| 0.15.0-alpha.3 | May 29, 2025 |
| 0.13.1 | Mar 31, 2023 |
| 0.10.0 | Oct 10, 2022 |
| 0.5.1 | Mar 15, 2022 |
fetcher
fetcher is a flexible async framework designed to make it easy to build robust data pipeline applications that extract, transform, and deliver data from various sources to diverse destinations. In simpler terms, it makes it easy to create an app that periodically checks a source (for example, a website) for some data, makes it pretty, and sends it to the users.
fetcher is designed to be easily extensible to support as many use cases as possible, while providing tools for most of the common ones out of the box.
Architecture
At the heart of fetcher is the `Task`. It represents a specific instance of a data pipeline, which consists of two main stages:
- `Source`: fetches data from an external source (e.g. an HTTP endpoint or an email inbox).
- `Action`: applies transformations (filters, modifications, parsing) to the fetched data. The most notable action is `Sink`, which sends the transformed data somewhere (e.g. a Discord channel, a Telegram chat, or another program's stdin).
An `Entry` is the unit of data flowing through the pipeline. Most notably, it contains:
- `id`: a unique identifier for the entry, used for tracking read/unread status and replies.
- `raw_contents`: the raw, untransformed data fetched from the source.
- `msg`: a `Message` that contains the formatted and structured data (like title, body, and link) that will end up being sent to a sink.
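As a rough mental model of how a `Task` moves `Entry` values from a `Source` through `Action`s into a `Sink`, here is a minimal, std-only sketch. All names in it (`Message`, `Entry`, `Source`, `Action`, `Task`, `with_action`, `StaticSource`, `RawToBody`, `StdoutSink`) are hypothetical stand-ins rather than fetcher's actual API, and the sketch is synchronous so it runs without tokio, whereas fetcher itself is async.

```rust
// Hypothetical, simplified model of fetcher's pipeline concepts.
// NOT fetcher's real API: fetcher is async and its types carry more data.

/// Formatted, structured data that ends up in a sink.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Message {
    pub title: Option<String>,
    pub body: Option<String>,
    pub link: Option<String>,
}

/// The unit of data flowing through the pipeline.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: Option<String>,           // tracks read/unread status
    pub raw_contents: Option<String>, // untransformed data from the source
    pub msg: Message,                 // what a sink will eventually send
}

/// Stage 1: fetch entries from somewhere.
pub trait Source {
    fn fetch(&mut self) -> Vec<Entry>;
}

/// Stage 2: transform, filter, or deliver entries.
pub trait Action {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry>;
}

/// A task: one source followed by an ordered list of actions.
pub struct Task {
    source: Box<dyn Source>,
    actions: Vec<Box<dyn Action>>,
}

impl Task {
    pub fn new(source: impl Source + 'static) -> Self {
        Task { source: Box::new(source), actions: Vec::new() }
    }

    pub fn with_action(mut self, action: impl Action + 'static) -> Self {
        self.actions.push(Box::new(action));
        self
    }

    /// Fetch once, then pass the entries through every action in order.
    pub fn run(&mut self) -> Vec<Entry> {
        let mut entries = self.source.fetch();
        for action in &mut self.actions {
            entries = action.apply(entries);
        }
        entries
    }
}

/// Example source returning fixed strings (a stand-in for an HTTP fetch).
pub struct StaticSource(pub Vec<&'static str>);

impl Source for StaticSource {
    fn fetch(&mut self) -> Vec<Entry> {
        self.0
            .iter()
            .enumerate()
            .map(|(i, s)| Entry {
                id: Some(i.to_string()),
                raw_contents: Some(s.to_string()),
                msg: Message::default(),
            })
            .collect()
    }
}

/// Example transforming action: copy the raw contents into the message body.
pub struct RawToBody;

impl Action for RawToBody {
    fn apply(&mut self, mut entries: Vec<Entry>) -> Vec<Entry> {
        for e in &mut entries {
            e.msg.body = e.raw_contents.clone();
        }
        entries
    }
}

/// Example sink: prints each message body to stdout, passes entries through.
pub struct StdoutSink;

impl Action for StdoutSink {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        for e in &entries {
            println!("{}", e.msg.body.as_deref().unwrap_or(""));
        }
        entries
    }
}
```

For example, `Task::new(StaticSource(vec!["hello"])).with_action(RawToBody).with_action(StdoutSink).run()` prints `hello` and returns the single entry. Modeling the sink as just another action mirrors how `Sink` is described above: the most notable kind of action.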
A `Job` is a collection of one or more tasks that are executed together, potentially on a schedule.
Jobs can also be run either concurrently or in parallel (depending on the `send` feature) as part of a `JobGroup`.
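The relationship between tasks, jobs, and job groups can be sketched with plain std threads. This is an assumption-heavy illustration: `Job`, `JobGroup`, `with_task`, and the fixed `runs` count are hypothetical, tasks are modeled as closures returning how many entries they processed, and OS threads stand in for the tokio tasks fetcher actually spawns.

```rust
use std::thread;
use std::time::Duration;

// Hypothetical sketch: `Job` and `JobGroup` here are NOT fetcher's types.
// Tasks are closures returning how many entries they processed, and
// scoped OS threads stand in for tokio tasks.

pub struct Job {
    pub name: String,
    tasks: Vec<Box<dyn FnMut() -> usize + Send>>,
    interval: Duration,
    runs: usize, // a real scheduled job might loop until stopped
}

impl Job {
    pub fn new(name: &str, interval: Duration, runs: usize) -> Self {
        Job { name: name.to_string(), tasks: Vec::new(), interval, runs }
    }

    pub fn with_task(mut self, task: impl FnMut() -> usize + Send + 'static) -> Self {
        self.tasks.push(Box::new(task));
        self
    }

    /// Run every task together, `runs` times, sleeping `interval` between
    /// rounds. Returns the total number of entries processed.
    pub fn run(&mut self) -> usize {
        let mut total = 0;
        for round in 0..self.runs {
            if round > 0 {
                thread::sleep(self.interval);
            }
            for task in &mut self.tasks {
                total += task();
            }
        }
        total
    }
}

/// Runs its jobs in parallel, one scoped thread per job.
pub struct JobGroup {
    pub jobs: Vec<Job>,
}

impl JobGroup {
    /// Returns `(job name, entries processed)` for each job, in order.
    pub fn run(&mut self) -> Vec<(String, usize)> {
        thread::scope(|s| {
            // Spawn every job first so they overlap, then join them all.
            let handles: Vec<_> = self
                .jobs
                .iter_mut()
                .map(|job| s.spawn(move || (job.name.clone(), job.run())))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```

For instance, a group holding a job that runs a 2-entry task three times and a job that runs a 5-entry and a 1-entry task once returns `[("a", 6), ("b", 6)]`, with both jobs running at the same time.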
fetcher is extensible
Everything in fetcher is defined and used via traits, including but not limited to `Job`s, `Task`s, `Source`s, `Action`s, and `JobGroup`s.
This allows you to define and use anything you might be missing in fetcher by default without having to modify any fetcher code whatsoever.
The easiest way to extend fetcher's parsing capabilities is to use `transform_fn`, which lets you pass in an async closure that modifies entries in whatever way you want.
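Conceptually, `transform_fn` turns a per-entry closure into an action. The sketch below shows that idea synchronously; `Entry`, `Action`, `TransformFn`, `transform_fn`, and `tidy_body` here are hypothetical simplifications, and fetcher's real `transform_fn` takes an async closure with a signature not reproduced here.

```rust
// Hypothetical sketch of the "closure as action" idea; not fetcher's API.

/// Trimmed-down entry (fetcher's real `Entry` has more fields).
#[derive(Debug, Clone, Default)]
pub struct Entry {
    pub raw_contents: Option<String>,
    pub body: Option<String>,
}

pub trait Action {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry>;
}

/// Wraps a per-entry closure so it can be used as an `Action`.
pub struct TransformFn<F>(F);

impl<F> Action for TransformFn<F>
where
    F: FnMut(Entry) -> Entry,
{
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        // Run the closure on every entry, in order.
        entries.into_iter().map(&mut self.0).collect()
    }
}

pub fn transform_fn<F>(f: F) -> TransformFn<F>
where
    F: FnMut(Entry) -> Entry,
{
    TransformFn(f)
}

/// One closure doing several text manipulations at once: trim lines,
/// drop empty ones, join them with spaces, and unescape `&amp;`.
pub fn tidy_body() -> impl Action {
    transform_fn(|mut entry: Entry| {
        if let Some(raw) = &entry.raw_contents {
            let cleaned = raw
                .lines()
                .map(str::trim)
                .filter(|l| !l.is_empty())
                .collect::<Vec<_>>()
                .join(" ")
                .replace("&amp;", "&");
            entry.body = Some(cleaned);
        }
        entry
    })
}
```

A single closure like `tidy_body` stands in for what would otherwise be a chain of separate replace and extract steps: the raw text `"  Tom &amp; Jerry \n\n  episode 1 "` becomes the body `"Tom & Jerry episode 1"`.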
- Want to deserialize JSON into a struct with `serde` to get better error reporting and more flexibility than using `Json`? Easy-peasy: just use `transform_fn` to wrap an async closure in which you call `let deserialized: Foo = serde_json::from_str(&entry.raw_contents)` and use the result however you want.
- Want to do a bunch of text manipulations and avoid a thousand `Replace`s and `Extract`s? `transform_fn` has got your back, too.
- Is the current selection of sinks not enough? Define your own by implementing the `Sink` trait on your type.
- Don't like the default read-filtering strategies? Implement `MarkAsRead` and `Filter` on your type.
- Want to keep the read state of entries in a database or just on the filesystem? Implement `ExternalSave` yourself and do whatever you want.
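Custom read-state handling boils down to two questions: which entries have already been seen, and where that knowledge is stored. The sketch below answers them with a `HashSet` of IDs and a plain-text file. `SaveReadState`, `FileSave`, and `ReadFilter` are hypothetical names loosely playing the roles of `ExternalSave`, `Filter`, and `MarkAsRead`; fetcher's actual trait signatures are not reproduced here, and entries are reduced to their IDs.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical sketch of custom read-state handling. These are NOT
// fetcher's `MarkAsRead`, `Filter`, or `ExternalSave` traits.

/// Persistence hook, in the spirit of `ExternalSave`.
pub trait SaveReadState {
    fn save(&mut self, read_ids: &HashSet<String>) -> io::Result<()>;
}

/// Stores read IDs in a plain-text file, one per line, sorted for stable output.
pub struct FileSave {
    pub path: PathBuf,
}

impl SaveReadState for FileSave {
    fn save(&mut self, read_ids: &HashSet<String>) -> io::Result<()> {
        let mut ids: Vec<&str> = read_ids.iter().map(String::as_str).collect();
        ids.sort_unstable();
        fs::write(&self.path, ids.join("\n"))
    }
}

/// Remembers which entry IDs were already handled and filters them out.
pub struct ReadFilter<S: SaveReadState> {
    read_ids: HashSet<String>,
    saver: S,
}

impl<S: SaveReadState> ReadFilter<S> {
    pub fn new(saver: S) -> Self {
        ReadFilter { read_ids: HashSet::new(), saver }
    }

    /// The "filter" role: keep only IDs that have not been marked as read.
    pub fn filter(&self, ids: Vec<String>) -> Vec<String> {
        ids.into_iter().filter(|id| !self.read_ids.contains(id)).collect()
    }

    /// The "mark as read" role: remember the ID, then persist the whole set.
    pub fn mark_as_read(&mut self, id: &str) -> io::Result<()> {
        self.read_ids.insert(id.to_string());
        self.saver.save(&self.read_ids)
    }
}
```

Swapping `FileSave` for a database-backed implementation would leave `ReadFilter` untouched, which is the point of putting persistence behind a trait.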
If anything is not extensible, this is a bug and it should be reported.
Getting started
To use fetcher, add it as a dependency in your `Cargo.toml` file:
```toml
[dependencies]
fetcher = { version = "0.15", features = ["full"] }
tokio = { version = "1", features = ["full"] }
```
For the smallest example of how to use fetcher, see `examples/simple_website_to_stdout.rs`.
More complete examples can be found in the `examples/` directory. They demonstrate how to:
- Fetch data from various sources.
- Transform and filter data using regular expressions, HTML parsing, and JSON parsing.
- Implement custom sources, actions, and sinks.
- Persist the read filter state in an external storage system.
Features
send
Use the `send` feature (enabled by default) to enable tokio multithreading support.
If `send` is disabled, the `Send + Sync` bounds are relaxed on most types, but job groups no longer run jobs in parallel: they use `tokio::task::spawn_local` instead of `tokio::spawn`.
Note that this requires wrapping your calls to `JobGroup::run` in a `tokio::task::LocalSet`.
See `tests/non_send.rs` for an example.
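One common way for a crate to relax `Send + Sync` bounds behind a Cargo feature is a helper trait whose supertraits change with `cfg`. The sketch below shows that pattern; `MaybeSendSync`, `Job`, and `LocalJob` are hypothetical, and this is not necessarily how fetcher implements its `send` feature.

```rust
use std::rc::Rc;

// Hypothetical illustration of relaxing `Send + Sync` bounds behind a Cargo
// feature. A common pattern, not necessarily fetcher's implementation.

/// With `send` enabled, this is just `Send + Sync`...
#[cfg(feature = "send")]
pub trait MaybeSendSync: Send + Sync {}
#[cfg(feature = "send")]
impl<T: Send + Sync + ?Sized> MaybeSendSync for T {}

/// ...and with `send` disabled, every type satisfies it.
#[cfg(not(feature = "send"))]
pub trait MaybeSendSync {}
#[cfg(not(feature = "send"))]
impl<T: ?Sized> MaybeSendSync for T {}

/// A hypothetical job trait that only asks for `MaybeSendSync`.
pub trait Job: MaybeSendSync {
    fn run(&self) -> String;
}

/// Holds an `Rc`, which is neither `Send` nor `Sync`. This impl compiles
/// only while `send` is disabled, which is exactly the relaxation at work;
/// such a job must then stay on one thread (e.g. inside a `LocalSet`).
pub struct LocalJob {
    pub state: Rc<String>,
}

impl Job for LocalJob {
    fn run(&self) -> String {
        format!("processed {}", self.state)
    }
}
```

Compiled without the `send` feature (which a crate would declare under `[features]` in `Cargo.toml`), `LocalJob` can hold an `Rc`; enabling the feature makes `impl Job for LocalJob` fail to compile because `Rc<String>` is not `Send`.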
nightly
The `nightly` feature enables implementations of some traits for nightly-only Rust types, like `!` (the never type).
full and all-sources, all-actions, all-sinks, all-misc
Each source, action, and sink (which is also an action, but different enough to warrant being separate) is gated behind its own feature to help with the already pretty bad build times of apps using fetcher.
A feature is usually named using the `(source|action|sink)-(name)` format.
On top of that, all sources, actions, and sinks (and misc features like `google-oauth2`) are also grouped into `all-(sources|actions|sinks|misc)` features that enable every source, action, sink, or misc feature, respectively.
Every feature can be enabled with the `full` feature.
This is the preferred way to use fetcher for the first time, as it lets you use everything you might need before you actually know what you need.
Later on, `full` can be replaced with the features you actually use for some easy compile-time gains.
For example, an app fetching RSS feeds and sending them to a Telegram channel might use the features `source-http`, `action-feed`, and `sink-telegram`.
Note
fetcher was completely rewritten in v0.15.0. It changed from an application with a config file to an application framework.
This was mostly done to make using fetcher correctly as easy and bug-free as possible.
Not to mention the huge config file was getting unwieldy and difficult to write and extend to your needs.
To make the config file more flexible would require integrating an actual programming language into it (like Lua).
I actually considered integrating Lua into the config file (à la the Astral web framework) before I remembered that we already have a properly integrated programming language: the one fetcher has always been written in.
I decided to double down on the fact that fetcher is written in Rust, instead making fetcher a highly extensible, easy-to-use, generic automation and data pipelining framework that can be used to build apps, including apps similar to what fetcher originally was.
Since then, the `fetcher-core` and `fetcher-config` crates are no longer used (or needed), so if anybody needs these on crates.io, hit me up!
Contributing
Contributions are very welcome! Please feel free to submit a pull request or open issues for any bugs, feature requests, or general feedback.
License: MPL-2.0