27 releases

Uses new Rust 2024

0.15.3 Jun 9, 2025
0.15.0-alpha.3 May 29, 2025
0.13.1 Mar 31, 2023
0.10.0 Oct 10, 2022
0.5.1 Mar 15, 2022

#100 in Web programming


915 downloads per month

MPL-2.0 license

360KB
8K SLoC

fetcher

fetcher is a flexible async framework designed to make it easy to create robust applications for building data pipelines that extract, transform, and deliver data from various sources to diverse destinations. In simpler terms, it helps you create an app that periodically checks a source, for example a website, for some data, makes it pretty, and sends it to the users.

fetcher is made to be easily extensible to support as many use cases as possible, while providing tools for the most common ones out of the box.

Architecture

At the heart of fetcher is the Task. It represents a specific instance of a data pipeline, which consists of two main stages:

  • Source: Fetches data from an external source (e.g. HTTP endpoint, email inbox).
  • Action: Applies transformations (filters, modifications, parsing) to the fetched data. The most notable action is Sink, which sends the transformed data somewhere (e.g. a Discord channel, a Telegram chat, or another program's stdin).

An Entry is the unit of data flowing through the pipeline. It most notably contains:

  • id: A unique identifier for the entry, used for tracking read/unread status and replies.
  • raw_contents: The raw, untransformed data fetched from the source.
  • msg: A Message that contains the formatted and structured data, like title, body, and link, that is eventually sent to a sink.

A Job is a collection of one or more tasks that are executed together, potentially on a schedule. As part of a JobGroup, jobs can also be run either concurrently or in parallel (depending on the "send" feature).
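To make the Source → Action → Sink flow more concrete, here is a deliberately simplified, synchronous sketch of how these pieces fit together. The type and trait names mirror the concepts above, but every definition here (including the hypothetical StaticSource, TrimToBody, StdoutSink, Task::new, and Task::then) is an illustration, not fetcher's actual API. fetcher's real traits are async and considerably richer.

```rust
/// Simplified stand-in for fetcher's Entry (hypothetical definition).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Entry {
    pub id: Option<String>,
    pub raw_contents: Option<String>,
    pub msg: Message,
}

/// Formatted data that eventually ends up in a sink (hypothetical definition).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Message {
    pub title: Option<String>,
    pub body: Option<String>,
    pub link: Option<String>,
}

/// Stage 1: fetch entries from somewhere.
pub trait Source {
    fn fetch(&mut self) -> Vec<Entry>;
}

/// Stage 2: transform, filter, or deliver entries. In this sketch a sink is just an action.
pub trait Action {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry>;
}

/// A task: one source followed by a chain of actions.
pub struct Task {
    source: Box<dyn Source>,
    actions: Vec<Box<dyn Action>>,
}

impl Task {
    pub fn new(source: impl Source + 'static) -> Self {
        Task { source: Box::new(source), actions: Vec::new() }
    }

    /// Appends an action to the end of the pipeline.
    pub fn then(mut self, action: impl Action + 'static) -> Self {
        self.actions.push(Box::new(action));
        self
    }

    /// Fetches once, then passes the entries through every action in order.
    pub fn run(&mut self) -> Vec<Entry> {
        let mut entries = self.source.fetch();
        for action in &mut self.actions {
            entries = action.apply(entries);
        }
        entries
    }
}

/// A job: one or more tasks run together (sequentially in this sketch).
pub struct Job {
    pub tasks: Vec<Task>,
}

impl Job {
    pub fn run(&mut self) -> Vec<Entry> {
        self.tasks.iter_mut().flat_map(|task| task.run()).collect()
    }
}

/// Example source: yields fixed strings (stands in for e.g. an HTTP fetch).
pub struct StaticSource(pub Vec<&'static str>);

impl Source for StaticSource {
    fn fetch(&mut self) -> Vec<Entry> {
        self.0
            .iter()
            .enumerate()
            .map(|(i, raw)| Entry {
                id: Some(i.to_string()),
                raw_contents: Some(raw.to_string()),
                ..Default::default()
            })
            .collect()
    }
}

/// Example transform: trims the raw contents and uses them as the message body.
pub struct TrimToBody;

impl Action for TrimToBody {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        entries
            .into_iter()
            .map(|mut e| {
                e.msg.body = e.raw_contents.as_deref().map(|s| s.trim().to_owned());
                e
            })
            .collect()
    }
}

/// Example sink: prints each message body and passes the entries through unchanged.
pub struct StdoutSink;

impl Action for StdoutSink {
    fn apply(&mut self, entries: Vec<Entry>) -> Vec<Entry> {
        for e in &entries {
            println!("{}", e.msg.body.as_deref().unwrap_or(""));
        }
        entries
    }
}
```

A task is then assembled as, for example, `Task::new(StaticSource(vec!["  hello  "])).then(TrimToBody).then(StdoutSink)`. Boxing the source and actions as trait objects is what lets arbitrary user-defined types be mixed into one pipeline in this sketch.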

fetcher is extensible

Everything in fetcher is defined and used via traits, including, but not limited to, Jobs, Tasks, Sources, Actions, and JobGroups.

This allows you to define and use anything fetcher is missing by default without having to modify any fetcher code whatsoever.

The easiest way to extend fetcher's parsing capabilities is transform_fn, which lets you pass in an async closure that modifies entries in whatever way you want.
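The core idea behind a helper like transform_fn, turning a plain closure into a full-fledged action, can be illustrated with a small synchronous sketch. The Entry, Action, TransformFn, and transform_fn_sketch names below are hypothetical stand-ins; fetcher's real transform_fn takes an async closure and its exact signature may differ.

```rust
/// Minimal stand-in for an entry: only the raw contents (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub raw_contents: Option<String>,
}

/// Hypothetical action trait: modifies entries in place.
pub trait Action {
    fn apply(&mut self, entries: &mut [Entry]);
}

/// Wrapper that turns any per-entry closure into an Action.
pub struct TransformFn<F>(F);

impl<F> Action for TransformFn<F>
where
    F: FnMut(&mut Entry),
{
    fn apply(&mut self, entries: &mut [Entry]) {
        for entry in entries.iter_mut() {
            // Run the user's closure on every entry.
            (self.0)(entry);
        }
    }
}

/// Hypothetical helper playing the role of transform_fn.
pub fn transform_fn_sketch<F: FnMut(&mut Entry)>(f: F) -> TransformFn<F> {
    TransformFn(f)
}
```

For example, `transform_fn_sketch(|e: &mut Entry| { if let Some(s) = e.raw_contents.as_mut() { *s = s.to_uppercase(); } })` yields an action that uppercases every entry. Making the wrapper generic over the closure type means each closure becomes its own concrete action type without any boxing.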

  • Want to deserialize JSON into a struct with serde to get better error reporting and more flexibility than with Json? Easy-peasy: use transform_fn to wrap an async closure in which you call let deserialized: Foo = serde_json::from_str(&entry.raw_contents) and use the result however you want.
  • Want to do a bunch of text manipulations and avoid a thousand Replace and Extract actions? transform_fn has got your back, too.
  • Is the current selection of sinks not enough? Define your own by implementing the Sink trait on your type.
  • Don't like the default read-filtering strategies? Implement MarkAsRead and Filter on your type.
  • Want to keep the read state of entries in a database or just on the filesystem? Implement ExternalSave yourself and do whatever you want.
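The last two bullets boil down to remembering which entry ids have already been seen and optionally persisting that set. Below is a sketch of an in-memory read filter whose state can be saved to and restored from a plain-text file. The ReadFilter trait, the SeenIds type, and the one-id-per-line file format are all hypothetical; they only illustrate the roles MarkAsRead, Filter, and ExternalSave play, not their real signatures.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// Minimal stand-in for an entry: only the id matters for read filtering.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub id: Option<String>,
}

/// Hypothetical combination of the "mark as read" and "filter" roles.
pub trait ReadFilter {
    fn mark_as_read(&mut self, id: &str);
    fn remove_read(&self, entries: &mut Vec<Entry>);
}

/// Keeps the ids of already-read entries in a set.
#[derive(Debug, Default)]
pub struct SeenIds {
    ids: HashSet<String>,
}

impl ReadFilter for SeenIds {
    fn mark_as_read(&mut self, id: &str) {
        self.ids.insert(id.to_owned());
    }

    fn remove_read(&self, entries: &mut Vec<Entry>) {
        // Entries without an id cannot be tracked, so they are always kept.
        entries.retain(|e| !e.id.as_ref().is_some_and(|id| self.ids.contains(id)));
    }
}

impl SeenIds {
    /// Persists the read state as one id per line (an illustrative "external save").
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let mut ids: Vec<&str> = self.ids.iter().map(String::as_str).collect();
        ids.sort_unstable(); // deterministic file contents
        fs::write(path, ids.join("\n"))
    }

    /// Restores the read state; a missing file means nothing has been read yet.
    pub fn load(path: &Path) -> io::Result<Self> {
        match fs::read_to_string(path) {
            Ok(text) => Ok(Self {
                ids: text.lines().filter(|l| !l.is_empty()).map(str::to_owned).collect(),
            }),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Self::default()),
            Err(e) => Err(e),
        }
    }
}
```

Swapping the file for a database would only change save and load; the filtering logic stays the same, which is the kind of separation the bullets above describe.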

If anything is not extensible, this is a bug and it should be reported.

Getting started

To use fetcher, you need to add it as a dependency to your Cargo.toml file:

```toml
[dependencies]
fetcher = { version = "0.15", features = ["full"] }
tokio = { version = "1", features = ["full"] }
```

For the smallest example of how to use fetcher, see examples/simple_website_to_stdout.rs. More complete examples can be found in the examples/ directory. They demonstrate how to:

  • Fetch data from various sources.
  • Transform and filter data using regular expressions, HTML parsing, and JSON parsing.
  • Implement custom sources, actions, and sinks.
  • Persist the read filter state in an external storage system.

Features

send

The send feature (enabled by default) enables tokio multithreading support.

If send is disabled, the Send + Sync bounds are lifted from most types, but job groups no longer run jobs in parallel: they use tokio::task::spawn_local instead of tokio::spawn. Note that for this to work, your calls to JobGroup::run must be wrapped in a tokio::task::LocalSet. See tests/non_send.rs for an example.
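Relaxing Send + Sync bounds behind a cargo feature is commonly done with a marker trait whose supertraits depend on that feature. The sketch below shows this general pattern; the MaybeSend and register_state names are hypothetical, and this is not necessarily how fetcher implements its send feature. It only covers the type bounds: running non-Send futures still requires a tokio LocalSet, as described above.

```rust
/// With the `send` feature enabled, anything bounded by `MaybeSend` must be `Send + Sync`.
#[cfg(feature = "send")]
pub trait MaybeSend: Send + Sync {}
#[cfg(feature = "send")]
impl<T: Send + Sync + ?Sized> MaybeSend for T {}

/// Without the feature, every type satisfies the bound, including `Rc` and `RefCell`.
#[cfg(not(feature = "send"))]
pub trait MaybeSend {}
#[cfg(not(feature = "send"))]
impl<T: ?Sized> MaybeSend for T {}

/// Hypothetical API whose effective bound changes with the feature flag.
pub fn register_state<T: MaybeSend>(state: T) -> T {
    state
}
```

With the feature disabled, `register_state(std::rc::Rc::new(5))` compiles; with it enabled, the same call is rejected because `Rc` is not `Send`. Library code only ever writes `T: MaybeSend`, so the bound flips in one place instead of being duplicated across every type.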

nightly

The nightly feature enables trait implementations for some nightly-only Rust types, like !.

full and all-sources, all-actions, all-sinks, all-misc

Each source, action, and sink (a sink is also an action, but different enough to warrant being separate) is gated behind a feature to help with the already pretty bad build times of apps using fetcher.

A feature is usually named using the "(source|action|sink)-(name)" format. In addition, all sources, actions, and sinks (as well as misc features like google-oauth2) are grouped into "all-(sources|actions|sinks|misc)" features that enable every source, action, sink, or misc feature, respectively.

Every feature can be enabled with the full feature. This is the preferred way to use fetcher for the first time, as it enables everything you might need before you actually know what you need. Later on, full can be replaced with the features you actually use for some easy compile-time gains.

For example, an app fetching RSS feeds and sending them to a Telegram channel might use the features source-http, action-feed, and sink-telegram.

Note

fetcher was completely rewritten in v0.15.0. It changed from an application with a config file to an application framework.

This was mostly done to make using fetcher correctly as easy and bug-free as possible. Besides, the huge config file was getting unwieldy and difficult to write and to extend to your needs. Making the config file more flexible would have required integrating an actual programming language (like Lua) into it. I actually considered integrating Lua into the config file (à la the Astral web framework) before I remembered that we already have a properly integrated programming language: the one fetcher has been written in all along.

I decided to double down on the fact that fetcher is written in Rust and instead turned fetcher into a highly extensible, easy-to-use, generic automation and data pipeline framework that can be used to build apps, including apps similar to what fetcher originally was.

Since then, the fetcher-core and fetcher-config crates are no longer used (or needed), so if anybody needs these on crates.io, hit me up!

Contributing

Contributions are very welcome! Please feel free to submit a pull request or open issues for any bugs, feature requests, or general feedback.

License: MPL-2.0

Dependencies

~11–38MB
~694K SLoC