0.0.5 (unstable release), Feb 20, 2024
* RSS Funnel
The RSS Funnel is a modular RSS processing pipeline. It is designed to modify existing RSS sources in various ways, such as:
- Fetch full content
- Generate an RSS feed from an HTML page
- Remove unwanted elements from the article (using a CSS selector)
- Keep or remove articles matching keywords or patterns
- Highlight keywords in articles
- Redact or replace text in the article (using a regular expression)
- Split a single RSS article into multiple articles
- Merge multiple feeds into a single feed
- Run arbitrary JS code to transform the feed or articles (with [[https://github.com/shouya/rss-funnel/wiki/JS-DOM-API][DOM API support]])
[[https://rss-funnel-demo.fly.dev/][Try out the live demo!]]
** Installation
You can use the docker image ([[https://github.com/shouya/rss-funnel/pkgs/container/rss-funnel][latest version]]) in your =docker-compose.yaml=:
#+begin_src yaml
version: "3.8"
services:
  rss-funnel:
    image: ghcr.io/shouya/rss-funnel:latest
    ports:
      - 4080:4080
    volumes:
      - ./funnel.yaml:/funnel.yaml
    command: /rss-funnel -c /funnel.yaml server -b 0.0.0.0:4080
#+end_src
Alternatively, you can build it directly from source:
#+begin_src bash
git clone https://github.com/shouya/rss-funnel.git
cd rss-funnel

# first build the front-end assets
cd inspector && npm i && npm run build && cd ..

# then build the binary
cargo build --release
#+end_src
Or, if you prefer not to build from source, you can download the pre-built artifacts from the [[https://github.com/shouya/rss-funnel/releases][release page]].
** Usage
To use =rss-funnel=, you need to supply a configuration file in YAML. Here is an example configuration.
#+begin_src yaml
endpoints:
  - path: /tokio-blog.xml
    note: Full text of Tokio blog
    source: https://tokio.rs/_next/static/feed.xml
    filters:
      - full_text: {}
      - simplify_html: {}

  - path: /solidot.xml
    note: Solidot news with links
    source: https://www.solidot.org/index.rss
    filters:
      - full_text: {}
      - keep_element: .p_mainnew
      - simplify_html: {}
      - sanitize:
          - replace_regex:
              from: "(?<link>http(s)?://[^< \n]*)"
              to: '<a href="$link">$link</a>'

  - path: /hackernews.xml
    note: Full text of Hacker News
    source: https://news.ycombinator.com/rss
    filters:
      - full_text:
          simplify: true
          append_mode: true
#+end_src
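The =sanitize= / =replace_regex= step in the Solidot endpoint turns bare URLs in the article into links. The Python sketch below imitates that substitution with the standard =re= module, just to show what the pattern matches. It is an illustration only: Python spells named groups =(?P<name>...)= and refers to them as =\g<name>= in the replacement, whereas the config uses =(?<link>...)= and =$link=; =linkify= is a hypothetical helper, not part of rss-funnel.

#+begin_src python
import re

# Same pattern as the config: "http" or "https", then everything up to the
# next "<", space or newline. Python writes named groups as (?P<name>...).
URL_PATTERN = re.compile(r"(?P<link>http(s)?://[^< \n]*)")


def linkify(html: str) -> str:
    """Hypothetical helper: wrap every bare URL in an <a> tag."""
    # \g<link> refers to the named group, like $link in the YAML config
    return URL_PATTERN.sub(r'<a href="\g<link>">\g<link></a>', html)


print(linkify("<p>See https://www.solidot.org/story for details</p>"))
#+end_src

Because the character class excludes only =<=, spaces and newlines (not quotes), the pattern is meant for plain-text URLs; a URL already inside an =href= attribute would also be matched.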
Save the above file as =/path/to/funnel.yaml= and run the following command:
#+begin_src bash
rss-funnel -c /path/to/funnel.yaml server
#+end_src
You can optionally specify the bind address and port with =-b= (default: =127.0.0.1:4080=), as in the Docker example above. See the =--help= output for detailed usage.
Endpoints such as =http://127.0.0.1:4080/tokio-blog.xml= should then serve the filtered feeds.
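To check that an endpoint works, you can point any feed reader at it or use a short script. The standard-library Python sketch below is one way to do it: =endpoint_url= builds the URL, optionally adding the =?source== query parameter that endpoints without a configured =source= require (see Endpoint), and =item_titles= lists the =<item>= titles. All helper names are hypothetical, and the sketch assumes the server runs on the default =127.0.0.1:4080= and returns RSS 2.0 (an Atom feed would use different element names).

#+begin_src python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://127.0.0.1:4080"  # default bind address of the server


def endpoint_url(path, source=None):
    """Build an endpoint URL, optionally passing ?source=..."""
    url = BASE + path
    if source is not None:
        # urlencode percent-escapes the source URL so it survives as a query value
        url += "?" + urllib.parse.urlencode({"source": source})
    return url


def item_titles(rss_xml):
    """Return the <title> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def print_titles(path, source=None):
    """Fetch an endpoint from a running server and print its item titles."""
    with urllib.request.urlopen(endpoint_url(path, source)) as resp:
        for title in item_titles(resp.read()):
            print(title)

# Usage, with the server running and the example config loaded:
#   print_titles("/tokio-blog.xml")
#+end_src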
** Endpoint
Each configuration contains a number of endpoints. Each endpoint corresponds to an RSS feed.
Properties:
- =path= (required): The path of the endpoint. The path should start with =/=.
- =note= (optional): A note for the endpoint. Only used for display purposes.
- =source= (optional): The source URL of the RSS feed.
  - If not specified, you must supply the source with a =?source== query parameter in the request. This lets you apply the same filters to different feeds.
  - If the source points to an HTML page, =rss-funnel= will try to generate an RSS feed from the page with a single article. You can then use the =split= filter to split the single article into multiple articles. See [[https://github.com/shouya/rss-funnel/wiki/Cookbook#hacker-news-top-links][Cookbook: Hacker News Top Links]] for an example.
- =filters= (required): A list of filters to apply to the feed.
  - The feed from the =source= goes through the filters in the order specified. You can think of each filter as a transformation of the =Feed=.
  - Each filter is specified as a YAML object with a single key: the key is the name of the filter and the value is the configuration of the filter.
    - For example, in the filter definition =- keep_element: .p_mainnew=:
      - the filter's name is =keep_element=;
      - the configuration is the string value =.p_mainnew=. Depending on the filter, the configuration can have different types.
  - The =Feed= object from the last filter is returned as the response.
- =client= (optional): The configuration for the HTTP client used to fetch the source, such as =user_agent=. See [[https://github.com/shouya/rss-funnel/wiki/Client-config][Client config]] for details.
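The filter model described above (an ordered list of single-key objects, each transforming the feed, with the last output returned) can be sketched in a few lines of Python. This is a conceptual illustration, not rss-funnel's Rust implementation: the feed is simplified to a list of article dicts, and the filters =title_prefix= and =truncate= are hypothetical toys chosen to show a string config and an object config.

#+begin_src python
from typing import Any, Callable, Dict, List

# Toy model: a feed is a list of articles, each a dict with "title" and "body".
Feed = List[Dict[str, str]]


def title_prefix(feed: Feed, config: Any) -> Feed:
    """Hypothetical filter: config is a string prepended to each title."""
    return [{**a, "title": config + a["title"]} for a in feed]


def truncate(feed: Feed, config: Any) -> Feed:
    """Hypothetical filter: config is an object like {"max_len": 10}."""
    return [{**a, "body": a["body"][: config["max_len"]]} for a in feed]


# Registry mapping filter names (the single key) to their implementations
FILTERS: Dict[str, Callable[[Feed, Any], Feed]] = {
    "title_prefix": title_prefix,
    "truncate": truncate,
}


def run_pipeline(feed: Feed, filters: List[Dict[str, Any]]) -> Feed:
    """Apply each filter in list order; the last filter's output is the result."""
    for spec in filters:
        # Each filter spec must be a mapping with exactly one key: the filter name
        if len(spec) != 1:
            raise ValueError(f"filter must have exactly one key: {spec!r}")
        name, config = next(iter(spec.items()))
        feed = FILTERS[name](feed, config)
    return feed


# Equivalent to a YAML filter list like:
#   filters:
#     - title_prefix: "[tokio] "
#     - truncate: {max_len: 5}
feed = [{"title": "Hello", "body": "Lorem ipsum dolor"}]
print(run_pipeline(feed, [{"title_prefix": "[tokio] "}, {"truncate": {"max_len": 5}}]))
#+end_src

Modeling each filter as a function from feed to feed is what makes the list order meaningful and lets filters compose freely.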
** Filters
See [[https://github.com/shouya/rss-funnel/wiki/Filters][Filters]] for documentation of all available filters.
** Cookbook
See [[https://github.com/shouya/rss-funnel/wiki/Cookbook][Cookbook]] for some examples of using =rss-funnel=.