4 releases
new 0.2.2 | Dec 6, 2024 |
---|---|
0.2.1 | Dec 6, 2024 |
0.1.1 | Dec 4, 2024 |
0.1.0 | Dec 4, 2024 |
#113 in HTTP client
189 downloads per month
52KB
990 lines
SkyStreamer
SkyStreamer is an AT Firehose consumer that streams new posts from Bluesky. It provides simplified wrapper data types and a streaming API for easy filtering of events on the AT Firehose, based on ATrium.
SkyStreamer is part of Project Zenith, an experimental social media analytics project by Fyra Labs.
Why?
The AT protocol firehose can be a very useful source of data, one may want to consume it in a more structured way. SkyStreamer helps filter out the noise and only stream new Bluesky posts from the firehose.
Planned Features
- Make data types non-dependent on SurrealDB while maintaining data types
- Export a crate for easy integration with other projects
- SSL-independent implementation (Rustls/OpenSSL agnostic)
- Optimized, multiple-threaded implementation
- Configuration
Ethics
SkyStreamer collects large amounts of new data streaming from Bluesky itself, which then can be used for many various purposes.
While the network and the data itself is visibly public to everyone, Some users may be uncomfortable with their data being collected and used in this way, especially for machine learning or similar purposes.
[!IMPORTANT] Please be mindful of the data you are collecting and how you are using it, and respect their consent if possible.
SkyStreamer comes with a streaming API that can be used alongside Rust's iterator API, which can be used to filter out data. However, it does not filter or redact any data by default, you will be responsible for your own data processing.
Fyra Labs is not responsible for any misuse of the data collected by SkyStreamer. You are responsible for your own actions.
Usage
SkyStreamer can be used as a library or as a CLI. The library provides a Tokio stream that can be used to stream data from the AT Firehose.
As a CLI
By default, SkyStreamer will stream its data into a SurrealDB database. You may change this using environment variables or CLI arguments.
You may also stream the data into an IETF RFC 4180 CSV-formatted file, or log it into the console.
skystreamer -E csv -o data.csv
See skystreamer --help
for more information.
As a library
Please see the skystreamer/examples
directory for examples on how to use SkyStreamer as a library.
Prometheus Exporter
SkyStreamer also has a Prometheus exporter implementation that can be used to show statistics about Bluesky posts as a whole.
See skystreamer-prometheus-exporter for the implentation.
You can also use the Docker image from ghcr.io/fyralabs/skystreamer-prometheus-exporter:latest
to run the exporter. Configure it like you would
normally configure Prometheus data sources.
The data itself only includes numbers of posts and various counters per metric, and does not include any actual post data except for external link domains and hashtags.
Environment Variables
MAX_SAMPLE_SIZE
: Maximum number of posts to count before overflow, default is none (Maxu64
).NORMALIZE_LANGS
: Attempt to normalize post language codes to their respective IETF BCP 47 codes, default istrue
. Set tofalse
to export raw codes from the AT firehose.
Older implementation
SkyStreamer was originally implemented as a simple Python script. You can find the old implementation in the legacy
directory.
License
SkyStreamer is licensed under the MIT License. See the LICENSE
file for more information.
This software is provided as-is, without any warranty or guarantee of any kind. Use at your own risk. Fyra Labs is not responsible for any misuse or damage caused by this software.
Dependencies
~14–27MB
~385K SLoC