
micromegas-telemetry-sink

module for the publication of telemetry, part of micromegas

19 releases (4 breaking)

new 0.5.0 Mar 20, 2025
0.3.3 Feb 15, 2025
0.2.3 Dec 11, 2024
0.2.2 Nov 29, 2024
0.1.0 Mar 15, 2024



209 downloads per month
Used in 2 crates

Apache-2.0

235KB
5.5K SLoC

Micromegas - Scalable Observability


Rust API documentation

Python API

Grafana plugin

Design presentation

Unreal observability

Objectives

  • Unified observability: logs, metrics and traces in the same database.

  • Spend less time reproducing problems.

    • Collect enough data to understand how to correct the problems.

    • Quantify the frequency and severity of the issues instead of debugging the first one you can reproduce.

  • Achieve better quality: monitor & catch problems before they get noticed by users.

Design Strategies

Low overhead instrumentation

20 ns per event in the calling thread; one additional thread handles preparation and upload to the server.
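The split between a cheap call site and a single background thread can be sketched with the standard library alone. This is an illustration of the pattern, not the micromegas implementation: `Event`, `Sink`, `record` and `shutdown` are hypothetical names, and a `std::sync::mpsc` channel is a stand-in that makes no attempt to reach the 20 ns figure.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical event type; not the micromegas wire format.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub name: &'static str,
    pub timestamp_ns: u64,
}

/// Hypothetical sink: the instrumented thread only pushes into a channel.
pub struct Sink {
    sender: Option<Sender<Event>>,
    worker: Option<JoinHandle<Vec<Vec<Event>>>>,
}

impl Sink {
    /// Spawns the single background thread that groups events into batches.
    /// Assumes `batch_size` is greater than zero.
    pub fn new(batch_size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Event>();
        let worker = thread::spawn(move || {
            let mut batches = Vec::new();
            let mut current = Vec::with_capacity(batch_size);
            // The loop ends once every sender has been dropped.
            for event in receiver {
                current.push(event);
                if current.len() == batch_size {
                    // A real sink would serialize and upload the batch here.
                    let full = std::mem::replace(&mut current, Vec::with_capacity(batch_size));
                    batches.push(full);
                }
            }
            // Flush the last, possibly partial, batch.
            if !current.is_empty() {
                batches.push(current);
            }
            batches
        });
        Sink { sender: Some(sender), worker: Some(worker) }
    }

    /// Called from the instrumented thread: a non-blocking send, no I/O.
    pub fn record(&self, name: &'static str, timestamp_ns: u64) {
        if let Some(sender) = &self.sender {
            // Ignore the error if the worker has already stopped.
            let _ = sender.send(Event { name, timestamp_ns });
        }
    }

    /// Closes the channel and returns the batches the worker produced.
    pub fn shutdown(mut self) -> Vec<Vec<Event>> {
        self.sender.take(); // dropping the sender ends the worker loop
        self.worker
            .take()
            .expect("worker handle present")
            .join()
            .expect("worker thread panicked")
    }
}
```

With `Sink::new(2)`, recording three events and then calling `shutdown()` yields one full batch of two events followed by a partial batch of one. All serialization and network cost would sit in the worker thread, which is the point of the design.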

High frequency of events

Up to 100,000 events per second for a single instrumented process.
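Combined with the 20 ns per-event cost quoted under low overhead instrumentation, this rate implies a small share of the calling thread's time. A quick check of the arithmetic, using a hypothetical helper that is not part of micromegas:

```rust
/// Fraction of one second the calling thread spends recording events.
/// Hypothetical helper for illustration only.
pub fn calling_thread_overhead_fraction(events_per_second: u64, ns_per_event: u64) -> f64 {
    // Total nanoseconds spent in instrumentation during one second.
    let busy_ns = events_per_second * ns_per_event;
    busy_ns as f64 / 1_000_000_000.0
}
```

For 100,000 events per second at 20 ns each, the result is 0.002: 2 ms out of every second, or 0.2% of the calling thread, assuming both figures hold at the same time.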

Scalability of ingestion service

The backend scales to accept data from millions of concurrent instrumented processes.

Tail sampling & ETL on demand

To keep costs down, most payloads remain unprocessed until they expire.
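The idea of deferring processing until a query needs it can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the micromegas storage layer: `RawBlock`, `LazyStore` and the line-per-record decoding are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical raw payload kept as received.
pub struct RawBlock {
    pub bytes: Vec<u8>,
    pub expires_at: u64, // seconds on an arbitrary clock
}

/// Hypothetical store that parses payloads only when they are queried.
#[derive(Default)]
pub struct LazyStore {
    raw: HashMap<u64, RawBlock>,
    processed: HashMap<u64, Vec<String>>,
    /// Number of times a payload was actually decoded.
    pub etl_runs: usize,
}

impl LazyStore {
    /// Ingestion only stores the payload; no parsing happens here.
    pub fn ingest(&mut self, id: u64, bytes: Vec<u8>, expires_at: u64) {
        self.raw.insert(id, RawBlock { bytes, expires_at });
    }

    /// ETL on demand: decode a block the first time a query asks for it,
    /// then serve later queries from the cached result.
    pub fn query(&mut self, id: u64) -> Option<&Vec<String>> {
        if !self.processed.contains_key(&id) {
            let block = self.raw.get(&id)?;
            // Stand-in for real decoding: one record per line of text.
            let rows = String::from_utf8_lossy(&block.bytes)
                .lines()
                .map(str::to_owned)
                .collect();
            self.etl_runs += 1;
            self.processed.insert(id, rows);
        }
        self.processed.get(&id)
    }

    /// Drops expired payloads; those never queried are removed unparsed.
    pub fn expire(&mut self, now: u64) {
        self.raw.retain(|_, block| block.expires_at > now);
        let raw = &self.raw;
        self.processed.retain(|id, _| raw.contains_key(id));
    }
}
```

In this sketch the decoding cost is paid once per queried block and never for blocks that expire untouched, which is where the cost saving would come from.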

Query using SQL

Status

February 2025

  • Released version 0.4.0
  • Incremental data reduction using SQL-defined views
  • System monitor thread
  • Added support for ARM (& macOS)
  • Deleted analytics-srv and the custom HTTP Python client used to connect to it

January 2025

  • Released version 0.3.0
  • New FlightSQL Python API
    • Ready to replace analytics-srv with flight-sql-srv

December 2024

November 2024

Released version 0.2.1

  • FlightSQL support
  • Measures and log entries can now be tagged with properties
    • Not yet available in SQL queries

October 2024

Released version 0.2.0

September 2024

Released version 0.1.9

  • Updating global views every second
  • Caching metadata (processes, streams & blocks) in the lakehouse and allowing SQL queries on it

August 2024

Released version 0.1.7

  • New global materialized views for logs & metrics of all processes
  • New daemon service to keep the views updated as data is ingested
  • New analytics API based on SQL, powered by Apache DataFusion

July 2024

Released version 0.1.5

Unreal

  • Better reliability, retrying failed HTTP requests
  • Spike detection

Maintenance

  • Delete old blocks, streams & processes using a cron task

June 2024

Released version 0.1.4

Good enough for dogfooding :)

Unreal

  • Metrics publisher
  • FName scopes

Analytics

  • Metric queries
  • Convert CPU traces to Perfetto format

May 2024

Released version 0.1.3

Better Unreal Engine instrumentation

  • new protocol
  • HTTP request callbacks no longer bound to the main thread
  • custom authentication of requests

Analytics

  • query process metadata
  • query spans of a thread

April 2024

Telemetry ingestion from Rust & Unreal is working :)

Released version 0.1.1

Not actually useful yet; I need to bring the analytics service back to a working state.

January 2024

Starting anew. I'm extracting the tracing/telemetry/analytics code from https://github.com/legion-labs/legion to jumpstart the new project. If you are interested in collaborating, please reach out.


lib.rs:

Telemetry gRPC sink library

Provides logging, metrics, memory and performance profiling

Dependencies

~46–64MB
~1M SLoC