19 releases (4 breaking)

new 0.5.0 Mar 20, 2025
0.3.3 Feb 15, 2025
0.2.3 Dec 11, 2024
0.2.2 Nov 29, 2024
0.1.0 Mar 15, 2024

#20 in #low-overhead


213 downloads per month
Used in 6 crates (4 directly)

Apache-2.0

31KB
800 lines

Micromegas - Scalable Observability


rust API documentation

python API

grafana plugin

design presentation

unreal observability

Objectives

  • Unified observability: logs, metrics and traces in the same database.

  • Spend less time reproducing problems

    • Collect enough data to understand how to correct the problems.

    • Quantify the frequency and severity of the issues instead of debugging the first one you can reproduce.

  • Achieve better quality: monitor & catch problems before they get noticed by users.

Design Strategies

Low overhead instrumentation

20 ns per event in the calling thread; one additional thread handles the preparation and upload to the server.
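The split between a cheap hot path and a single background thread can be sketched as follows. This is a minimal illustration, not the Micromegas implementation: `Event`, `EventBuffer`, `spawn_uploader` and `demo` are hypothetical names, and the background thread only counts events where a real one would encode and upload them.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical fixed-size event; not the Micromegas event layout.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Event {
    pub timestamp_ns: u64,
    pub id: u32,
}

/// Buffer owned by the instrumented thread.
pub struct EventBuffer {
    events: Vec<Event>,
    capacity: usize,
    sink: Sender<Vec<Event>>,
}

impl EventBuffer {
    pub fn new(capacity: usize, sink: Sender<Vec<Event>>) -> Self {
        Self { events: Vec::with_capacity(capacity), capacity, sink }
    }

    /// Hot path: a push into a local Vec, with no lock taken.
    /// The channel is only touched when the buffer is full.
    pub fn record(&mut self, event: Event) {
        self.events.push(event);
        if self.events.len() >= self.capacity {
            self.flush();
        }
    }

    /// Hand the current block over to the background thread.
    pub fn flush(&mut self) {
        if self.events.is_empty() {
            return;
        }
        let full = std::mem::replace(&mut self.events, Vec::with_capacity(self.capacity));
        // If the receiver is gone, the block is silently dropped.
        let _ = self.sink.send(full);
    }
}

/// Spawn the single background thread. Here it only counts the events it receives.
pub fn spawn_uploader() -> (Sender<Vec<Event>>, JoinHandle<usize>) {
    let (tx, rx) = channel::<Vec<Event>>();
    let handle = thread::spawn(move || {
        let mut total = 0;
        for block in rx {
            total += block.len();
        }
        total
    });
    (tx, handle)
}

/// Record `n` events and return how many reached the background thread.
pub fn demo(n: u32) -> usize {
    let (tx, handle) = spawn_uploader();
    let mut buffer = EventBuffer::new(1024, tx);
    for id in 0..n {
        buffer.record(Event { timestamp_ns: u64::from(id), id });
    }
    buffer.flush();
    drop(buffer); // drops the last Sender, which ends the uploader loop
    handle.join().expect("uploader thread panicked")
}
```

Calling `demo(5000)` records 5000 events on the calling thread and returns the count seen by the background thread. The sketch makes no claim about reaching the 20 ns figure; it only shows why the calling thread's cost can stay close to a memory write.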

High frequency of events

Up to 100,000 events per second for a single instrumented process.

Scalability of ingestion service

Scalable backend can accept data from millions of concurrent instrumented processes.

Tail sampling & ETL on demand

In order to keep costs down, most payloads will remain unprocessed until they expire.
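One way to picture this strategy is a store that keeps raw payloads untouched at ingestion and decodes them only when a query asks for them. The sketch below is an in-memory toy under that assumption: `RawBlock`, `Lake` and the newline-separated text payload are hypothetical and do not reflect the Micromegas storage format or API.

```rust
use std::collections::HashMap;

/// Hypothetical raw payload, stored as opaque bytes at ingestion time.
pub struct RawBlock {
    pub process_id: u32,
    pub expires_at: u64, // seconds on an arbitrary clock
    pub payload: Vec<u8>,
}

#[derive(Default)]
pub struct Lake {
    raw: Vec<RawBlock>,
    views: HashMap<u32, Vec<String>>,
    /// Number of blocks that were actually decoded.
    pub blocks_processed: usize,
}

impl Lake {
    /// Ingestion only stores the bytes; nothing is decoded here.
    pub fn ingest(&mut self, block: RawBlock) {
        // Invalidate the cached view so the next query sees the new block.
        self.views.remove(&block.process_id);
        self.raw.push(block);
    }

    /// ETL on demand: decode the blocks of one process the first time it is queried.
    pub fn query(&mut self, process_id: u32) -> &[String] {
        if !self.views.contains_key(&process_id) {
            let mut lines = Vec::new();
            for block in self.raw.iter().filter(|b| b.process_id == process_id) {
                self.blocks_processed += 1;
                lines.extend(String::from_utf8_lossy(&block.payload).lines().map(str::to_owned));
            }
            self.views.insert(process_id, lines);
        }
        &self.views[&process_id]
    }

    /// Drop expired raw blocks, decoded or not, and return how many were removed.
    pub fn expire(&mut self, now: u64) -> usize {
        let before = self.raw.len();
        self.raw.retain(|b| b.expires_at > now);
        before - self.raw.len()
    }
}
```

In this toy version, a process that is never queried costs only storage: its blocks are removed by `expire` without ever being decoded.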

Query using SQL

Status

February 2025

  • Released version 0.4.0
  • Incremental data reduction using SQL-defined views
  • System monitor thread
  • Added support for ARM (& macOS)
  • Deleted analytics-srv and the custom HTTP Python client used to connect to it

January 2025

  • Released version 0.3.0
  • New FlightSQL Python API
    • Ready to replace analytics-srv with flight-sql-srv

December 2024

November 2024

Released version 0.2.1

  • FlightSQL support
  • Measures and log entries can now be tagged with properties
    • Not yet available in SQL queries

October 2024

Released version 0.2.0

September 2024

Released version 0.1.9

  • Updating global views every second
  • Caching metadata (processes, streams & blocks) in the lakehouse and allowing SQL queries on it

August 2024

Released version 0.1.7

  • New global materialized views for logs & metrics of all processes
  • New daemon service to keep the views updated as data is ingested
  • New analytics API based on SQL, powered by Apache DataFusion

July 2024

Released version 0.1.5

Unreal

  • Better reliability, retrying failed HTTP requests
  • Spike detection

Maintenance

  • Delete old blocks, streams & processes using a cron task

June 2024

Released version 0.1.4

Good enough for dogfooding :)

Unreal

  • Metrics publisher
  • FName scopes

Analytics

  • Metric queries
  • Convert CPU traces to Perfetto format

May 2024

Released version 0.1.3

Better Unreal Engine instrumentation

  • new protocol
  • HTTP request callbacks no longer bound to the main thread
  • custom authentication of requests

Analytics

  • query process metadata
  • query spans of a thread

April 2024

Telemetry ingestion from Rust & Unreal is working :)

Released version 0.1.1

Not actually useful yet: I need to bring the analytics service back to a working state.

January 2024

Starting anew. I'm extracting the tracing/telemetry/analytics code from https://github.com/legion-labs/legion to jumpstart the new project. If you are interested in collaborating, please reach out.


lib.rs:

The transit library provides fast binary serialization for Plain Old Data structures.
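To illustrate what binary serialization of Plain Old Data can look like, here is a minimal sketch that copies a struct's bytes into a buffer and reads them back. It is not the transit API: `Sample`, `write_pod` and `read_pod` are hypothetical, and the approach assumes a `repr(C)` type with no padding and no pointers, read back on the same architecture.

```rust
use std::mem::size_of;

/// Hypothetical POD record: `repr(C)`, two 8-byte fields, so no padding bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Sample {
    pub timestamp_ns: u64,
    pub value: f64,
}

/// Append the in-memory bytes of `sample` to `buffer`.
pub fn write_pod(buffer: &mut Vec<u8>, sample: &Sample) {
    let ptr = sample as *const Sample as *const u8;
    // SAFETY: Sample is Copy and repr(C), has no padding and no pointers,
    // so all size_of::<Sample>() bytes are initialized and safe to read.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, size_of::<Sample>()) };
    buffer.extend_from_slice(bytes);
}

/// Read back the `index`-th sample, or None if the buffer is too short.
pub fn read_pod(buffer: &[u8], index: usize) -> Option<Sample> {
    let start = index.checked_mul(size_of::<Sample>())?;
    let end = start.checked_add(size_of::<Sample>())?;
    let bytes = buffer.get(start..end)?;
    // SAFETY: `bytes` holds exactly size_of::<Sample>() bytes, every bit pattern
    // is a valid u64 or f64, and read_unaligned tolerates any alignment.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const Sample) })
}
```

Writing two samples with `write_pod` yields a 32-byte buffer, and `read_pod(&buffer, 1)` returns the second one. The appeal of this style is that serialization is a plain memory copy with no per-field encoding; the cost is that the byte layout is tied to the struct definition and the platform's endianness.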

Dependencies

~2.5MB
~51K SLoC