#open-telemetry #observability #parquet #otlp #telemetry

bin+lib otlp2parquet

Stream OpenTelemetry logs, metrics, and traces to Parquet files

10 releases (5 breaking)

0.9.0 Jan 19, 2026
0.8.2 Jan 19, 2026
0.7.1 Jan 12, 2026
0.6.0 Jan 9, 2026
0.2.2 Nov 29, 2025

#892 in Database interfaces

Apache-2.0

190KB
4.5K SLoC

Rust 3K SLoC // 0.0% comments
Python 1.5K SLoC // 0.1% comments
Shell 146 SLoC // 0.2% comments

otlp2parquet


What if your observability data was just a bunch of Parquet files?

Receive OpenTelemetry logs, metrics, and traces and write them as Parquet files to local disk or S3-compatible storage. Query with DuckDB, Spark, pandas, or anything that reads Parquet.

If you want to stream real-time observability data directly to AWS, Azure, or Cloudflare, check out the related otlp2pipeline project.

flowchart TB
    subgraph Sources["OpenTelemetry Sources"]
        Traces
        Metrics
        Logs
    end

    subgraph otlp2parquet["otlp2parquet"]
        Decode["Decode"] --> Arrow["Arrow"] --> Write["Parquet"]
    end

    subgraph Storage["Storage"]
        Local["Local File"]
        S3["S3-Compatible"]
    end

    Query["Query Engines"]

    Sources --> otlp2parquet
    otlp2parquet --> Storage
    Query --> Storage

Quick Start

# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
cargo install otlp2parquet

otlp2parquet

The server starts on http://localhost:4318. Send a simple OTLP/HTTP log:

# otlp2parquet batches writes to disk every BATCH_AGE_MAX_SECONDS by default
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[{"scopeLogs":[{"logRecords":[{"body":{"stringValue":"hello world"}}]}]}]}'

Query it:

# see https://duckdb.org/install
duckdb -c "SELECT * FROM './data/logs/**/*.parquet'"

Print configuration to receive OTLP from a collector, Claude Code, or Codex:

otlp2parquet connect otel-collector
otlp2parquet connect claude-code
otlp2parquet connect codex

Why?

  • Keep monitoring data around a long time — Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics.
  • Query with good tools — DuckDB, Spark, Trino, pandas.
  • Deploy anywhere — local binary, containers, or your own servers.

Run with Docker

docker-compose up
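
If you are not using the repo's compose file, the equivalent docker run shape is sketched below; the image name and container data path are placeholders, assuming the container listens on 4318 and writes under /data:

# otlp2parquet:latest and /data are placeholders; adjust to the actual image
docker run -p 4318:4318 -v "$(pwd)/data:/data" otlp2parquet:latest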

Supported Signals

Logs, metrics, and traces via OTLP/HTTP (protobuf or JSON; gzip compression supported). gRPC is not supported yet.
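
Compressed payloads use the standard Content-Encoding header from the OTLP/HTTP spec. A sketch, where payload.json is a placeholder for an OTLP/JSON file like the one in Quick Start:

# payload.json is a placeholder; compress it and declare Content-Encoding
gzip -c payload.json | curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @-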

APIs, schemas, and partition layout

  • OTLP/HTTP endpoints: /v1/logs, /v1/metrics, /v1/traces (protobuf or JSON; gzip supported)
  • Partition layout: logs/{service}/year=.../hour=.../{ts}-{uuid}.parquet, metrics/{type}/{service}/..., traces/{service}/... (see the query sketch after this list)
  • Storage: filesystem or S3-compatible object storage
  • Schemas: ClickHouse-compatible, PascalCase columns; five metric schemas (Gauge, Sum, Histogram, ExponentialHistogram, Summary)
  • Error model: HTTP 400 for invalid or oversized input; 5xx for conversion or storage failures
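
Because the partition layout uses Hive-style keys (year=..., hour=...), DuckDB can surface them as queryable columns. A sketch; the key names are taken from the layout above:

# hive_partitioning=true turns year=/hour=/... path segments into columns
duckdb -c "SELECT count(*) FROM read_parquet('./data/logs/**/*.parquet', hive_partitioning = true) WHERE year = 2026"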

Future work (contributions welcome)

  • OpenTelemetry Arrow alignment
  • Additional platforms: Azure Functions; Kubernetes manifests

Caveats
  • Batching: Use an OTel Collector upstream to batch and reduce request overhead (see the Collector sketch after this list).
  • Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
  • Status: Functional but evolving. API may change.
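
For the batching caveat, a minimal sketch of an upstream Collector config that batches and forwards over OTLP/HTTP; otlp2parquet connect otel-collector prints the exact settings, so treat this as illustrative only:

# writes a minimal Collector config: receive OTLP/gRPC, batch, export OTLP/HTTP
cat > otel-collector.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  otlphttp:
    endpoint: http://localhost:4318
processors:
  batch:
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
EOF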

Dependencies

~64–86MB
~1.5M SLoC