4 releases (2 breaking)

0.3.0	Apr 3, 2025
0.2.0	Sep 5, 2024
0.1.2	Jan 29, 2024
0.1.1	Jan 10, 2024

Apache-2.0

statsdproxy

A proxy for transforming, pre-aggregating and routing statsd metrics, like Veneur, Vector or Brubeck.

Currently supports the following transformations:

Deny- or allow-listing of specific tag keys or metric names
Adding hardcoded tags to all metrics
Basic cardinality limiting, tracking either the number of distinct tag values per key or the overall number of timeseries (that is, distinct combinations of metric name and tags).

See example.yaml for details.
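
To make the timeseries-based limit concrete, here is a minimal sketch. It is not statsdproxy's actual implementation: `CardinalityLimiter`, `admit` and `series_key` are hypothetical names, the key is simply the metric name plus the raw tag section, and seen series are never expired (a real limiter would likely reset them over some time window).

```rust
use std::collections::HashSet;

/// Hypothetical limiter: admits at most `limit` distinct timeseries.
pub struct CardinalityLimiter {
    limit: usize,
    seen: HashSet<Vec<u8>>,
}

impl CardinalityLimiter {
    pub fn new(limit: usize) -> Self {
        Self { limit, seen: HashSet::new() }
    }

    /// Returns true if the metric line should be forwarded.
    pub fn admit(&mut self, line: &[u8]) -> bool {
        let key = series_key(line);
        if self.seen.contains(&key) {
            return true; // already counted against the limit
        }
        if self.seen.len() >= self.limit {
            return false; // a new series would exceed the limit
        }
        self.seen.insert(key);
        true
    }
}

/// Builds a key from the metric name and the tag section ("#..."),
/// ignoring value, type and sample rate. Tag order is not normalized.
fn series_key(line: &[u8]) -> Vec<u8> {
    let name_end = line.iter().position(|&b| b == b':').unwrap_or(line.len());
    let mut key = line[..name_end].to_vec();
    // The tag section is the '|'-separated field that starts with '#'.
    if let Some(tags) = line.split(|&b| b == b'|').skip(1).find(|f| f.starts_with(b"#")) {
        key.push(b'|');
        key.extend_from_slice(tags);
    }
    key
}
```

With a limit of 2, the first two distinct name/tag combinations keep flowing regardless of value or sample rate, while a third combination is rejected.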

A major goal is minimal overhead and no loss of information due to unnecessarily strict parsing. statsdproxy is oriented around the DogStatsD protocol but should degrade gracefully for other statsd dialects: metrics in those dialects, along with any otherwise unparseable bytes, are forwarded as-is.
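
The following sketch illustrates lenient parsing combined with a tag deny-list. It is an assumption-laden illustration rather than the crate's code: `deny_tags` is a hypothetical helper that only looks for a `#`-prefixed field after the first `|`, and returns every other input byte-for-byte unchanged.

```rust
/// Hypothetical helper: removes tags whose key is in `denied` from one
/// DogStatsD-style line. Input without a tag section comes back unchanged.
pub fn deny_tags(line: &[u8], denied: &[&[u8]]) -> Vec<u8> {
    let mut fields: Vec<Vec<u8>> = Vec::new();
    for (i, field) in line.split(|&b| b == b'|').enumerate() {
        if i > 0 && field.starts_with(b"#") {
            // Keep only the tags whose key (the part before ':') is not denied.
            let kept: Vec<&[u8]> = field[1..]
                .split(|&b| b == b',')
                .filter(|tag| {
                    let key = tag.split(|&b| b == b':').next().unwrap_or(*tag);
                    !denied.contains(&key)
                })
                .collect();
            if kept.is_empty() {
                continue; // every tag was denied, so drop the whole section
            }
            let mut out = b"#".to_vec();
            out.extend(kept.join(&b','));
            fields.push(out);
        } else {
            // Name, value, type, sample rate and anything unknown pass through.
            fields.push(field.to_vec());
        }
    }
    fields.join(&b'|')
}
```

Because splitting on `|` and joining on `|` round-trips any byte string, input that does not look like a tagged metric (including invalid UTF-8) is forwarded as-is.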

This is not a Sentry product and is not deployed in any sort of production environment; it is a side project done during Hackweek.

Basic usage

Run a "statsd server" on port 8081 that just prints metrics
```
socat -u UDP-RECVFROM:8081,fork SYSTEM:"cat; echo"
```
Copy example.yaml to config.yaml and edit it
Run statsdproxy to read metrics from port 8080, transform them using the middleware in config.yaml and forward the new metrics to port 8081:
```
cargo run --release -- --listen 127.0.0.1:8080 --upstream 127.0.0.1:8081 -c config.yaml
```

Send metrics to statsdproxy:

```
yes 'users.online:1|c|@0.5' | nc -u 127.0.0.1 8080
```

You should see new metrics in socat with your middlewares applied.
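
If you would rather send test metrics from Rust than from `nc`, a sketch using only the standard library could look like this. `send_metrics` is a hypothetical helper, not part of statsdproxy; it packs several lines into one datagram joined with `\n`.

```rust
use std::io;
use std::net::{ToSocketAddrs, UdpSocket};

/// Hypothetical helper: sends one or more statsd lines in a single UDP
/// datagram, joined with '\n'. Returns the number of bytes sent.
pub fn send_metrics<A: ToSocketAddrs>(target: A, lines: &[&str]) -> io::Result<usize> {
    // Port 0 lets the operating system pick a free local port.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let payload = lines.join("\n");
    socket.send_to(payload.as_bytes(), target)
}
```

Calling `send_metrics("127.0.0.1:8080", &["users.online:1|c|@0.5"])` targets the listen address used in the steps above.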

Usage with Snuba

Patch the following settings in snuba/settings/__init__.py:

```
DOGSTATSD_HOST = "127.0.0.1"
DOGSTATSD_PORT = "8080"
```

Snuba will then send its metrics to port 8080, where statsdproxy should be listening.

Processing model

This is the processing model used by the provided server. It should be respected by any usage of this software as a library.

The server receives metrics as bytes over UDP, either singly or several joined with \n.
For every metric received, the server invokes the poll method of the topmost middleware.
- The middleware may use this invocation to do any needed internal bookkeeping.
- The middleware should then invoke the poll method of the next middleware, if any.
Once poll returns, the server invokes the submit method of the topmost middleware with a mutable reference to the current metric.
- The middleware should process the metric.
  - If processing was successful, and if appropriate to its function (e.g. a metric aggregator might hold onto metrics), the middleware should submit the processed metric to the next middleware, returning the result of this call.
  - If processing was unsuccessful (e.g. unknown StatsD dialect), the unchanged metric should be treated as the processed metric, and passed on or held as above.
  - If a middleware becomes unable to handle more metrics during processing, such that it cannot handle the current metric, it should return Overloaded.
- If an overload is indicated, the server shall pause (TODO: how long) before calling submit again with the same metric. (If an overload is indicated too many times, maybe drop the metric?)
Separately, if no metric is received by the server for 1 second, it will invoke the poll method of the topmost middleware. This invocation of poll should be handled the same as above.
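
The processing model can be sketched in code roughly as follows. Everything here is an assumption made for illustration: the trait shape, the use of `Vec<u8>` as the metric type, and the `Collector`, `AddTag` and `handle_datagram` names are hypothetical and may differ from the crate's real API. The sketch also returns `Overloaded` to the caller instead of pausing and retrying, since the retry policy is still open above.

```rust
/// Returned by `submit` when a middleware cannot take the metric right now.
#[derive(Debug, PartialEq)]
pub struct Overloaded;

/// Hypothetical middleware interface; metrics are raw bytes.
pub trait Middleware {
    /// Bookkeeping hook, called before each `submit` and when the server is idle.
    fn poll(&mut self);
    /// Processes one metric and normally passes it to the next middleware.
    fn submit(&mut self, metric: &mut Vec<u8>) -> Result<(), Overloaded>;
}

/// Last link of a chain: stores metrics until `capacity` is reached.
pub struct Collector {
    pub capacity: usize,
    pub received: Vec<Vec<u8>>,
}

impl Middleware for Collector {
    fn poll(&mut self) {}

    fn submit(&mut self, metric: &mut Vec<u8>) -> Result<(), Overloaded> {
        if self.received.len() >= self.capacity {
            return Err(Overloaded);
        }
        self.received.push(metric.clone());
        Ok(())
    }
}

/// Appends one hardcoded tag, then forwards to `next`.
pub struct AddTag<M> {
    pub tag: Vec<u8>,
    pub next: M,
}

impl<M: Middleware> Middleware for AddTag<M> {
    fn poll(&mut self) {
        // Nothing to track here, so only pass the call down the chain.
        self.next.poll();
    }

    fn submit(&mut self, metric: &mut Vec<u8>) -> Result<(), Overloaded> {
        // Work on a copy so that a retry after `Overloaded` does not tag twice.
        let mut processed = metric.clone();
        if processed.contains(&b'|') {
            // Assumption: a tag section, if present, is the last '|' field.
            let has_tags = processed
                .rsplit(|&b| b == b'|')
                .next()
                .map_or(false, |field| field.starts_with(b"#"));
            processed.extend_from_slice(if has_tags { &b","[..] } else { &b"|#"[..] });
            processed.extend_from_slice(&self.tag);
        }
        // Lines without '|' are treated as unparseable and forwarded unchanged.
        self.next.submit(&mut processed)
    }
}

/// Splits a datagram on '\n' and, per metric, calls `poll` and then `submit`
/// on the topmost middleware. Stops at the first overload.
pub fn handle_datagram<M: Middleware>(top: &mut M, datagram: &[u8]) -> Result<(), Overloaded> {
    for line in datagram.split(|&b| b == b'\n').filter(|l| !l.is_empty()) {
        let mut metric = line.to_vec();
        top.poll();
        top.submit(&mut metric)?;
    }
    Ok(())
}
```

A chain is built by nesting, for example `AddTag { tag: b"env:dev".to_vec(), next: Collector { capacity: 2, received: Vec::new() } }`. Static nesting through a generic `next` field is one possible design; a list of boxed trait objects would work too, at the cost of dynamic dispatch.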
