#aws #metrics

bin+lib cloudwatch_metrics_agent

An agent for sending custom CPU and memory metrics to Cloudwatch

6 releases

0.1.8 Dec 25, 2023
0.1.7 Nov 11, 2023
0.1.6 May 3, 2022
0.1.5 Feb 22, 2022

#904 in Command line utilities

49 downloads per month

MIT license

35KB
713 lines

Cloudwatch Metrics Agent

Crates.io Build

cloudwatch_metrics_agent is a simple agent for sending custom CPU and memory metrics to Cloudwatch.

Installation

cargo install cloudwatch_metrics_agent

Usage

Simple

To launch agent and send metrics to CloudWatch each minute:

cloudwatch_metrics_agent --namespace TestNamespace --service FooService --period 60

To preview metrics without actual sending them to CloudWatch:

RUST_LOG=info cloudwatch_metrics_agent --namespace TestNamespace --service FooService --period 60 --dry-run

Agent in sidecar container

To deploy agent in ECS with Fargate or EC2 just add a container with the agent to a task definition with a monitored service. Agent's container shares resources and namespace with other containers in a task definition so collected metrics are valid.

Agent inside Docker container

If compute environment does not allow multi-container configurations (such as AWS Batch) it is possible to launch agent in background while keeping main process in foreground. To properly handle termination a supervisor or a init multiplexer like tini or dumb-init must be used so the backround agent process receive SIGTERM signal and flushes remaining metrics at shutdown. Because entrypoint shell script is used as a entrypoint, the whole process group must receive a signal. To achieve this the tini must be used a with a process group -g flag while the dumb-init does it by default. The entrypoint script must also send metrics on succesfull foreground process termination when no signals sent.

Example entrypoint script:

#!/bin/bash

# launch agent in the background and save its pid
cloudwatch_metrics_agent --namespace TestNamespace --service FooService &
child=$!

# launch main process in the foreground
"$@"
exitcode=$?

echo "Terminating agent after foreground process exit" >&2
kill -TERM "$child"
wait "$child"
echo "Exiting with $exitcode" >&2
exit $exitcode

Example ENTRYPOINT directive for Dockerfile:

ENTRYPOINT ["tini", "-g", "--"]

or

ENTRYPOINT ["dumb-init"]

How does it work

The agent collects system metrics (CPU and memory utilization) and periodically sends them to a publisher. Publisher can be a AWS CloudWatch service or console (for debugging).

Period (--period parameter) specifies how often logs are emitted to publisher, one minute by default. Metrics are collected during this period with much higher resolution (0.9 seconds) and aggregated by median and max values. So it relatively safe to set a publishing period to five minutes (non-detailed CloudWatch metrics default).

If the cloud service is unavailable while sending a bunch of metrics after two retries, this bunch is skipped. When the agent is stopped (SIGTERM or SIGINT signal received), all remaining metrics are flushed to the publisher.

Emitted metrics:

  • CPUUtilization - median CPU utilization across all CPU cores, in percents.

  • MemoryUtilization - median memory utilization, in percents. Calculated as used memory divided by total memory in percents where used memory is total memory without free, buffers, page cache and slabs.

Metrics are published to the specified namespace and service name. AWS credentials are configured via the environment variables, config files or IAM role.

Logs are configured via env_logger so the RUST_LOG environment variable controls logging with errors only by default. To show only emitted metrics while omitting other logs set the environment variable RUST_LOG="cloudwatch_metrics_agent::publisher=info" or RUST_LOG="cloudwatch_metrics_agent::cloudwatch=info".

The agent uses Tokio framework for asynchronous work, AWS SDK for Rust for communicating with AWS and sysinfo for gathering metrics.

Portability

Works on Linux and macOS.

To build a static binary for usage in container use a static CRT linkage:

cargo build --release --target x86_64-unknown-linux-musl

License

BSD 3-Clause

Dependencies

~20–30MB
~422K SLoC