4 releases

| Version | Released |
|---|---|
| 0.2.2 (new) | Dec 4, 2025 |
| 0.2.1 | Nov 21, 2025 |
| 0.2.0 | Nov 19, 2025 |
| 0.1.0 | Oct 16, 2025 |
Armchair
Armchair is a load test binary that can be used to benchmark Rime's TTS service with concurrent requests.
Primary use cases:
- To find the time-to-first-byte (TTFB) and real-time factor (RTF) for a given concurrency level.
- To find the maximum concurrency that satisfies the given performance targets.
- To find an optimal client-side buffer size to avoid underrun issues.
For audio streaming with concurrent sessions, TTFB and RTF are the key performance indicators. Real-time streaming requires an RTF below 1, so a maximum concurrency is typically imposed to keep RTF under 1.
Supported features:
- Bisection to find maximum concurrency based on configurable performance target (success, TTFB, RTF)
- TTFB and RTF metrics for a given concurrency
- Session start staggering via exponential distribution
- Intra-session delays via truncated normal distribution with playback-aware waiting
- Client-side buffer simulation and underrun detection
Methodology
This tool simulates many concurrent streaming sessions and evaluates performance against a configurable target.
Session model:
- At a given concurrency C, C sessions are launched.
- Session starts are staggered by an exponential inter-arrival process with rate λ (--session-rate, starts/second).
- Each session performs -n/--requests-per-session sequential requests.
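The staggering described above can be sketched with inverse-transform sampling. This is a minimal illustration, not armchair's actual code: the function names are hypothetical, the uniform draws are supplied by the caller (any RNG would do), and it assumes the first session also waits one inter-arrival gap, which the text does not specify.

```rust
/// Inverse-transform sample of an exponential inter-arrival time, in seconds.
/// `u` is a uniform draw in [0, 1); `rate` is λ in starts/second.
fn exp_interarrival_secs(u: f64, rate: f64) -> f64 {
    -(1.0 - u).ln() / rate
}

/// Cumulative start offsets (seconds) for a batch of sessions,
/// given one uniform draw per session.
fn session_start_offsets(uniforms: &[f64], rate: f64) -> Vec<f64> {
    let mut t = 0.0;
    uniforms
        .iter()
        .map(|&u| {
            // Each session starts one exponential gap after the previous one.
            t += exp_interarrival_secs(u, rate);
            t
        })
        .collect()
}
```

With rate λ = 5, the mean gap between session starts is 1/λ = 0.2 seconds.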
Per-request timing and metrics:
- TTFB is measured from request send to the first received byte.
- Elapsed is the total time to stream the entire response.
- Audio duration is parsed from the WAV headers; if parsing fails the request is treated as non-audio for RTF purposes.
- RTF is computed as (elapsed − TTFB) / audio_duration. RTF values requiring missing/invalid audio duration are excluded from RTF percentiles.
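As a rough sketch of the RTF rule above: the function below returns no value when the audio duration is missing or zero, so such requests can be left out of the RTF percentiles. The function name and signature are illustrative assumptions, not the tool's API.

```rust
use std::time::Duration;

/// RTF = (elapsed − TTFB) / audio_duration.
/// Returns None when the audio duration is unknown or zero, or when
/// TTFB exceeds elapsed (inconsistent timing), so the sample can be excluded.
fn rtf(elapsed: Duration, ttfb: Duration, audio: Option<Duration>) -> Option<f64> {
    let audio = audio?;
    if audio.is_zero() {
        return None;
    }
    let streaming = elapsed.checked_sub(ttfb)?;
    Some(streaming.as_secs_f64() / audio.as_secs_f64())
}
```

For example, a request with 2.1s elapsed, 0.1s TTFB, and 4s of audio streams 2s of wall time for 4s of audio, giving an RTF of 0.5.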
Intra-session delay model (traffic shaping):
- After each request completes, the tool waits out any remaining playback time if the audio was synthesized faster than real time, i.e. max(0, audio_duration − (elapsed − TTFB)).
- Then it sleeps an additional delay sampled from a Normal distribution with parameters --intra-session-delay-mu and --intra-session-delay-sigma, truncated to [--intra-session-delay-min, --intra-session-delay-max].
- The first request in a session has no intra-session delay; session start staggering is controlled by the exponential process above.
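The two waits above can be sketched as follows. The names are hypothetical, and the standard-normal draw `z` is supplied by the caller. The text says "truncated" in one place and "clamp" in another; this sketch assumes clamping an out-of-range draw to the bounds rather than resampling.

```rust
use std::time::Duration;

/// Remaining playback time: max(0, audio_duration − (elapsed − TTFB)).
fn playback_wait(elapsed: Duration, ttfb: Duration, audio: Duration) -> Duration {
    let streaming = elapsed.saturating_sub(ttfb);
    audio.saturating_sub(streaming)
}

/// Extra delay in seconds: mu + sigma * z, limited to [min, max].
/// `z` is a standard-normal draw; requires min <= max.
fn intra_session_delay_secs(z: f64, mu: f64, sigma: f64, min: f64, max: f64) -> f64 {
    (mu + sigma * z).clamp(min, max)
}
```

With mu=10s, sigma=5s and bounds [0s, 20s], a draw three standard deviations above the mean (25s) is limited to 20s.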
Buffer underrun detection:
- Simulates a client-side buffer of size --client-buffer (default 0ms).
- Playback starts once the buffer is full.
- An underrun occurs if the buffer empties before playback completes.
- Requires valid WAV headers to determine the byte rate.
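One way to picture the underrun check is the sketch below. It is an assumed model, not the tool's implementation: the `Chunk` type and function name are hypothetical, chunks are given as (arrival time, size) pairs in arrival order, and playback is assumed to consume bytes at the constant byte rate from the WAV header once the buffer has filled.

```rust
/// One received chunk: arrival time in seconds and payload size in bytes.
struct Chunk {
    at_secs: f64,
    bytes: u64,
}

/// Returns true if playback would run out of data before the stream ends.
/// `byte_rate` is bytes of audio per second; `buffer_secs` is the initial buffer.
fn has_underrun(chunks: &[Chunk], byte_rate: f64, buffer_secs: f64) -> bool {
    let mut received = 0u64; // bytes received so far
    let mut start: Option<f64> = None; // playback start time, once the buffer is full
    for c in chunks {
        if let Some(t0) = start {
            // Bytes the player needs to have consumed by the time this chunk arrives.
            let consumed = (c.at_secs - t0) * byte_rate;
            if consumed > received as f64 {
                return true; // buffer emptied before this chunk arrived
            }
        }
        received += c.bytes;
        if start.is_none() && received as f64 / byte_rate >= buffer_secs {
            start = Some(c.at_secs);
        }
    }
    // After the last chunk everything is buffered, so playback can finish.
    false
}
```

In this model a larger buffer delays the start of playback, which gives late chunks more slack; that is the trade-off the buffer size setting explores.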
Aggregation and statistics:
- Success is counted when HTTP status is 2xx and the body is non-empty.
- Percentiles (p50/p90/p95/p99) are computed via linear interpolation over sorted samples; NaN/invalid values are excluded from the relevant metric’s distribution.
- A startup config dump prints all key parameters for reproducibility.
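The percentile rule above could look like the following sketch. The exact interpolation convention the tool uses is not stated; this version assumes rank = p/100 × (n − 1) over the sorted, finite samples, and the function name is hypothetical.

```rust
/// Percentile (p in 0..=100) by linear interpolation over sorted samples.
/// NaN and infinite values are dropped first; returns None if nothing remains.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    let mut v: Vec<f64> = samples.iter().copied().filter(|x| x.is_finite()).collect();
    if v.is_empty() {
        return None;
    }
    v.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted samples.
    let rank = p.clamp(0.0, 100.0) / 100.0 * (v.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(v[lo] + (v[hi] - v[lo]) * frac)
}
```

Returning None for an empty distribution lets a caller treat the metric as "cannot be computed" instead of reporting a misleading zero.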
Performance target evaluation (--target):
- The target is a conjunction: all configured clauses must pass.
- Supported clauses: success:<fraction>, ttfb:pXX@<duration>, rtf:pXX@<value>, underrun:<fraction>.
- If a metric cannot be computed (e.g., no valid audio for RTF), that clause fails.
- Results show OK/FAIL per clause, with color when the terminal supports it.
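The conjunction rule above can be sketched as below. The types and names are hypothetical, and the comparison directions are assumptions: success is read as "at least" its fraction, while TTFB, RTF, and underrun are read as "at most" their thresholds, with inclusive bounds.

```rust
/// Direction of a threshold comparison.
enum Bound {
    AtLeast(f64), // e.g. success fraction
    AtMost(f64),  // e.g. TTFB seconds, RTF, underrun fraction
}

/// One target clause with its observed value (None if it could not be computed).
struct Clause {
    name: &'static str,
    observed: Option<f64>,
    bound: Bound,
}

fn clause_ok(c: &Clause) -> bool {
    match (c.observed, &c.bound) {
        (Some(v), Bound::AtLeast(t)) => v >= *t,
        (Some(v), Bound::AtMost(t)) => v <= *t,
        (None, _) => false, // a metric that cannot be computed fails its clause
    }
}

/// The target passes only if every clause passes.
fn target_ok(clauses: &[Clause]) -> bool {
    clauses.iter().all(clause_ok)
}

/// Per-clause OK/FAIL lines (without terminal colors).
fn report(clauses: &[Clause]) -> Vec<String> {
    clauses
        .iter()
        .map(|c| format!("{}: {}", c.name, if clause_ok(c) { "OK" } else { "FAIL" }))
        .collect()
}
```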
Maximum concurrency search (when --concurrency 0):
- Exponential growth: repeatedly doubles concurrency (1, 2, 4, …) until the performance target fails; waits 10s between trials.
- Binary search: bisection between the last known-good and first failing concurrency to find the largest concurrency that still satisfies the target.
- After discovery, a final run at the chosen concurrency prints a full summary.
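The two search phases can be sketched as follows. This is an illustration under stated assumptions: `passes` stands in for "run a trial at this concurrency and evaluate the target", the `cap` upper bound is a hypothetical safety limit (expected to be at least 1), and the 10s pause between trials is omitted. It also assumes pass/fail is monotone in concurrency, which a stochastic workload only approximates.

```rust
/// Largest concurrency for which `passes` returns true, or None if even 1 fails.
fn find_max_concurrency<F: FnMut(u32) -> bool>(mut passes: F, cap: u32) -> Option<u32> {
    // Phase 1: exponential growth (1, 2, 4, ...) until the first failure.
    let mut good: u32 = 0; // 0 means no passing level has been seen yet
    let mut c: u32 = 1;
    let bad = loop {
        if !passes(c) {
            break c;
        }
        good = c;
        if c >= cap {
            return Some(good); // never failed up to the cap
        }
        c = c.saturating_mul(2).min(cap);
    };
    // Phase 2: bisection between the last known-good and first failing level.
    let (mut lo, mut hi) = (good, bad);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if passes(mid) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    if lo == 0 { None } else { Some(lo) }
}
```

Doubling reaches the failing region in a logarithmic number of trials, and bisection then needs about as many again, which matters when every trial is a full load test.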
Note: The traffic and delay processes are stochastic; repeated runs will vary. Randomness is seeded from system entropy.
Usage
Installation
cargo install armchair
Maximum concurrency
To find the maximum concurrency where each session sends 5 requests with:
- session starts following an exponential process (lambda=5 starts/sec)
- intra-session delays sampled from a truncated normal N(mu=10s, sigma=5s), clamped to [0s, 20s]
- performance targets success:1.00,ttfb:p99@1s,rtf:p99@1.00,underrun:0.00 (default)
armchair --url '<RIME_SERVICE>' --token '<RIME_API_KEY>'
The tool should then report metrics like:
=== MAXIMUM CONCURRENCY FOUND: 16 ===
...
----- Summary -----
total: 80 success: 80 (100.0%)
Buffer underrun: 0 (0.0%)
TTFB ms: mean=104.4 p50=100.8 p90=117.6 p95=126.2 p99=141.0
Elapsed ms: mean=13924.0 p50=13772.5 p90=16412.8 p95=17065.3 p99=18527.3
RTF: mean=1.067 p50=1.061 p90=1.170 p95=1.208 p99=1.254
Fixed concurrency
When the --concurrency flag is set to a nonzero value, the tool skips the maximum concurrency search and simply reports the latency metrics at that concurrency.
Request customization
- -n: Number of requests in each session, e.g. 5
- --session-rate: Session starts per second, following a Poisson process for staggered starts, e.g. 5
- --intra-session-delay-mu: Intra-session delay mean, e.g. 10s
- --intra-session-delay-sigma: Intra-session delay standard deviation, e.g. 5s
- --intra-session-delay-min: Intra-session delay minimum clamp, e.g. 0s
- --intra-session-delay-max: Intra-session delay maximum clamp, e.g. 20s
- --client-buffer: Client-side initial playback buffer, e.g. 100ms
- --target: Performance target specification, e.g. success:1.00,ttfb:p90@500ms,rtf:p90@1.00,underrun:0.00
- --percentiles: List of percentiles to report, e.g. 1,25,50,90,99
Duration value syntax
Flags that accept durations (e.g., --intra-session-delay-mu) take values with units:
500ms, 1.5s, 10s
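For readers scripting around the tool, the duration syntax shown above can be parsed with a sketch like the one below. It covers only the ms and s suffixes that appear in the examples; whether armchair accepts other units is not stated, and the function name is hypothetical.

```rust
use std::time::Duration;

/// Parses values such as "500ms", "1.5s", or "10s". A unit is required.
fn parse_duration(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Check "ms" before "s", since every "ms" value also ends in "s".
    let (num, per_sec) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1000.0)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1.0)
    } else {
        return None;
    };
    let v: f64 = num.trim().parse().ok()?;
    // Rejects negative, non-finite, and overflowing values.
    Duration::try_from_secs_f64(v / per_sec).ok()
}
```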
Performance target flag
--target accepts a comma-separated list:
success:<fraction>,ttfb:pXX@<duration>,rtf:pXX@<value>,underrun:<fraction>
Examples:
--target success:0.99,ttfb:p95@800ms,rtf:p90@1.20,underrun:0.01
--target success:1.00,ttfb:p90@1s,rtf:p90@1.00,underrun:0.00
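To make the clause grammar concrete, here is a minimal parsing sketch for the specifications above. The `Clause` type and function names are illustrative assumptions rather than the tool's internals, and the TTFB limit is kept as the raw duration string to keep the example short.

```rust
#[derive(Debug, PartialEq)]
enum Clause {
    Success(f64),
    Underrun(f64),
    Ttfb { percentile: f64, limit: String }, // limit is a raw duration, e.g. "800ms"
    Rtf { percentile: f64, limit: f64 },
}

/// Splits "p95@800ms" into (95.0, "800ms").
fn parse_percentile(s: &str) -> Option<(f64, &str)> {
    let (p, limit) = s.strip_prefix('p')?.split_once('@')?;
    Some((p.parse().ok()?, limit))
}

/// Parses a comma-separated target; returns None if any clause is malformed.
fn parse_target(spec: &str) -> Option<Vec<Clause>> {
    spec.split(',')
        .map(|part| {
            let (key, value) = part.trim().split_once(':')?;
            match key {
                "success" => Some(Clause::Success(value.parse().ok()?)),
                "underrun" => Some(Clause::Underrun(value.parse().ok()?)),
                "ttfb" => {
                    let (p, limit) = parse_percentile(value)?;
                    Some(Clause::Ttfb { percentile: p, limit: limit.to_string() })
                }
                "rtf" => {
                    let (p, limit) = parse_percentile(value)?;
                    Some(Clause::Rtf { percentile: p, limit: limit.parse().ok()? })
                }
                _ => None, // unknown clause name
            }
        })
        .collect()
}
```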