13 releases
| new 0.3.11 | Mar 9, 2026 |
|---|---|
| 0.3.10 | Mar 8, 2026 |
| 0.3.9 | Feb 21, 2026 |
| 0.2.0 | Feb 20, 2026 |
| 0.1.0 | Jan 30, 2026 |
#165 in Algorithms
1,231 downloads per month
420KB
8K
SLoC
muxer
Deterministic, multi-objective routing for piecewise-stationary multi-armed bandit problems with small action sets.
Problem setting
Given K arms (model versions, inference endpoints, backends -- any discrete action set selected repeatedly), an agent observes vector-valued outcomes per call and must choose the next arm. Each outcome carries binary quality signals (ok, junk, hard_junk), an optional continuous quality score in [0, 1], a cost proxy, and wall-clock latency. Reward distributions are piecewise-stationary: they may change at unknown times, and the agent must detect and adapt.
The setting is a multi-objective extension of the stochastic MAB [1]: the agent simultaneously minimizes cost, latency, and degradation rate while maximizing success rate. These objectives cannot in general be collapsed to a single scalar without losing information. The core selection algorithm:
- Maintains a sliding window of recent
Outcomes per arm [6]. - Computes a Pareto frontier over the objective vector using standard Pareto dominance [7].
- Selects deterministically from the frontier via linear weighted scalarization with configurable weights and a stable lexicographic tie-break.
Standard single-objective approaches fail in characteristic ways: pure exploitation prevents detection of regressions on non-selected arms; uniform allocation wastes budget on known-degraded arms; threshold-based cooldown misses gradual distribution drift. An effective policy must:
- explore arms with high uncertainty or recent distributional changes
- detect regressions by maintaining sufficient sampling rate on all arms
- remain deterministic by default (identical statistics and configuration produce identical selections)
Outcome fields
ok: the call produced a usable resultjunk: quality fell below the caller's threshold;hard_junk=trueimpliesjunk=truehard_junk: complete failure (error, timeout, parse failure) -- strict subset of junk, penalized separatelyquality_score: Option<f64>: continuous quality signal in[0, 1]. WhenMabConfig::quality_weight > 0, incorporated as a separate Pareto dimension.cost_units: caller-defined cost proxyelapsed_ms: wall-clock time
Designed for 2--10 arms and moderate window sizes (hundreds to low thousands of observations).
Capabilities
| Category | Primitive | Description |
|---|---|---|
| Selection | select_mab / select_mab_explain |
Deterministic Pareto + scalarization over sliding-window summaries |
ThompsonSampling |
Seedable Thompson sampling [2] for scalar rewards in [0, 1]; optional decay |
|
Exp3Ix |
Seedable EXP3-IX [3] for adversarial / non-stationary rewards | |
LinUcb (feature contextual) |
Linear contextual bandit [4] with per-request feature vectors | |
BanditPolicy trait (feature stochastic) |
Common decide/update_reward interface over ThompsonSampling and Exp3Ix |
|
softmax_map |
Score-to-probability conversion for traffic splitting | |
| Monitoring | TriageSession |
Per-arm CUSUM [5] detection with per-(arm, context-bin) cell investigation |
CusumCatBank |
Bank of CUSUM alternatives at multiple reference levels (GLR-inspired [8] robustification) | |
calibrate_cusum_threshold |
Monte Carlo: find threshold h such that P[alarm within m rounds] <= alpha |
|
| Triage | worst_first_pick_k |
Post-detection: prioritize the most degraded arms for investigation |
ContextualCoverageTracker |
Lift triage to (arm, context-bin) pairs for localized regressions |
|
| Coverage | CoverageConfig |
Maintenance sampling floor: keep all arms measured above a quota |
ControlConfig / pick_control_arms |
Reserve deterministic-random picks as a selection-bias anchor | |
| Guardrails | LatencyGuardrailConfig |
Hard pre-filter by mean latency with stop-early semantics |
Multi-pick (select_mab_k_guardrailed_*) |
Select up to k unique arms per decision with per-round guardrails |
|
| Orchestration | Router |
Stateful session: owns per-arm state, full select/observe/triage lifecycle, dynamic arm add/remove |
suggested_window_cap |
SW-UCB-derived [6] window sizing: sqrt(throughput / change_rate) |
|
PipelineOrder / PolicyPlan |
Harness glue for composing custom routing pipelines | |
| Utilities | Decision envelope |
Unified decision record (chosen + probs + typed audit notes) for logging/replay |
novelty_pick_unseen / apply_prior_counts_to_summary |
Novelty helpers and prior smoothing |
Which policy to use
select_mab: Multiple objectives with deterministic selection.O(sqrt(K T ln T))pseudo-regret on the scalarized objective under stationarity [1].ThompsonSampling: Single scalar reward in[0, 1].O(sqrt(K T ln T))Bayes regret [2]. Seedable; optional decay.Exp3Ix: Adversarial rewards.O(sqrt(K T ln K))expected regret [3]. Seedable; optional decay.LinUcb(featurecontextual): Per-request feature vector.O(d sqrt(T ln T))regret under linear realizability [4].BanditPolicytrait (featurestochastic): Genericfn run<P: BanditPolicy>(p: &mut P)over both stochastic policies.
Routing lifecycle
- Normal (
select_mab/ThompsonSampling/Exp3Ix): select the best arm while exploring. - Regression investigation (
worst_first_pick_k): after CUSUM [5] fires, route additional traffic to characterize the change.TriageSessionautomates the handoff. - Control (
pick_control_arms): uniform-random baseline to anchor quality estimates and detect selection bias.
CoverageConfig bridges 1 and 3: minimum per-arm sampling rate prevents starvation that would delay regression detection.
Three goals for sampling
Every routing decision serves three competing objectives:
- Exploitation -- minimize cumulative regret.
- Estimation -- reduce uncertainty about each arm's current performance.
- Detection -- minimize delay between a distributional change and the alarm.
The two clocks. An arm is only observed when selected:
- Wall time
t: global decision steps. - Sample time
n_k: observations for armk.
Detection delay in wall time: delay_wall ~ delay_samples / rate_k. CoverageConfig sets a floor on rate_k.
The non-contextual collapse. Under fixed allocation, MSE and average CUSUM detection delay are both O(1/n_k). Their sensitivity functions are scalar multiples. The three-way Pareto surface collapses to a 1-D curve parameterized by n_k:
R_T * D_avg = C * Delta * T / delta^2
You cannot independently improve both regret and detection delay without increasing the total sample budget.
The contextual revival. With LinUcb, average detection delay stays proportional to estimation error. But worst-case detection delay -- concentrated on the covariate cell with fewest observations -- is genuinely independent. This is why ContextualCoverageTracker and TriageSession exist: localized regressions in sparse covariate regions need cell-level triage.
See the API docs and examples/EXPERIMENTS.md for derivations and failure modes.
Usage
[dependencies]
muxer = "0.3.9"
Deterministic core only (no stochastic bandits):
[dependencies]
muxer = { version = "0.3.9", default-features = false }
Quickstart
use muxer::{Router, RouterConfig, Outcome};
let arms = vec!["backend-a".to_string(), "backend-b".to_string()];
let mut router = Router::new(arms, RouterConfig::default()).unwrap();
loop {
let d = router.select(1, 0);
let arm = d.primary().unwrap().to_string();
let outcome = Outcome { ok: true, junk: false, hard_junk: false,
cost_units: 5, elapsed_ms: 120, quality_score: None };
router.observe(&arm, outcome);
if router.mode().is_triage() {
router.acknowledge_change(&arm);
}
}
For larger arm counts, pass k > 1 to batch exploration:
let cfg = RouterConfig::default().with_coverage(0.02, 1);
let d = router.select(3, 0); // K=30, k=3 -> coverage in ~10 rounds
Examples
Deterministic multi-objective selection
use muxer::{select_mab, MabConfig, Summary};
use std::collections::BTreeMap;
let arms = vec!["a".to_string(), "b".to_string()];
let mut summaries = BTreeMap::new();
summaries.insert("a".to_string(), Summary { calls: 10, ok: 9, junk: 0, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900, mean_quality_score: None });
summaries.insert("b".to_string(), Summary { calls: 10, ok: 9, junk: 2, hard_junk: 0, cost_units: 10, elapsed_ms_sum: 900, mean_quality_score: None });
let sel = select_mab(&arms, &summaries, MabConfig::default());
assert_eq!(sel.chosen, "a"); // lower junk rate wins
Detect-then-triage
use muxer::{TriageSession, TriageSessionConfig, OutcomeIdx};
let arms = vec!["a".to_string(), "b".to_string()];
let mut session = TriageSession::new(&arms, TriageSessionConfig::default()).unwrap();
session.observe("a", OutcomeIdx::OK, &[0.2, 0.3]);
session.observe("b", OutcomeIdx::HARD_JUNK, &[0.8, 0.9]);
let alarmed = session.alarmed_arms();
let bins = session.tracker().active_bins();
let cells = session.top_alarmed_cells(&bins, 3);
Runnable examples
Getting started
getting_started.rs -- Minimal 3-backend routing loop in 60 lines. Shows the core lifecycle: select an arm, observe an outcome with delayed quality labeling, let the router adapt. Start here.
router_quickstart.rs -- Full routing lifecycle: two arms diverge in quality, CUSUM fires a triage alarm, the operator acknowledges and the router recovers. Also demonstrates large-K batch exploration.
router_production.rs -- Production configuration. CUSUM threshold calibration, monitoring windows, coverage enforcement, guardrails, control arms, and snapshot/restore for process restarts.
cargo run --example getting_started
cargo run --example router_quickstart
cargo run --example router_production --features stochastic
Algorithm variants
deterministic_router.rs -- Multi-objective MAB with hard constraints on junk rate and cost/latency weights. Deterministic tie-breaking for reproducible selections.
thompson_router.rs -- Thompson sampling with decay for non-stationary binary rewards. Two arms with drifting success probabilities.
exp3ix_router.rs -- Exp3-IX for adversarial settings where reward distributions change over time.
contextual_router.rs -- LinUCB with 2D context features (prompt length, difficulty). Arm performance depends on input characteristics.
sticky_mab_router.rs -- Dwell-time and margin-based switching. Prevents backend thrashing when arm estimates are noisy.
monitored_router.rs -- Drift detection via Hellinger distance between baseline and recent observation windows. Maintenance sampling prevents lock-on.
cargo run --example deterministic_router
cargo run --example thompson_router
cargo run --example exp3ix_router
cargo run --example contextual_router --features contextual
cargo run --example sticky_mab_router
cargo run --example monitored_router --features stochastic
cargo run --example end_to_end_router --features stochastic
Configuration patterns
end_to_end_router.rs -- Delayed quality labeling: push outcomes with unknown junk status, update later after validation. Combines stickiness and constraints.
mab_constraints_tuning.rs -- Constraints-first design. Hard-filter arms exceeding 10% junk, then tune cost/latency weights among the clean set.
coverage_autotune.rs -- Derive the coverage sampling rate from KL divergence and CUSUM theory to guarantee regression detection within a wall-time budget.
contextual_propensity_logging.rs -- Extract per-arm selection probabilities from LinUCB decisions for offline inverse-propensity-weighted analysis.
guardrail_semantics.rs -- Soft (novelty before guardrail) vs hard (guardrail-first) policy ordering. Shows how the order of filtering stages affects which arms get trialed.
decision_unified.rs -- The unified Decision type with stickiness. Shows how dwell-time prevents thrashing when the base policy flaps between arms.
window_delayed_junk_label.rs -- Minimal pattern for deferred quality labeling: push an outcome with junk=false, then update to true after downstream validation.
cargo run --example mab_constraints_tuning
cargo run --example coverage_autotune
cargo run --example contextual_propensity_logging --features contextual
cargo run --example guardrail_semantics
cargo run --example decision_unified
cargo run --example window_delayed_junk_label
Detection and calibration
detector_calibration.rs -- Monte Carlo threshold calibration: find the CUSUM threshold h such that P[false alarm within m rounds] <= alpha.
detector_inertia.rs -- Cumulative vs forgetful detectors. CatKL accumulates history and loses sensitivity over time; CUSUM forgets and stays responsive. Shows the effect of pre-change baseline length on detection delay.
bqcd_sampling.rs -- Partial-observation multi-armed detection. K arms are only observed when sampled, so the policy must balance exploration (find the changed arm), exploitation (focus on it), and coverage (don't starve others).
bqcd_calibrated.rs -- Global false-alarm calibration across arms with a CUSUM bank of alternative hypotheses.
cargo run --example detector_calibration
cargo run --example detector_inertia
cargo run --example bqcd_sampling
cargo run --example bqcd_calibrated
Domain harnesses
Each harness simulates a realistic routing scenario with multi-dimensional slicing (task x dataset x environment), per-slice arm eligibility, and injected drift at a known epoch.
matrix_harness.rs -- NLP evaluation: 5 task-dataset pairs (NER, relation extraction, coreference). A BERT-ONNX model drifts on social-media NER at epoch 48.
pcap_triage_harness.rs -- Network security: route packet-analysis backends (Suricata, Zeek, ML) per protocol/dataset. TLS evasion causes drift.
ad_auction_harness.rs -- Ad ranking: route auction models per objective/geo/device. A privacy policy change degrades the deep model on EU mobile traffic.
fraud_scoring_harness.rs -- Payments fraud: route fraud scorers per channel/region/segment. A rules model drifts on EU card-not-present after a fraud pattern shift.
medical_triage_harness.rs -- Clinical risk scoring: route triage models by setting/acuity/cohort. A protocol change breaks high-acuity pediatric ED scoring.
search_ranking_harness.rs -- Search ranking: route rankers by intent/locale/device. BM25 regresses on informational Spanish queries after a corpus shift.
synthetic_drift_harness.rs -- Controlled experiment with known drift injection point. Validates that the router detects and adapts correctly.
cargo run --example matrix_harness
cargo run --example pcap_triage_harness
cargo run --example ad_auction_harness
cargo run --example fraud_scoring_harness
cargo run --example medical_triage_harness
cargo run --example search_ranking_harness
cargo run --example synthetic_drift_harness
Investigation
free_lunch_investigation.rs -- When does change detection require coverage? Compares Thompson sampling, deterministic MAB, and MAB+coverage when one arm drifts. Shows that passive monitoring fails if the changed arm gets starved of observations.
cargo run --example free_lunch_investigation
Mini-experiments (trade-offs and failure modes): examples/EXPERIMENTS.md.
Documentation
Development
cargo test -p muxer
cargo bench -p muxer --bench coverage
cargo bench -p muxer --bench monitor
cargo fmt -p muxer --check
cargo clippy -p muxer --all-targets -- -D warnings
References
- P. Auer, N. Cesa-Bianchi, and P. Fischer. "Finite-time analysis of the multiarmed bandit problem." Machine Learning, 47(2-3):235--256, 2002.
- S. Agrawal and N. Goyal. "Analysis of Thompson Sampling for the Multi-armed Bandit Problem." COLT, 2012.
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. "The Nonstochastic Multiarmed Bandit Problem." SIAM J. Comput., 32(1):48--77, 2002.
- W. Chu, L. Li, L. Reyzin, and R. Schapire. "Contextual Bandits with Linear Payoff Functions." AISTATS, 2011.
- E. S. Page. "Continuous inspection schemes." Biometrika, 41(1-2):100--115, 1954.
- A. Garivier and E. Moulines. "On Upper-Confidence Bound Policies for Switching Bandit Problems." ALT, 2011.
- R. R. Drugan and A. Nowe. "Designing multi-objective multi-armed bandits algorithms: A study." IJCNN, 2013.
- L. Besson, E. Kaufmann, O.-A. Maillard, and J. Seznec. "Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits." arXiv:1902.01575, 2019.
License
Licensed under MIT or Apache-2.0 (LICENSE-MIT, LICENSE-APACHE).
Dependencies
~0.3–1MB
~22K SLoC