# swarm-engine-eval

Scenario-based evaluation framework for SwarmEngine agent swarms.
## Usage

### Running from CLI (Recommended)

```sh
# From project root
cargo run --package swarm-engine-ui -- eval <SCENARIO_PATH>

# Example: troubleshooting scenario
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml

# With options:
#   -n 5        number of runs (default: 1)
#   -v          verbose output (show tick snapshots)
#   --learning  enable learning data collection
cargo run --package swarm-engine-ui -- eval crates/swarm-engine-eval/scenarios/troubleshooting.toml \
  -n 5 -v --learning
```
### CLI Options

| Option | Description |
|---|---|
| `-n, --runs <N>` | Number of evaluation runs (default: 1) |
| `-s, --seed <SEED>` | Random seed (default: 42) |
| `-o, --output <FILE>` | JSON report output file |
| `-v, --verbose` | Verbose output with tick snapshots |
| `--learning` | Enable learning data collection |
| `--variant <NAME>` | Select scenario variant |
| `--list-variants` | List available variants |
## Scenarios

### Built-in Scenarios

Located in the `scenarios/` directory:

| Scenario | Description |
|---|---|
| `troubleshooting.toml` | Service diagnosis and recovery |
| `code_exploration.toml` | Codebase exploration |
| `search.toml` | Search tasks |
| `internal_diagnosis.toml` | Internal system diagnosis |
### Scenario Format

```toml
[meta]
name = "Service Troubleshooting"
id = "user:troubleshooting:v2"
version = "2.0.0"
description = "Diagnose and fix a service outage"
tags = ["troubleshooting", "diagnosis", "ops"]

[task]
goal = "Diagnose the failing service and restart it"
expected = "Worker successfully restarts the problematic service"

[task.context]
target_service = "user-service"
worker_count = 1

[llm]
provider = "llama-server"
model = "LFM2.5-1.2B"
endpoint = "http://localhost:8080"
temperature = 0.1
timeout_ms = 30000
max_tokens = 512

[manager]
process_every_tick = false
process_interval_ticks = 5
immediate_on_escalation = true
confidence_threshold = 0.3

[[actions.actions]]
name = "CheckStatus"
description = "Check the status of services"

[[actions.actions.params]]
name = "service"
description = "Optional: specific service name to check"
required = false

[[actions.actions]]
name = "ReadLogs"
description = "Read logs for a specific service"

[[actions.actions]]
name = "Diagnose"
description = "Diagnose the root cause of issues"

[[actions.actions]]
name = "Restart"
description = "Restart a service"
category = "node_state_change"

[app_config]
tick_duration_ms = 10
max_ticks = 150
```
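The `[manager]` keys control how often the manager agent processes its queue: every tick, or only every `process_interval_ticks` ticks, with escalations optionally bypassing the interval. A minimal sketch of that scheduling decision is below; the function name and signature are illustrative assumptions, not the crate's actual API.

```rust
/// Illustrative sketch (not the crate's real code): decide whether the
/// manager should process on a given tick, from the `[manager]` settings.
fn manager_should_process(
    tick: u64,
    escalation_pending: bool,
    process_every_tick: bool,      // [manager] process_every_tick
    process_interval_ticks: u64,   // [manager] process_interval_ticks
    immediate_on_escalation: bool, // [manager] immediate_on_escalation
) -> bool {
    if process_every_tick {
        return true;
    }
    // An escalation can bypass the normal interval.
    if immediate_on_escalation && escalation_pending {
        return true;
    }
    // Otherwise, process only every N ticks.
    process_interval_ticks > 0 && tick % process_interval_ticks == 0
}
```

With the example config above (`process_interval_ticks = 5`, `immediate_on_escalation = true`), the manager would process on ticks 5, 10, 15, …, plus any tick with a pending escalation.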
### Scenario Variants

Scenarios can define variants for different configurations:

```sh
# List variants
cargo run --package swarm-engine-ui -- eval troubleshooting.toml --list-variants

# Run with a variant
cargo run --package swarm-engine-ui -- eval troubleshooting.toml --variant complex
```
## Environment Types

| Type | Description |
|---|---|
| `troubleshooting` | Service troubleshooting simulation |
| `codebase` | File operation environment (Read/Write/Grep/Glob) |
| `none` | Empty environment (for testing) |
## Learning Integration

The eval system integrates with the offline learning system:

```sh
# 1. Collect learning data
cargo run --package swarm-engine-ui -- eval troubleshooting.toml -n 30 --learning

# 2. Run offline learning
cargo run --package swarm-engine-ui -- learn once troubleshooting

# 3. The next eval will use the learned parameters
cargo run --package swarm-engine-ui -- eval troubleshooting.toml -n 5 -v
```
## Assertions

Scenarios can define assertions for pass/fail criteria:

```toml
[[assertions]]
name = "minimum_success_rate"
metric = "success_rate"
op = "gte"
expected = 0.5

[[assertions]]
name = "max_ticks_limit"
metric = "total_ticks"
op = "lte"
expected = 100
```
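Each assertion compares a report metric against `expected` using the `op` comparator. A minimal sketch of how such a comparator might be applied is below; only `gte` and `lte` appear in the examples above, so the operator set and fail-closed default here are assumptions, and the function is illustrative rather than the crate's real evaluator.

```rust
/// Illustrative sketch (not the crate's real code): apply an assertion's
/// `op` comparator to a measured metric value. Only `gte`/`lte` are
/// confirmed by the scenario format; unknown operators fail closed.
fn assertion_passes(metric_value: f64, op: &str, expected: f64) -> bool {
    match op {
        "gte" => metric_value >= expected,
        "lte" => metric_value <= expected,
        _ => false, // unknown operator: treat the assertion as failed
    }
}
```

For example, a run with `success_rate = 0.6` passes the `minimum_success_rate` assertion (`0.6 >= 0.5`), while one with `total_ticks = 120` fails `max_ticks_limit` (`120 <= 100` is false).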
## Output

Eval produces:

- Console output with progress and results
- JSON report (with the `-o` option)
- Learning data (with the `--learning` option)
- Tick snapshots in verbose mode (with the `-v` option)