APR Model QA Playbook
Property-Based Model Qualification Testing for HuggingFace Models
Philosophy • Features • Quick Start • Architecture • Test Matrix • MQS Scoring
Philosophy
This framework synthesizes two complementary quality paradigms:
Toyota Production System (TPS)
"Stop the line. Fix it now. Never pass a defect to the next process." — Taiichi Ohno
| Principle | Application |
|---|---|
| Jidoka | Execution halts on first P0 failure |
| Poka-Yoke | Schema validation prevents malformed playbooks |
| Genchi Genbutsu | All metrics from actual inference |
| Heijunka | Load-balanced parallel execution |
| Kaizen | Continuous refinement via mutation testing |
Popperian Falsificationism
"The criterion of the scientific status of a theory is its falsifiability." — Karl Popper
We don't test to pass—we test to fail. No amount of passing tests proves correctness, but a single failure proves a defect.
| Outcome | Meaning |
|---|---|
| Corroborated | Hypothesis survived refutation attempt |
| Falsified | Hypothesis refuted by evidence |
| Timeout | Execution exceeded time limit |
| Crashed | Process terminated abnormally |
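These four outcomes map naturally onto an enum. A minimal Rust sketch (type and method names here are illustrative, not the crate's actual API):

```rust
/// Possible outcomes of a falsification attempt, mirroring the table above.
/// Illustrative sketch only — not the crate's actual types.
#[derive(Debug, PartialEq)]
enum Outcome {
    Corroborated, // hypothesis survived the refutation attempt
    Falsified,    // hypothesis refuted by evidence
    Timeout,      // execution exceeded the time limit
    Crashed,      // process terminated abnormally
}

impl Outcome {
    /// Under the Popperian framing, only a corroborated outcome counts
    /// as passing; every other outcome is evidence of a defect.
    fn is_passing(&self) -> bool {
        matches!(self, Outcome::Corroborated)
    }
}

fn main() {
    assert!(Outcome::Corroborated.is_passing());
    assert!(!Outcome::Falsified.is_passing());
    assert!(!Outcome::Crashed.is_passing());
    println!("outcome semantics verified");
}
```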
Features
- Property-based testing via proptest for comprehensive scenario generation
- Parallel execution with Rayon worker pools
- Gateway checks (G0-G4) that zero the score on critical failures
- Model Qualification Score (MQS) 0-1000 with grade mapping
- JUnit XML and HTML reports for CI/CD integration
- Playbook YAML format with JSON Schema validation
- 1.8M+ test assertions across all model/format/backend combinations
- 217 falsification gates across conversion, inference, patterns, and security domains
New in v2.0.0
| Feature | Description |
|---|---|
| Two-Tier Certification | MVP (≤10min, Grade B) and Full (≤1hr, Grade A+) tiers |
| Tier-Aware Scoring | score_from_tier(), status_from_tier(), grade_from_tier() |
| Certify CLI Command | apr-qa certify --family qwen-coder --tier mvp |
| Rosetta Differential Testing | Tensor layout mismatch, token comparison, fingerprint, stats validation |
| Profile CI Mode | Performance assertions for CI/CD (--assert-throughput, --assert-p99) |
| Trace Payload Mode | Real forward pass with NaN/Inf and garbage output detection |
| Bug Pattern Detection | 12 cross-project patterns from aprender/realizar analysis |
Model Certifications
Certification Summary (updated: 2026-03-02 10:15 UTC)
| Status | Count |
|---|---|
| Certified | 95/95 |
| Provisional | 0/95 |
| Blocked | 0/95 |
| Pending | 0/95 |
Priority Family: Qwen Coder (see Certified Testing Spec)
| Model | Family | Size | MQS | Grade | G1-4 | Prov | GGUF CPU | GGUF GPU | APR CPU | APR GPU | ST CPU | ST GPU | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bloom-560m | bloom | 560M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| bloomz-560m | bloom | 560M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| deepseek-coder-1.3b-instruct | deepseek-coder | 1.3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| deepseek-coder-6.7b-instruct | deepseek-coder | 6.7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| deepseek-coder-7b-instruct | deepseek-coder | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| deepseek-coder-33b-instruct | deepseek-coder | 33B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-Coder-V2-Lite-Instruct | deepseek-coder-v2 | 16B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Qwen-1.5B | deepseek-r1 | 1.5B | 1000 | A | ✗ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Qwen-7B | deepseek-r1 | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Llama-8B | deepseek-r1 | 8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Qwen-14B | deepseek-r1 | 14B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Qwen-32B | deepseek-r1 | 32B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| DeepSeek-R1-Distill-Llama-70B | deepseek-r1 | 70B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| dolphin-2.6-mistral-7b | dolphin | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Dolphin3.0-Llama3.1-8B | dolphin | 8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| falcon-7b-instruct | falcon | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| falcon-40b | falcon | 40B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Falcon-H1-Tiny-90M-Instruct | falcon-h1 | 90M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Falcon-H1-0.5B-Instruct | falcon-h1 | 0.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| tiny_starcoder_py | gpt-bigcode | 164M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| gpt-neo-125m | gpt-neo | 125M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| pythia-410m-deduped | gpt-neox | 410M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| pythia-160m | gpt-neox | 160M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| pythia-70m | gpt-neox | 70M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| distilgpt2 | gpt2 | 82M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| gpt2 | gpt2 | 124M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| gpt2-large | gpt2 | 774M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| gpt2-medium | gpt2 | 355M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| granite-3.1-2b-instruct | granite | 2B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| granite-3.1-8b-instruct | granite | 8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| granite-3b-code-instruct-128k | granite-code | 3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Hermes-3-Llama-3.1-8B | hermes | 8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| internlm2_5-7b-chat | internlm | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| internlm2_5-20b-chat | internlm | 20B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| mamba-130m-hf | mamba | 130M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| mamba2-130m-hf | mamba2 | 130M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Mistral-7B-Instruct-v0.3 | mistral | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Mistral-Nemo-Instruct-2407 | mistral | 12B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Mistral-Small-24B-Instruct-2501 | mistral | 24B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Codestral-22B-v0.1 | mistral-code | 22B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Llama-3.1-Nemotron-Nano-4B-v1.1 | nemotron | 4B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Llama-3.1-Nemotron-70B-Instruct-HF | nemotron | 70B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| OLMo-2-1124-7B-Instruct | olmo | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| OLMo-2-1124-13B-Instruct | olmo | 13B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| openchat-3.5-0106 | openchat | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| OpenHermes-2.5-Mistral-7B | openhermes | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| galactica-125m | opt | 125M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| phi-1_5 | phi | 1.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Phi-3-mini-4k-instruct | phi | 3.8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Phi-3.5-mini-instruct | phi | 3.8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Phi-3-small-8k-instruct | phi | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Phi-3-medium-4k-instruct | phi | 14B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Phi-4-mini-instruct | phi4 | 3.8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-0.5B-Instruct | qwen | 0.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-1.5B-Instruct | qwen | 1.5B | 1000 | A | ✗ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-3B-Instruct | qwen | 3B | 964 | A | ✗ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-7B-Instruct | qwen | 7B | 900 | B | ✗ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-14B-Instruct | qwen | 14B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-32B-Instruct | qwen | 32B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| QwQ-32B | qwen | 32B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-72B-Instruct | qwen | 72B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-0.5B-Instruct | qwen-coder | 0.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-1.5B-Instruct | qwen-coder | 1.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-3B-Instruct | qwen-coder | 3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-7B-Instruct | qwen-coder | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-14B-Instruct | qwen-coder | 14B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2.5-Coder-32B-Instruct | qwen-coder | 32B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen2-0.5B-Instruct | qwen2 | 0.5B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-0.6B | qwen3 | 0.6B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-1.7B | qwen3 | 1.7B | 964 | A | ✗ | ✗ | - | - | - | - | - | - | |
| Qwen3-4B | qwen3 | 4B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-8B | qwen3 | 8B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-14B | qwen3 | 14B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-32B | qwen3 | 32B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-Coder-30B-A3B-Instruct | qwen3-coder-moe | 30B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-30B-A3B | qwen3-moe | 30B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Qwen3-Coder-Next | qwen3-next | 3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| SmolLM2-135M-Instruct | smollm | 135M | 925 | B | ✗ | ✗ | - | - | - | - | - | - | |
| SmolLM2-360M-Instruct | smollm | 360M | 925 | B | ✗ | ✗ | - | - | - | - | - | - | |
| SmolLM-135M | smollm | 135M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| SmolLM-360M | smollm | 360M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| SmolLM2-1.7B-Instruct | smollm | 1.7B | 925 | B | ✗ | ✗ | - | - | - | - | - | - | |
| SmolLM2-135M | smollm2 | 135M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| SmolLM2-360M | smollm2 | 360M | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| stablelm-2-zephyr-1_6b | stablelm | 1.6B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| stablelm-zephyr-3b | stablelm | 3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| starcoder2-3b | starcoder2 | 3B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| starcoder2-7b | starcoder2 | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| starcoder2-15b | starcoder2 | 15B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| TinyLlama-1.1B-Chat-v1.0 | tinyllama | 1.1B | 1000 | A | ✗ | ✗ | - | - | - | - | - | - | |
| WizardCoder-33B-V1.1 | wizardcoder | 33B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Yi-1.5-6B-Chat | yi | 6B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Yi-1.5-9B-Chat | yi | 9B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| Yi-1.5-34B-Chat | yi | 34B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - | |
| zephyr-7b-beta | zephyr | 7B | 1000 | A+ | ✓ | ✗ | - | - | - | - | - | - |
Quick Start
```bash
# Build all crates
make build

# Run all tests
make test

# Generate coverage report
make coverage

# Certify models (recommended)
cargo run --bin apr-qa -- certify --family qwen-coder --tier mvp

# Run a specific playbook
cargo run --bin apr-qa -- run playbooks/models/qwen2.5-coder-1.5b-mvp.playbook.yaml
```
Certification Tiers
| Tier | Time | Description | Pass → Grade / Status |
|---|---|---|---|
| Dim-Smoke | <30s | Dimension-only via kernel equivalence (SafeTensors, CPU) | Kernel-proven dev check |
| Smoke | ~1-2 min | Sanity check (minimal matrix) | Dev feedback only |
| MVP | ~5-10 min | All formats × backends × modalities (18 combos) | ≥90% → B / PROVISIONAL |
| Quick | ~10-30 min | Dev iteration with broader coverage | Dev feedback |
| Standard | ~1-2 hr | CI/CD gate | CI gate |
| Deep | ~8-24 hr | Production qualification (full matrix) | ≥95% → A+ / CERTIFIED |
```bash
# Dimensional smoke (fastest; requires kernel proof via MVP on a representative model)
cargo run --bin apr-qa -- certify --kernel-class A --tier dim-smoke

# Smoke check
cargo run --bin apr-qa -- certify --family qwen-coder --tier smoke

# MVP certification (quick surface coverage)
cargo run --bin apr-qa -- certify --family qwen-coder --tier mvp

# Deep certification (production qualification)
cargo run --bin apr-qa -- certify --family qwen-coder --tier deep
```
Architecture
```text
┌─────────────────────────────────────────────────────────────────────────────┐
│                          APR-MODEL-QA-PLAYBOOK                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐                  │
│   │  apr-qa-gen  │    │ apr-qa-runner│    │apr-qa-report │                  │
│   │              │───▶│              │───▶│              │                  │
│   │ • proptest   │    │ • parallel   │    │ • MQS score  │                  │
│   │ • scenarios  │    │ • execution  │    │ • JUnit XML  │                  │
│   │ • oracles    │    │ • evidence   │    │ • HTML/MD    │                  │
│   │ • kernels    │    │              │    │              │                  │
│   │ • bootstrap  │    │              │    │              │                  │
│   └──────────────┘    └──────────────┘    └──────────────┘                  │
│          │                    │                   │                         │
│          └────────────────────┼───────────────────┘                         │
│                               ▼                                             │
│                    ┌──────────────┐    ┌──────────────┐                     │
│                    │apr-qa-certify│    │  apr-qa-cli  │                     │
│                    │              │◀───│              │                     │
│                    │ • tier score │    │ • certify    │                     │
│                    │ • README sync│    │ • run/report │                     │
│                    │ • CSV export │    │ • Jidoka sigs│                     │
│                    └──────────────┘    └──────────────┘                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
Crate Structure
| Crate | Purpose |
|---|---|
| `apr-qa-gen` | Scenario generation with proptest, oracle definitions, kernel profiles, playbook bootstrapping |
| `apr-qa-runner` | Playbook execution, differential testing, bug patterns |
| `apr-qa-report` | MQS scoring, JUnit/HTML report generation |
| `apr-qa-certify` | Two-tier certification, README sync, tier-aware scoring |
| `apr-qa-cli` | Command-line interface |
Key Modules (apr-qa-runner)
| Module | Purpose |
|---|---|
| `executor.rs` | Scenario execution engine |
| `parallel.rs` | Rayon-based parallel execution with Jidoka enforcement |
| `playbook.rs` | YAML playbook parsing and validation |
| `conversion.rs` | Format conversion testing with bug classification |
| `differential.rs` | Rosetta diff-tensors, compare-inference, profile CI |
| `patterns.rs` | Cross-project bug pattern detection (12 patterns) |
| `contract.rs` | Generic contract validation |
| `family_contract.rs` | Family YAML alignment checks |
| `layout_contract.rs` | LAYOUT-002 row-major tensor validation |
| `integrity.rs` | config.json and model integrity (G0 gateway) |
| `provenance.rs` | Git/file provenance tracking |
| `evidence.rs` | Evidence collection and serialization |
| `oracle.rs` | Oracle execution layer |
| `command.rs` | Process execution wrapper |
| `diagnostics.rs` | Debugging and diagnostic output |
| `process.rs` | Jidoka process lifecycle management |
Test Matrix
The framework tests models across multiple dimensions:
| Dimension | Options |
|---|---|
| Modality | run, chat, serve |
| Backend | cpu, gpu |
| Format | safetensors (ground truth), apr, gguf |
| Quantization | q4_k_m, q5_k_m, q8_0, f16, f32 |
Ground Truth: SafeTensors is the source of truth for model weights (native HuggingFace format). APR is our optimized native format. GGUF is a supported third-party format.
With 100 scenarios per combination across 100 HuggingFace models:
- 3 modalities × 2 backends × 3 formats × 100 models × 100 scenarios = 180,000 scenario executions, each carrying multiple assertions (which is where the 1.8M+ assertion total comes from)
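The combinatorics are easy to sanity-check in a few lines of Rust; note that 3 × 2 × 3 = 18 per-model combinations, matching the MVP tier's "18 combos":

```rust
fn main() {
    // Matrix dimensions from the table above.
    let modalities: u64 = 3;          // run, chat, serve
    let backends: u64 = 2;            // cpu, gpu
    let formats: u64 = 3;             // safetensors, apr, gguf
    let models: u64 = 100;
    let scenarios_per_combo: u64 = 100;

    let combos = modalities * backends * formats;
    let total = combos * models * scenarios_per_combo;

    println!("{combos} format/backend/modality combos per model");
    println!("{total} scenario executions");
    assert_eq!(combos, 18);
    assert_eq!(total, 180_000);
}
```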
MQS Scoring
The Model Qualification Score (MQS) ranges from 0-1000:
Gateway Checks (G0-G4)
Any gateway failure zeros the entire score:
| Gateway | Check | Failure Impact |
|---|---|---|
| G0 | config.json matches tensor metadata | MQS = 0 |
| G1 | Model loads successfully | MQS = 0 |
| G2 | Basic inference works | MQS = 0 |
| G3 | No crashes or panics | MQS = 0 |
| G4 | Output is not garbage | MQS = 0 |
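The zeroing rule is simple to state in code. A minimal sketch, where the struct, field, and function names are illustrative rather than the crate's actual API:

```rust
/// Results of the five gateway checks (G0-G4), per the table above.
/// Illustrative sketch only — not the crate's actual types.
#[derive(Clone, Copy)]
struct Gateways {
    g0_config_matches: bool, // config.json matches tensor metadata
    g1_model_loads: bool,    // model loads successfully
    g2_inference_works: bool,
    g3_no_crashes: bool,
    g4_output_sane: bool,    // output is not garbage
}

impl Gateways {
    fn all_pass(&self) -> bool {
        self.g0_config_matches
            && self.g1_model_loads
            && self.g2_inference_works
            && self.g3_no_crashes
            && self.g4_output_sane
    }
}

/// Any single gateway failure zeros the entire MQS.
fn apply_gateways(raw_score: u32, gates: Gateways) -> u32 {
    if gates.all_pass() { raw_score } else { 0 }
}

fn main() {
    let healthy = Gateways {
        g0_config_matches: true,
        g1_model_loads: true,
        g2_inference_works: true,
        g3_no_crashes: true,
        g4_output_sane: true,
    };
    assert_eq!(apply_gateways(987, healthy), 987);

    // A garbage-output model (G4 failure) scores zero regardless of raw score.
    let garbage = Gateways { g4_output_sane: false, ..healthy };
    assert_eq!(apply_gateways(987, garbage), 0);
    println!("gateway zeroing verified");
}
```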
Tier-Aware Scoring
The scoring system uses tier-aware functions:
| Tier | Pass Threshold | Score on Pass | Grade | Status |
|---|---|---|---|---|
| MVP | ≥90% | 800 | B | PROVISIONAL |
| Full | ≥95% | 950+ | A+ | CERTIFIED |
Grade Mapping
| Score | Grade | Status |
|---|---|---|
| 950-1000 | A+ | CERTIFIED |
| 900-949 | A | CERTIFIED |
| 850-899 | B+ | CERTIFIED |
| 800-849 | B | PROVISIONAL |
| 700-799 | C | PROVISIONAL |
| 0-699 | F | BLOCKED |
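The grade-mapping table translates directly into a match on score ranges. The crate's real entry points are the tier-aware functions named in the v2.0.0 notes (`score_from_tier()`, `grade_from_tier()`); this standalone version is an illustrative sketch of the mapping only:

```rust
/// Map an MQS (0-1000) to a letter grade and certification status,
/// following the grade-mapping table above. Illustrative sketch.
fn grade_and_status(mqs: u32) -> (&'static str, &'static str) {
    match mqs {
        950..=1000 => ("A+", "CERTIFIED"),
        900..=949 => ("A", "CERTIFIED"),
        850..=899 => ("B+", "CERTIFIED"),
        800..=849 => ("B", "PROVISIONAL"),
        700..=799 => ("C", "PROVISIONAL"),
        _ => ("F", "BLOCKED"), // 0-699 (and anything out of range)
    }
}

fn main() {
    assert_eq!(grade_and_status(1000), ("A+", "CERTIFIED"));
    assert_eq!(grade_and_status(920), ("A", "CERTIFIED"));
    assert_eq!(grade_and_status(810), ("B", "PROVISIONAL"));
    assert_eq!(grade_and_status(500), ("F", "BLOCKED"));
    println!("grade mapping verified");
}
```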
Playbook Format
```yaml
version: "1.0"
model:
  id: "Qwen/Qwen2.5-Coder-1.5B"
  revision: "main"
test_matrix:
  modalities: [run, chat]
  backends: [cpu, gpu]
  formats: [safetensors, apr, gguf]  # safetensors is ground truth
scenarios:
  - name: "arithmetic_basic"
    prompt: "What is 2 + 2?"
    oracle: arithmetic
    expected: 4
  - name: "code_generation"
    prompt: "Write a Python function to reverse a string"
    oracle: code_syntax
    language: python

# Differential Testing (v1.3.0)
differential_tests:
  tensor_diff:
    enabled: true
    filter: "embed,lm_head"
    gates: ["F-ROSETTA-DIFF-001"]
  inference_compare:
    enabled: true
    prompt: "What is 2+2?"
    tolerance: 1e-5

# Profile CI Assertions (v1.3.0)
profile_ci:
  enabled: true
  assertions:
    min_throughput: 10.0  # tok/s
    max_p99_ms: 500       # ms

# Trace Payload (v1.3.0)
trace_payload:
  enabled: true
  gates: ["F-TRACE-PAYLOAD-001", "F-TRACE-PAYLOAD-002"]
```
Project Structure
```text
apr-model-qa-playbook/
├── crates/
│   ├── apr-qa-gen/       # Scenario generation + oracles + kernel profiles + bootstrapper
│   ├── apr-qa-runner/    # Playbook execution (Rayon parallel, 16 modules)
│   ├── apr-qa-report/    # MQS scoring + JUnit/HTML/Markdown reports
│   ├── apr-qa-certify/   # Tier-aware scoring, README sync, CSV export
│   └── apr-qa-cli/       # CLI binary (14 subcommands)
├── certifications/       # Model certification evidence (39 models)
│   └── <model>/evidence.json
├── playbooks/
│   ├── models/           # Per-model playbooks (117 YAML files)
│   ├── templates/        # Reusable templates (smoke, mvp, quick, standard, deep)
│   ├── verify/           # Ticket verification
│   └── spec/             # Executable specifications
├── book/                 # mdBook documentation
├── scripts/              # Validation and golden output generation
└── docs/
    ├── certifications/   # models.csv certification database (95 models)
    ├── specifications/   # Full specification (10 docs)
    ├── tickets/          # Ticket analysis (GH-190, GH-191)
    ├── five-whys/        # Root cause analysis
    ├── workflows/        # Certification workflow guides
    └── troubleshooting/  # Debugging guides
```
Installation
Install the CLI from source:
```bash
cargo install --path crates/apr-qa-cli
```
Or build the entire workspace:
```bash
cargo build --release --workspace
```
Usage
Run model qualification against a playbook:
```bash
# Run a single model playbook
apr-qa run playbooks/models/qwen-coder-0.5b.yaml

# Certify a model family (MVP tier, ≤10 min)
apr-qa certify --family qwen-coder --tier mvp

# Generate HTML report
apr-qa report --format html --output report.html
```
See `apr-qa --help` for the full list of commands and options.
Contributing
Contributions are welcome. Please follow these steps:
- Fork the repository
- Make changes on your fork
- Run `make check` (fmt + lint + test) before submitting
- Open a pull request with a clear description of the change
All pull requests must pass CI quality gates (clippy, tests, coverage ≥ 95%).
Development
```bash
# Run tests with coverage
make coverage

# Verify PMAT compliance (>= 95%)
make coverage-check

# Lint with clippy
make lint

# Full check (fmt + lint + test)
make check
```
License
MIT License - see LICENSE for details.
Built with Rust • Powered by proptest • Inspired by Toyota & Popper