# axonml-vision

## Overview
axonml-vision provides the computer-vision stack for AxonML: image-specific transforms, loaders for classical vision datasets (MNIST, Fashion-MNIST, CIFAR-10/100) plus synthetic variants, and a wide catalog of neural-network architectures covering classification, detection, dense prediction, anomaly detection, VQA, 3D reconstruction, and biometrics. A pretrained-weights hub with on-disk caching rounds it out.
## Features
- **Image transforms** — `Resize`, `CenterCrop`, `RandomHorizontalFlip`, `RandomVerticalFlip`, `RandomRotation`, `ColorJitter`, `Grayscale`, `ImageNormalize` (presets: `imagenet`, `mnist`, `cifar10`), `Pad`, `ToTensorImage`.
- **Datasets** — real-file loaders for `MNIST`, `FashionMNIST`, `CIFAR10`, `CIFAR100`, and synthetic variants `SyntheticMNIST`/`SyntheticCIFAR` for fast tests.
- **Classification** — `LeNet`, `MLP`, `SimpleCNN`, `ResNet` (`resnet18`, `resnet34`, `BasicBlock`, `Bottleneck`), `VGG` (`vgg11`, `vgg13`, `vgg16`, `vgg19`, with optional batch norm), `VisionTransformer` (`vit_base`, `vit_large`).
- **Detection** — `BlazeFace` (dual-scale 128×128 face detector, 896 anchors), `RetinaFace` (ResNet34 backbone + multi-level FPN head), `DETR` (transformer-based, `small` preset), `NanoDet` (mobile-class detector), `Helios` (YOLO-family detector with 5 sizes: Nano/Small/Medium/Large/XLarge, plus loss utilities `HeliosLoss`, `CIoULoss`, `TaskAlignedAssigner`).
- **Novel detection architectures** — `Nexus` (predictive dual-pathway detector with multi-scale fusion, object-memory bank, and predictive-coding surprise gating), `Phantom` (temporal event-driven face detection with pseudo-event encoder and GRU-based face-state tracker), and `NightVision` (multi-domain infrared detector with thermal stem, CSP backbone, thermal FPN, YOLOX-style decoupled heads, and `ThermalDomain` domain tagging).
- **Dense prediction** — `DPT` (depth transformer, `small`/`base` presets) and `FastDepth` (mobile depth estimator).
- **Anomaly detection** — `PatchCore` and `StudentTeacher`, both with `default_rgb()` constructors.
- **Visual Question Answering** — `VQAModel` (`small` preset).
- **3D reconstruction** — `Aegis3D`: Fourier-feature SDF networks (`LocalSDF` + `GlobalSDF`), adaptive octree spatial indexing, differentiable sphere-tracing renderer, and marching-cubes mesh extraction.
- **FPN infrastructure** — shared `FPN` (feature pyramid network) used by multiple detectors.
- **Aegis biometric identity suite** — `AegisIdentity` orchestrator with `full`/`face_only`/`edge_minimal` constructors; modality models `MnemosyneIdentity` (face), `AriadneFingerprint`, `EchoSpeaker` (voice), `ArgusIris`, plus `ThemisFusion` (uncertainty-weighted fusion). Supports enrollment, verification, forensic verification, liveness, secure verification, and identification. Companion losses: `AngularMarginLoss`, `CenterLoss`, `ContrastiveLoss`, `CrystallizationLoss`, `DiversityRegularization`, `EchoLoss`, `ArgusLoss`, `LivenessLoss`, `ThemisLoss`.
- **Model Hub** — `download_weights`, `load_state_dict`, `list_models`, `model_info`, `is_cached`, `model_registry`, with on-disk caching.
- **CUDA feature** — optional `cuda` cargo feature propagates to core/tensor/autograd/nn.
## Modules

| Module | Description |
|---|---|
| `transforms` | Image data-augmentation and preprocessing transforms |
| `datasets` | MNIST, Fashion-MNIST, CIFAR-10/100 loaders plus synthetic variants |
| `models` | All neural-network architectures (see below) |
| `models::biometric` | Aegis biometric suite (Mnemosyne, Ariadne, Echo, Argus, Themis + identity orchestrator) |
| `models::helios` | YOLO-style object detector with 5 size variants |
| `models::nexus` | Predictive dual-pathway detector with object memory |
| `models::phantom` | Temporal event-driven face detection |
| `models::nightvision` | Multi-domain infrared detection |
| `models::aegis3d` | Octree-adaptive neural implicit surface reconstruction |
| `camera` | Camera I/O utilities |
| `edge` | Edge-deployment helpers |
| `hub` | Pretrained model weights management |
| `image_io` | Image load/save helpers |
| `losses` | Vision-specific loss functions |
| `ops` | Low-level vision ops |
| `training` | Training utilities |
## Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
axonml-vision = "0.6.1"
```
### Loading Datasets

```rust
use axonml_vision::prelude::*;

// Synthetic MNIST for fast tests
let train_data = SyntheticMNIST::train();
let test_data = SyntheticMNIST::test();

// Synthetic CIFAR-10
let cifar = SyntheticCIFAR::small();

let (image, label) = train_data.get(0).unwrap();
assert_eq!(image.shape(), &[1, 28, 28]); // MNIST: 1 channel, 28x28
assert_eq!(label.shape(), &[10]);        // One-hot encoded
```
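The `[10]` label shape asserted above is a one-hot encoding over the 10 MNIST classes. The layout can be sketched in plain Rust (the `one_hot` helper is hypothetical, for illustration only, and not part of the crate):

```rust
// Build a one-hot vector of length `num_classes` with a 1.0 at `label`.
fn one_hot(label: usize, num_classes: usize) -> Vec<f32> {
    let mut v = vec![0.0; num_classes];
    v[label] = 1.0;
    v
}

fn main() {
    let v = one_hot(3, 10);
    assert_eq!(v.len(), 10);               // matches label.shape() == [10]
    assert_eq!(v[3], 1.0);                 // only the class index is hot
    assert_eq!(v.iter().sum::<f32>(), 1.0);
    println!("{v:?}");
}
```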
### Image Transforms

```rust
use axonml_vision::{Resize, CenterCrop, RandomHorizontalFlip, ImageNormalize};
use axonml_data::{Compose, Transform};

let transform = Compose::empty()
    .add(Resize::new(256, 256))
    .add(CenterCrop::new(224, 224))
    .add(RandomHorizontalFlip::new())
    .add(ImageNormalize::imagenet());

let output = transform.apply(&image);
assert_eq!(output.shape(), &[3, 224, 224]);
```
### Normalization Presets

```rust
use axonml_vision::ImageNormalize;

let imagenet = ImageNormalize::imagenet(); // mean=[0.485,0.456,0.406] std=[0.229,0.224,0.225]
let mnist = ImageNormalize::mnist();       // mean=[0.1307] std=[0.3081]
let cifar10 = ImageNormalize::cifar10();   // mean=[0.4914,0.4822,0.4465] std=[0.2470,0.2435,0.2616]
```
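Each preset implies a per-channel affine map, `out = (pixel - mean[c]) / std[c]`. As a sanity check on the constants, here is that arithmetic in plain Rust (the `normalize` helper is hypothetical, not a crate API):

```rust
// Per-channel normalization: subtract the channel mean, divide by its std.
fn normalize(pixel: f64, mean: f64, std: f64) -> f64 {
    (pixel - mean) / std
}

fn main() {
    // ImageNet preset constants from the snippet above.
    let mean = [0.485, 0.456, 0.406];
    let std = [0.229, 0.224, 0.225];

    // A mid-gray pixel (0.5) in the red channel:
    let r = normalize(0.5, mean[0], std[0]);
    assert!((r - 0.0655).abs() < 1e-3); // (0.5 - 0.485) / 0.229 ≈ 0.0655
    println!("normalized red = {r:.4}");
}
```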
### Classification Models

```rust
use axonml_vision::{LeNet, MLP, SimpleCNN};
use axonml_vision::models::{resnet18, resnet34, vgg16, vit_base};
use axonml_nn::Module;
use axonml_autograd::Variable;

let lenet = LeNet::new();    // [N, 1, 28, 28] -> [N, 10]
let mlp = MLP::for_mnist();  // 784 -> 256 -> 128 -> 10
let rn18 = resnet18(1000);   // ImageNet classes
let vgg = vgg16(1000, /*batch_norm=*/ true);
let vit = vit_base(1000);
```
### Detection Models

```rust
use axonml_vision::models::{BlazeFace, RetinaFace, NanoDet, DETR};
use axonml_vision::models::helios::{Helios, HeliosSize};

let blaze = BlazeFace::new();   // dual-scale 128x128 face detector
let retina = RetinaFace::new(); // ResNet34 backbone
let nanodet = NanoDet::new(/*num_classes=*/ 80);
let detr = DETR::small(10);
let helios = Helios::small(80); // also: new(config), large(num_classes)
```
### Novel Detection Architectures

```rust
use axonml_vision::models::{Nexus, Phantom, NightVision, NightVisionConfig};

let nexus = Nexus::default();     // predictive dual-pathway + object memory
let phantom = Phantom::default(); // event-driven temporal face detector
let night = NightVision::new(NightVisionConfig::default());
```
### Dense Prediction, Anomaly Detection & VQA

```rust
use axonml_vision::models::{DPT, FastDepth, PatchCore, StudentTeacher, VQAModel};

let dpt = DPT::small();                 // transformer depth
let fast = FastDepth::new();            // mobile depth
let patch = PatchCore::default_rgb();   // anomaly detection, 256-d features
let st = StudentTeacher::default_rgb(); // student-teacher anomaly
let vqa = VQAModel::small(100, 50);     // vocab=100, answers=50
```
### Aegis3D — 3D Reconstruction

```rust
use axonml_vision::models::{Aegis3D, aegis3d::{Aegis3DConfig, AABB, SphereTracingConfig}};

let aegis3d = Aegis3D::new(); // Fourier-feature SDF + adaptive octree + sphere tracing + marching cubes
```
### Full Training Pipeline

```rust
use axonml_vision::prelude::*;
use axonml_data::DataLoader;
use axonml_optim::{Adam, Optimizer};
use axonml_nn::{CrossEntropyLoss, Module};
use axonml_autograd::Variable;

let dataset = SyntheticMNIST::train();
let loader = DataLoader::new(dataset, 32).shuffle(true);

let model = LeNet::new();
let mut optim = Adam::new(model.parameters(), 0.001);
let loss_fn = CrossEntropyLoss::new();

for batch in loader.iter() {
    let input = Variable::new(batch.data, true);
    let target = batch.targets;

    optim.zero_grad();
    let output = model.forward(&input);
    let loss = loss_fn.compute(&output, &target);
    loss.backward();
    optim.step();
}
```
### Model Hub for Pretrained Weights

```rust
use axonml_vision::hub::{
    download_weights, load_state_dict, list_models, model_info, is_cached, model_registry,
};

for model in list_models() {
    println!("{}: {} classes, {:.1} MB", model.name, model.num_classes,
        model.size_bytes as f64 / 1_000_000.0);
}

if let Some(info) = model_info("resnet18") {
    println!("Top-1 accuracy: {:.2}%", info.accuracy);
}

if !is_cached("resnet18") {
    let path = download_weights("resnet18", /*force=*/ false)?;
    let state_dict = load_state_dict(&path)?;
    // model.load_state_dict(state_dict);
}
```
### Aegis Identity — Biometric Framework

Unified biometric identity system with 5 modality-specific architectures plus `ThemisFusion` for uncertainty-weighted evidence fusion. Designed for edge deployment (sub-2 MB total in the `edge_minimal` configuration).

```rust
use axonml_vision::models::biometric::{
    AegisIdentity, BiometricEvidence, BiometricModality,
};
use axonml_autograd::Variable;
use axonml_tensor::Tensor;

// Full multimodal system — face + fingerprint + voice + iris
let mut aegis = AegisIdentity::full();

// Or smaller deployments:
let face_only = AegisIdentity::face_only();
let edge = AegisIdentity::edge_minimal();

// Enroll
let face = Variable::new(Tensor::randn(&[1, 3, 64, 64]), false);
let evidence = BiometricEvidence::new().with_face(face);
let enrolled = aegis.enroll(1001, &evidence);

// Verify
let probe = BiometricEvidence::new()
    .with_face(Variable::new(Tensor::randn(&[1, 3, 64, 64]), false));
let verification = aegis.verify(1001, &probe);
println!("match={}, score={:.3}, confidence={:.3}",
    verification.is_match, verification.match_score, verification.confidence);

// Forensic verification with per-modality scores and cross-modal consistency
let (result, forensic) = aegis.verify_forensic(1001, &probe);

// Anti-spoofing liveness
let liveness = aegis.assess_liveness(&evidence);

// Quality -> liveness -> verification secure pipeline
let secure = aegis.secure_verify(1001, &evidence);

// 1:N identification
let ident = aegis.identify(&probe);
```
Modality architectures:
| Model | Modality | Novel idea |
|---|---|---|
| `MnemosyneIdentity` | Face | Identity crystallizes via GRU attractor convergence |
| `AriadneFingerprint` | Fingerprint | Ridge event fields with Gabor wavelets |
| `EchoSpeaker` | Voice | Identity = unpredictable speech residuals |
| `ArgusIris` | Iris | Polar-native radial/angular Conv1d encoding (backed by `polar::polar_unwrap`) |
| `ThemisFusion` | Fusion | Belief propagation with uncertainty gating |
## Feature Flags

- `default = ["download"]` — enables `reqwest` for hub downloads.
- `cuda` — propagates CUDA support to `axonml-tensor`, `axonml-nn`, `axonml-autograd`, `axonml-core`.
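For example, to opt into the `cuda` feature alongside the defaults, the dependency line would look like this (a sketch, assuming the feature names listed above):

```toml
[dependencies]
axonml-vision = { version = "0.6.1", features = ["cuda"] }
```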
## Tests

```shell
cargo test -p axonml-vision
```
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Last updated: 2026-04-16 (v0.6.1)