17 releases (5 breaking)
Uses new Rust 2024
| Version | Released |
|---|---|
| 0.6.0 (new) | Apr 9, 2026 |
| 0.5.0 | Mar 31, 2026 |
| 0.4.3 | Mar 25, 2026 |
| 0.3.2 | Feb 27, 2026 |
| 0.1.0 | Jan 19, 2026 |
axonml-data
Overview
axonml-data provides data loading infrastructure for training neural networks in the AxonML framework. It includes the Dataset trait, an efficient DataLoader with batching and shuffling, several sampling strategies, and composable data transforms.
Features
- Dataset Trait - Core abstraction for indexed data access with TensorDataset, MapDataset, ConcatDataset, and SubsetDataset implementations
- DataLoader - Efficient batched iteration with configurable batch size, shuffling, and drop-last behavior
- Samplers - Flexible sampling strategies including SequentialSampler, RandomSampler, SubsetRandomSampler, WeightedRandomSampler, and BatchSampler
- Transforms - Composable data augmentation with Normalize, RandomNoise, RandomCrop, RandomFlip, Scale, Clamp, and more
- Collate Functions - Batch assembly with DefaultCollate and StackCollate for tensor stacking
- Generic DataLoader - Flexible loader that works with any Dataset and Collate combination
Modules
| Module | Description |
|---|---|
| dataset | Core Dataset trait and implementations (TensorDataset, MapDataset, ConcatDataset, SubsetDataset, InMemoryDataset) |
| dataloader | DataLoader for batched iteration with shuffling support |
| sampler | Sampling strategies for controlling data access patterns |
| transforms | Composable data transformations for preprocessing and augmentation |
| collate | Batch assembly functions for combining samples into tensors |
Usage
Add to your Cargo.toml:
[dependencies]
axonml-data = "0.6.0"
Creating a Dataset
use axonml_data::prelude::*;
// From tensors
let x = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[3, 2]).unwrap();
let y = Tensor::from_vec(vec![0.0, 1.0, 0.0], &[3]).unwrap();
let dataset = TensorDataset::new(x, y);
assert_eq!(dataset.len(), 3);
let (input, target) = dataset.get(0).unwrap();
Using the DataLoader
use axonml_data::{DataLoader, TensorDataset};
// x_data and y_data are input and target tensors created beforehand
let dataset = TensorDataset::new(x_data, y_data);
// Create loader with batch size 32
let loader = DataLoader::new(dataset, 32)
    .shuffle(true)
    .drop_last(false);
// Iterate over batches
for batch in loader.iter() {
    let inputs = batch.data;
    let targets = batch.targets;
    // ... process batch ...
}
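To illustrate how batch size and drop-last interact, here is a standalone sketch that does not use axonml-data. The `batch_ranges` helper is hypothetical, and it assumes `drop_last` has the usual meaning of discarding a final batch that is smaller than the batch size; the crate's own behavior may differ in detail.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of axonml-data): splits `len` samples into
/// consecutive index ranges of at most `batch_size` elements.
fn batch_ranges(len: usize, batch_size: usize, drop_last: bool) -> Vec<Range<usize>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < len {
        let end = (start + batch_size).min(len);
        // A short final batch is kept unless `drop_last` is set.
        if end - start == batch_size || !drop_last {
            ranges.push(start..end);
        }
        start = end;
    }
    ranges
}
```

With 10 samples and a batch size of 4, this sketch yields three batches when `drop_last` is false and two when it is true.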
Implementing Custom Datasets
use axonml_data::Dataset;
use axonml_tensor::Tensor;
struct MyDataset {
    data: Vec<(Tensor<f32>, Tensor<f32>)>,
}
impl Dataset for MyDataset {
    type Item = (Tensor<f32>, Tensor<f32>);
    fn len(&self) -> usize {
        self.data.len()
    }
    fn get(&self, index: usize) -> Option<Self::Item> {
        self.data.get(index).cloned()
    }
}
Data Transforms
use axonml_data::{Compose, Normalize, RandomNoise, Scale, Transform};
// Compose multiple transforms
let transform = Compose::empty()
    .add(Normalize::new(0.5, 0.5))
    .add(RandomNoise::new(0.01))
    .add(Scale::new(2.0));
let output = transform.apply(&input_tensor);
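The idea behind composing transforms can be sketched without the crate. Everything below is hypothetical: the `Sketch*` names are stand-ins that operate on plain `f32` slices rather than tensors, and the sketch assumes that normalization means `(x - mean) / std` and that transforms run in the order they were added.

```rust
/// Hypothetical stand-in for a transform: maps a slice of values to a new vector.
trait SketchTransform {
    fn apply(&self, input: &[f32]) -> Vec<f32>;
}

struct SketchNormalize {
    mean: f32,
    std: f32,
}

struct SketchScale {
    factor: f32,
}

impl SketchTransform for SketchNormalize {
    fn apply(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| (x - self.mean) / self.std).collect()
    }
}

impl SketchTransform for SketchScale {
    fn apply(&self, input: &[f32]) -> Vec<f32> {
        input.iter().map(|x| x * self.factor).collect()
    }
}

/// Applies its steps one after another, feeding each output into the next step.
struct SketchCompose {
    steps: Vec<Box<dyn SketchTransform>>,
}

impl SketchCompose {
    fn empty() -> Self {
        Self { steps: Vec::new() }
    }

    fn add<T: SketchTransform + 'static>(mut self, step: T) -> Self {
        self.steps.push(Box::new(step));
        self
    }
}

impl SketchTransform for SketchCompose {
    fn apply(&self, input: &[f32]) -> Vec<f32> {
        self.steps
            .iter()
            .fold(input.to_vec(), |acc, step| step.apply(&acc))
    }
}
```

Because the composed pipeline implements the same trait as its steps, pipelines can themselves be nested as steps in this sketch.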
Using Samplers
use axonml_data::{RandomSampler, WeightedRandomSampler, BatchSampler, Sampler};
// Random sampling without replacement
let sampler = RandomSampler::new(1000);
for idx in sampler.iter() {
    // Process sample at idx
}
// Weighted sampling for imbalanced datasets
let weights = vec![1.0, 2.0, 0.5, 3.0];
let sampler = WeightedRandomSampler::new(weights, 100, true);
// Batch sampling
let base_sampler = RandomSampler::new(1000);
let batch_sampler = BatchSampler::new(base_sampler, 32, false);
for batch_indices in batch_sampler.iter() {
    // batch_indices is Vec<usize>
}
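Weighted sampling with replacement can be sketched in plain Rust as follows. This is not the crate's implementation: `weighted_indices` and `XorShift64` are hypothetical, the tiny generator is there only to avoid external dependencies, and the sketch assumes each index is drawn with probability proportional to its weight.

```rust
/// Minimal xorshift64 generator so the sketch needs no external crates.
/// Only meant for illustration.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: draws `num_samples` indices with replacement,
/// choosing index `i` with probability `weights[i] / sum(weights)`.
fn weighted_indices(weights: &[f64], num_samples: usize, seed: u64) -> Vec<usize> {
    let total: f64 = weights.iter().sum();
    assert!(total > 0.0, "weights must have a positive sum");
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    (0..num_samples)
        .map(|_| {
            // Walk the cumulative weights until the random target is covered.
            let mut target = rng.next_f64() * total;
            for (i, w) in weights.iter().enumerate() {
                if target < *w {
                    return i;
                }
                target -= w;
            }
            // Rounding fallback: the last index with a non-zero weight.
            weights.iter().rposition(|w| *w > 0.0).unwrap()
        })
        .collect()
}
```

In this sketch an index with weight zero is never drawn, which is the property that makes weighted sampling useful for rebalancing rare classes.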
Dataset Splitting
use axonml_data::{TensorDataset, SubsetDataset};
let dataset = TensorDataset::new(x_data, y_data);
// Random split into 800 training and 200 validation samples
// (an 80/20 split, assuming the dataset holds 1000 samples)
let splits = SubsetDataset::random_split(dataset, &[800, 200]);
let train_dataset = &splits[0];
let val_dataset = &splits[1];
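A random split by lengths can be pictured as shuffling the index range and cutting it into consecutive chunks. The sketch below is standalone and hypothetical: `split_indices` is not a crate function, the small LCG only avoids an external random-number dependency (its modulo step has a slight bias that does not matter for illustration), and it assumes the requested lengths must sum to the dataset size.

```rust
/// Hypothetical helper: shuffles `0..total` and cuts the result into
/// consecutive chunks with the requested lengths.
fn split_indices(total: usize, lengths: &[usize], seed: u64) -> Vec<Vec<usize>> {
    assert_eq!(
        lengths.iter().sum::<usize>(),
        total,
        "lengths must sum to the dataset size"
    );
    let mut indices: Vec<usize> = (0..total).collect();

    // Fisher-Yates shuffle driven by a small linear congruential generator.
    let mut state = seed;
    for i in (1..total).rev() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let j = ((state >> 33) as usize) % (i + 1);
        indices.swap(i, j);
    }

    // Cut the shuffled indices into one chunk per requested length.
    let mut splits = Vec::with_capacity(lengths.len());
    let mut start = 0;
    for &len in lengths {
        splits.push(indices[start..start + len].to_vec());
        start += len;
    }
    splits
}
```

Each index appears in exactly one chunk, so the resulting subsets never overlap.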
Combining Datasets
use axonml_data::{TensorDataset, ConcatDataset, MapDataset};
// Concatenate datasets
let combined = ConcatDataset::new(vec![dataset1, dataset2, dataset3]);
// Apply transform to dataset
let mapped = MapDataset::new(dataset, |(x, y)| {
    (x.mul_scalar(2.0), y)
});
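Concatenation needs a way to turn a global index into a position inside one of the underlying datasets. A minimal standalone sketch of that lookup, assuming the datasets are laid out back to back in the order given (`locate` is a hypothetical helper, not a crate function):

```rust
/// Hypothetical helper: maps a global index to `(dataset index, local index)`
/// given the length of each concatenated dataset.
fn locate(sizes: &[usize], index: usize) -> Option<(usize, usize)> {
    let mut offset = 0;
    for (dataset_idx, &size) in sizes.iter().enumerate() {
        if index < offset + size {
            return Some((dataset_idx, index - offset));
        }
        offset += size;
    }
    // The index lies past the end of the last dataset.
    None
}
```

For datasets of sizes 3, 2, and 4, global index 3 is the first element of the second dataset, and index 9 is out of range.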
Tests
Run the test suite:
cargo test -p axonml-data
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.