#data-loader #machine-learning #tensorflow #ai #pytorch

ai-dataloader

A Rust implementation of the PyTorch DataLoader

12 unstable releases (4 breaking)

0.6.1 Aug 13, 2023
0.6.0 May 22, 2023
0.5.4 Apr 29, 2023
0.4.0 Mar 16, 2023
0.2.1 Sep 27, 2022

#244 in Machine learning

MIT/Apache

175KB
2.5K SLoC



A Rust port of the PyTorch DataLoader library.

Note: This project is still heavily in development and is at an early stage.

Highlights

  • Iterable or indexable (Map style) DataLoader.
  • Customizable Sampler, BatchSampler and collate_fn.
  • Parallel dataloader using rayon for indexable dataloader (experimental).
  • Integration with ndarray and tch-rs, CPU and GPU support.
  • Default collate function that automatically collates most of your types (including nested ones).
  • Shuffling for iterable and indexable DataLoader.

More info in the documentation.
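
To make the Sampler and BatchSampler roles above more concrete, here is a minimal, standard-library-only sketch of the idea. The function name `batch_indices` is hypothetical and is not part of the ai-dataloader API; it only illustrates how a sequential sampler's indices could be grouped into batches for an indexable (map-style) dataset.

```rust
/// Hypothetical helper (not part of ai-dataloader): walks the indices
/// `0..len` in order, like a sequential sampler, and groups them into
/// consecutive batches of at most `batch_size` indices.
fn batch_indices(len: usize, batch_size: usize) -> Vec<Vec<usize>> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    // The sequential "sampler": one index per dataset element.
    let indices: Vec<usize> = (0..len).collect();
    // The "batch sampler": split the index stream into chunks.
    // The last chunk may be shorter than `batch_size`.
    indices
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Each inner vector holds the positions to fetch from the dataset for one batch. A shuffling sampler would permute `indices` before chunking.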

Examples

Examples can be found in the examples folder; here is a simple one:

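use ai_dataloader::DataLoader;

// Build a shuffled loader over (label, text) pairs, two samples per batch.
let loader = DataLoader::builder(vec![(0, "hola"), (1, "hello"), (2, "hallo"), (3, "bonjour")])
    .batch_size(2)
    .shuffle()
    .build();

// Each iteration yields one batch of labels and one batch of texts.
for (label, text) in &loader {
    println!("Label {label:?}");
    println!("Text {text:?}");
}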
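
The loop above destructures each batch into `(label, text)` because collation turns a batch of tuples into a tuple of batches. The sketch below shows that idea with the standard library only. `collate_pairs` is a hypothetical stand-in rather than the crate's collate function, and the library's actual output types (for example ndarray arrays or tensors) may differ.

```rust
/// Hypothetical stand-in for a default collate step on 2-tuples
/// (not the ai-dataloader implementation): turns a batch of
/// `(label, text)` samples into `(labels, texts)`.
fn collate_pairs<A, B>(batch: Vec<(A, B)>) -> (Vec<A>, Vec<B>) {
    // `unzip` splits an iterator of pairs into two collections,
    // preserving the order of the samples within the batch.
    batch.into_iter().unzip()
}
```

Calling `collate_pairs(vec![(0, "hola"), (1, "hello")])` gives one vector of labels and one vector of texts, which is the shape the `for (label, text)` pattern expects.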

tch-rs integration

To collate your data into torch tensors that can run on the GPU, you must activate the tch feature.

This feature relies on the tch crate for bindings to the C++ libTorch API. The libtorch library is required and can be downloaded either automatically or manually. For how to set up your environment to use these bindings, please refer to the tch crate for detailed information or support.

Next Features

These features could be added in the future:

  • RandomSampler with replacement
  • parallel dataloader for iterable dataset
  • distributed dataloader

MSRV

The current MSRV is 1.60.

Dependencies

~3.5–6MB
~119K SLoC