### 11 releases (breaking)

Uses new Rust 2021

| Version | Released |
|---|---|
| 0.10.0 | Oct 30, 2022 |
| 0.8.0 | Jul 13, 2022 |


# dfdx: shape checked deep learning in rust

Ergonomics & safety focused deep learning in Rust.

**Still in pre-alpha state. The next few releases are planned to be breaking releases.**

Features at a glance:

- Const generic tensor library with tensors up to 4d!
- Shape and type checked at compile time.
- A large library of tensor operations (including matmuls, convolutions, and shape transformations)
- Safe & easy to use neural network building blocks (including `Linear`, `Conv2D`, and `Transformer`).
- Standard deep learning optimizers such as Sgd and Adam.
- Reverse mode auto differentiation[1] implementation.
- Serialization to/from `.npy` and `.npz` for transferring models to/from python.

`dfdx` is on crates.io! Use by adding this to your `Cargo.toml`:

```toml
dfdx = "0.10.0"
```

See the documentation at docs.rs/dfdx.

[1] https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation

## Design Goals

- Ergonomics the whole way down (both frontend interface & internals).
- Check as much at compile time as possible (i.e. don't compile if something is not correct).
- Maximize performance.
- Minimize unsafe code[1]
- Minimize Arc/Rc and RefCells used in internal code[2]

[1] Currently the only unsafe calls are for matrix multiplication, and instantiating large arrays directly on the heap.

[2] The only things that use `Arc` are tensors to store their data. `Arc` is used instead of `Box` to reduce allocations when tensors are cloned.

## BLAS libraries

The matrixmultiply crate is the default BLAS library. **You don't need
to download/install anything for this to work!**

To link to the `Intel MKL` libraries (assuming you installed it already) use the `intel-mkl` feature.

## API Preview

Check examples/ for more details.

- 👌 Simple Neural Networks API, completely shape checked at compile time.

```rust
use dfdx::prelude::*;

type MLP = (
    (Linear<10, 32>, ReLU),
    (Linear<32, 32>, ReLU),
    (Linear<32, 2>, Tanh),
);

// `main` returns a `Result` so that the `?` on `save` compiles.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mlp: MLP = Default::default();
    let x: Tensor1D<10> = Tensor1D::zeros();
    let y /*: Tensor1D<2>*/ = mlp.forward(x);
    println!("{:?}", y);
    mlp.save("checkpoint.npz")?;
    Ok(())
}
```

- 📈 Ergonomic Optimizer API

```rust
let mut model: Model = ...
let mut sgd = Sgd::new(SgdConfig {
    lr: 1e-2,
    momentum: Some(Momentum::Nesterov(0.9))
});

let loss: Tensor0D<OwnedTape> = ...

// run backprop to get the gradients
let gradients = loss.backward();
sgd.update(&mut model, gradients);
```

- 💡 Tensors are backed by normal rust arrays, making it easy to access the underlying data!

```rust
let t0: Tensor0D = tensor(0.0);
assert_eq!(t0.data(), &0.0);

let t1 /*: Tensor1D<3>*/ = tensor([1.0, 2.0, 3.0]);
assert_eq!(t1.data(), &[1.0, 2.0, 3.0]);

let t2: Tensor2D<2, 3> = TensorCreator::ones();
assert_eq!(t2.data(), &[[1.0; 3]; 2]);
```

## Fun/notable implementation details

### Module

```rust
pub trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}
```

From this flexible trait we get:

- Single & batched inputs (just have multiple impls!)
- Multiple inputs/outputs (multi-headed modules, or rnns)
- Behavior different when tape is present or not (**not** the .train()/.eval() behavior present in other libraries!).
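As a rough illustration of the "multiple impls" point, here is a std-only sketch that does not use dfdx itself. The `Scale` layer and the use of plain arrays in place of tensors are assumptions made for the example; only the shape of the `Module` trait comes from the text above.

```rust
/// Same shape as the `Module` trait shown above.
pub trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

/// Hypothetical layer that multiplies every element by a constant.
pub struct Scale(pub f32);

// Single input: one array of N values.
impl<const N: usize> Module<[f32; N]> for Scale {
    type Output = [f32; N];
    fn forward(&self, input: [f32; N]) -> Self::Output {
        input.map(|v| v * self.0)
    }
}

// Batched input: B arrays of N values, handled by a second impl
// of the same trait with a different `Input` type.
impl<const B: usize, const N: usize> Module<[[f32; N]; B]> for Scale {
    type Output = [[f32; N]; B];
    fn forward(&self, input: [[f32; N]; B]) -> Self::Output {
        input.map(|row| row.map(|v| v * self.0))
    }
}
```

Because `Input` is a type parameter of the trait rather than an associated type, both impls can coexist and the compiler picks one from the argument type at each call site.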

### Tuples represent feedforward (a.k.a sequential) modules

Since we can implement traits for tuples, which few other languages allow AFAIK, they provide a very nice frontend
for sequentially executing modules.

```rust
// no idea why you would do this, but you could!
let model: (ReLU, Sigmoid, Tanh) = Default::default();
```

```rust
let model: (Linear<10, 5>, Tanh) = Default::default();
```

How implementing Module for a 2-tuple looks:

```rust
impl<Input, A, B> Module<Input> for (A, B)
where
    Input: Tensor,
    A: Module<Input>,     // A is a module that takes Input
    B: Module<A::Output>, // B is a module that takes A's Output
{
    type Output = B::Output; // the output of this is B's Output
    fn forward(&self, x: Input) -> Self::Output {
        let x = self.0.forward(x);
        let x = self.1.forward(x);
        x
    }
}
```

Modules are implemented for tuples of up to 6 elements, but *you can arbitrarily nest them*!
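To make the nesting point concrete, here is a small std-only sketch. `Relu`, `AddConst`, `Nested`, and `nested_model` are hypothetical names, and the layers act on a bare `f32` rather than a dfdx tensor, so the `Input: Tensor` bound is dropped.

```rust
pub trait Module<Input> {
    type Output;
    fn forward(&self, input: Input) -> Self::Output;
}

/// Hypothetical activation: clamps negatives to zero.
pub struct Relu;
/// Hypothetical layer: adds a constant to the input.
pub struct AddConst(pub f32);

impl Module<f32> for Relu {
    type Output = f32;
    fn forward(&self, x: f32) -> f32 {
        x.max(0.0)
    }
}

impl Module<f32> for AddConst {
    type Output = f32;
    fn forward(&self, x: f32) -> f32 {
        x + self.0
    }
}

// Same structure as the 2-tuple impl above.
impl<Input, A, B> Module<Input> for (A, B)
where
    A: Module<Input>,
    B: Module<A::Output>,
{
    type Output = B::Output;
    fn forward(&self, x: Input) -> Self::Output {
        let x = self.0.forward(x);
        self.1.forward(x)
    }
}

/// Three layers built from the 2-tuple impl alone, by nesting.
pub type Nested = ((AddConst, Relu), AddConst);

pub fn nested_model() -> Nested {
    ((AddConst(-5.0), Relu), AddConst(1.0))
}
```

Calling `nested_model().forward(3.0)` runs the inner pair first and then the outer `AddConst`, so a single 2-tuple impl is enough to express a chain of any length.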

### No `Rc<RefCell<T>>` used - Gradient tape is not kept behind a cell!

Other implementations may store a reference to the gradient tape directly on tensors, which requires mutating tensors or using `Rc`/`RefCell` all over the place.

We've figured out an elegant way to avoid this, reducing references and dynamic borrow checks to 0!

Since all operations result in exactly 1 child, we can always move the gradient tape to the child of the last operation. Additionally, no model parameters (all tensors) will ever own the gradient tape because they will never be the result of any operation. This means we know exactly which tensor owns the gradient tape, and the tensors that have it will always be intermediate results that don't need to be maintained across gradient computation.

*All of this together gives users unprecedented control/precision over what tensors are recorded on the gradient tape!*

One advanced use case requires that tensors be re-used multiple times in a computation graph. This can be handled by cloning the tensor, and manually moving the gradient tape around.
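The ownership idea can be sketched with plain Rust types. This is not dfdx's implementation: `Scalar`, `times`, `square`, and a tape that only stores operation names are all simplifications assumed for the example.

```rust
/// Marker for values that do not carry the tape (e.g. model parameters).
#[derive(Debug, Clone, Copy, Default)]
pub struct NoTape;

/// Toy tape: just the names of the recorded operations.
#[derive(Debug, Default)]
pub struct OwnedTape(pub Vec<&'static str>);

/// Scalar "tensor" whose type says whether it owns the tape.
#[derive(Debug)]
pub struct Scalar<Tape> {
    pub value: f32,
    pub tape: Tape,
}

impl Scalar<NoTape> {
    pub fn new(value: f32) -> Self {
        Scalar { value, tape: NoTape }
    }

    /// Start recording: a copy of this value that owns a fresh tape.
    pub fn traced(&self) -> Scalar<OwnedTape> {
        Scalar { value: self.value, tape: OwnedTape::default() }
    }
}

impl Scalar<OwnedTape> {
    /// Consumes `self`, so the tape moves into the single child.
    fn record(self, op: &'static str, value: f32) -> Self {
        let mut tape = self.tape;
        tape.0.push(op);
        Scalar { value, tape }
    }

    /// The right-hand side is a parameter and never receives the tape.
    pub fn times(self, rhs: &Scalar<NoTape>) -> Self {
        let value = self.value * rhs.value;
        self.record("times", value)
    }

    pub fn square(self) -> Self {
        let value = self.value * self.value;
        self.record("square", value)
    }
}

pub fn demo() -> Scalar<OwnedTape> {
    let w = Scalar::new(3.0); // parameter: stays Scalar<NoTape>
    let x = Scalar::new(2.0);
    x.traced().times(&w).square()
}
```

Every operation takes the tape-owning value by move and returns the only value that owns the tape afterwards, so no shared reference or runtime borrow check is needed.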

### Type checked backward

tl;dr: If you forget to include a call to `trace()` or `traced()`, the program won't compile!

```diff
-let pred = module.forward(x);
+let pred = module.forward(x.traced());
let loss = (y - pred).square().mean();
let gradients = loss.backward();
```

Since we know exactly what tensors own the gradient tape, we can require the tensor passed into `.backward()` to own the gradient tape!
And further, we can require it be moved into `.backward()`, so it can destruct the tape and construct the gradients!

**All of this can be checked at compile time 🎉**

```rust
pub fn backward(t: Tensor0D<OwnedTape>) -> Gradients {
    ...
}
```
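Here is a minimal std-only sketch of why a missing `traced()` becomes a compile error. It is a toy under stated assumptions: `Scalar`, `scale`, and `square` are hypothetical, the tape stores one local derivative per operation, and `backward` returns a single `f32` instead of a `Gradients` struct.

```rust
pub struct NoTape;

/// Toy tape: the local derivative d(out)/d(in) of each recorded op.
pub struct OwnedTape {
    pub local_grads: Vec<f32>,
}

pub struct Scalar<Tape> {
    pub value: f32,
    pub tape: Tape,
}

impl Scalar<NoTape> {
    pub fn new(value: f32) -> Self {
        Scalar { value, tape: NoTape }
    }

    pub fn traced(self) -> Scalar<OwnedTape> {
        Scalar { value: self.value, tape: OwnedTape { local_grads: Vec::new() } }
    }
}

impl Scalar<OwnedTape> {
    /// y = k * x, so dy/dx = k.
    pub fn scale(mut self, k: f32) -> Self {
        self.tape.local_grads.push(k);
        self.value *= k;
        self
    }

    /// y = x^2, so dy/dx = 2x (using x before it is overwritten).
    pub fn square(mut self) -> Self {
        self.tape.local_grads.push(2.0 * self.value);
        self.value *= self.value;
        self
    }
}

/// Accepts only a tape-owning scalar, and takes it by value so the
/// tape can be consumed. The chain rule is the product of local grads.
pub fn backward(t: Scalar<OwnedTape>) -> f32 {
    t.tape.local_grads.into_iter().product()
}

// backward(Scalar::new(3.0));
// ^ does not compile: expected `Scalar<OwnedTape>`, found `Scalar<NoTape>`
```

For `x = 3`, the chain `x.traced().scale(2.0).square()` computes `(2x)^2 = 36`, and `backward` returns `8x = 24`.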

### Recursive trait definitions for CPU Device

Our src/devices backend for computing operations on the CPU
is built using **recursive trait definitions**.

The main idea behind this is similar to recursion or induction proofs. First we specify the base trait, and then we specify the recursive trait.

A simple example is counting the number of elements in an arbitrarily nested array at compile time.

First we specify the trait we want to do this:

```rust
pub trait CountElements {
    const NUM_ELEMENTS: usize;
}
```

Now the base case (assuming these will be arrays of floats) is just a single floating point number:

```rust
impl CountElements for f32 {
    const NUM_ELEMENTS: usize = 1;
}
```

And finally the recursive trait:

```rust
impl<T: CountElements, const M: usize> CountElements for [T; M] {
    const NUM_ELEMENTS: usize = M * T::NUM_ELEMENTS;
}
```

Notice the restriction on T also implementing `CountElements`. This allows us to use `T::NUM_ELEMENTS` in the trait body.
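Putting the three pieces together, the constant can be read off any nested array type. The names `N_3D` and `zeros_flat` below are made up for this usage sketch.

```rust
pub trait CountElements {
    const NUM_ELEMENTS: usize;
}

impl CountElements for f32 {
    const NUM_ELEMENTS: usize = 1;
}

impl<T: CountElements, const M: usize> CountElements for [T; M] {
    const NUM_ELEMENTS: usize = M * T::NUM_ELEMENTS;
}

// Resolved at compile time; the recursion unrolls as 2 * (3 * (4 * 1)).
pub const N_3D: usize = <[[[f32; 4]; 3]; 2] as CountElements>::NUM_ELEMENTS;

// Being a constant, the count can size another array.
pub fn zeros_flat() -> [f32; N_3D] {
    [0.0; N_3D]
}
```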

Another few powerful things recursive traits can do:

- Map all elements of arbitrarily nested arrays using a function
- Add two arrays together
- Reduce an array to one number
- Even more!
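Two of these can be sketched with the same base-case/recursive-case pattern. The trait names `MapElements` and `SumElements` are assumptions for illustration and are not taken from src/devices.

```rust
/// Apply a function to every float in an arbitrarily nested array, in place.
pub trait MapElements {
    fn map_elements<F: Fn(f32) -> f32>(&mut self, f: &F);
}

// Base case: a single float.
impl MapElements for f32 {
    fn map_elements<F: Fn(f32) -> f32>(&mut self, f: &F) {
        *self = f(*self);
    }
}

// Recursive case: an array of anything that is itself mappable.
impl<T: MapElements, const M: usize> MapElements for [T; M] {
    fn map_elements<F: Fn(f32) -> f32>(&mut self, f: &F) {
        for x in self.iter_mut() {
            x.map_elements(f);
        }
    }
}

/// Reduce an arbitrarily nested array to one number by summing.
pub trait SumElements {
    fn sum_elements(&self) -> f32;
}

impl SumElements for f32 {
    fn sum_elements(&self) -> f32 {
        *self
    }
}

impl<T: SumElements, const M: usize> SumElements for [T; M] {
    fn sum_elements(&self) -> f32 {
        self.iter().map(|x| x.sum_elements()).sum()
    }
}
```

For example, doubling `[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]` with `map_elements(&|v| v * 2.0)` and then calling `sum_elements()` works for this 2d array and for any deeper nesting without extra code.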

We encourage you to check out all the code in src/devices!

### 📄 Validated against pytorch

All functions & operations are tested against behavior shown by similar code in pytorch.

# License

Dual-licensed to be compatible with the Rust project.

Licensed under the Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0 or the MIT license http://opensource.org/licenses/MIT, at your option. This file may not be copied, modified, or distributed except according to those terms.
