#machine-learning #candle #tensor #optimisers

candle-optimisers

Optimisers for use with candle, the minimalist ML framework

8 releases (5 breaking)

0.8.0 Nov 18, 2024
0.7.2 Oct 5, 2024
0.6.0 Aug 4, 2024
0.5.0 May 4, 2024
0.3.1 Dec 20, 2023

#98 in Machine learning


Used in 3 crates (via border-candle-agent)

MIT license

165KB
3K SLoC

Candle Optimisers

A crate of optimisers for use with candle, the minimalist ML framework.

Optimisers implemented are:

  • SGD (including momentum and weight decay)

  • RMSprop

Adaptive methods:

  • AdaDelta

  • AdaGrad

  • AdaMax

  • Adam

  • AdamW (included with Adam as decoupled_weight_decay)

  • NAdam

  • RAdam
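For reference, the update performed by PyTorch's SGD (momentum without dampening or Nesterov, plus standard weight decay) can be sketched in plain Rust. This is a minimal illustration on `f64` slices, not this crate's API: `SgdSketch` and its fields are hypothetical names, and the crate itself operates on candle tensors.

```rust
/// Hypothetical plain-Rust SGD sketch (not the crate's API).
struct SgdSketch {
    lr: f64,
    momentum: f64,
    weight_decay: f64,
    /// Momentum buffer; `None` until the first step, as in PyTorch.
    buf: Option<Vec<f64>>,
}

impl SgdSketch {
    fn new(lr: f64, momentum: f64, weight_decay: f64) -> Self {
        Self { lr, momentum, weight_decay, buf: None }
    }

    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        // Standard (coupled) weight decay: g <- grad + wd * theta.
        let g: Vec<f64> = params
            .iter()
            .zip(grads)
            .map(|(p, gr)| gr + self.weight_decay * p)
            .collect();
        let update = if self.momentum != 0.0 {
            let buf = match self.buf.take() {
                // First step: the buffer starts as the gradient itself.
                None => g,
                // Later steps: b <- momentum * b + g (no dampening).
                Some(mut b) => {
                    for (bi, gi) in b.iter_mut().zip(&g) {
                        *bi = self.momentum * *bi + gi;
                    }
                    b
                }
            };
            self.buf = Some(buf.clone());
            buf
        } else {
            g
        };
        // theta <- theta - lr * update
        for (p, u) in params.iter_mut().zip(&update) {
            *p -= self.lr * u;
        }
    }
}

fn main() {
    // Minimise f(x) = x^2 (gradient 2x) from x = 1 with lr = 0.1 and momentum = 0.9.
    let mut opt = SgdSketch::new(0.1, 0.9, 0.0);
    let mut x = vec![1.0];
    for _ in 0..200 {
        let grad = vec![2.0 * x[0]];
        opt.step(&mut x, &grad);
    }
    println!("x after 200 steps: {:.6}", x[0]);
}
```

Keeping the buffer as an `Option` mirrors PyTorch's behaviour of initialising the momentum buffer to the first gradient rather than to zero.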

These are all checked against their PyTorch implementations (see pytorch_test.ipynb) and should implement the same functionality, though without some of PyTorch's input checking.

Additionally, all of the adaptive methods listed, as well as SGD, implement decoupled weight decay as described in Decoupled Weight Decay Regularization, in addition to the standard weight decay as implemented in PyTorch.
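The difference between the two kinds of weight decay is easiest to see in a single Adam step. The sketch below is a hypothetical plain-Rust illustration, assuming PyTorch's default betas and epsilon and omitting AMSGrad; `AdamSketch` and `DecaySketch` are illustrative names, not this crate's types. Standard decay adds `wd * theta` to the gradient before the moment estimates are updated, whereas decoupled decay (AdamW) shrinks the parameter directly by a factor of `1 - lr * wd`.

```rust
/// How weight decay is applied (illustrative enum, not the crate's type).
#[derive(Clone, Copy)]
enum DecaySketch {
    /// L2-style decay folded into the gradient, as in PyTorch's Adam.
    Standard(f64),
    /// Decoupled decay applied directly to the parameters, as in AdamW.
    Decoupled(f64),
}

struct AdamSketch {
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    decay: Option<DecaySketch>,
    m: Vec<f64>, // first-moment estimate
    v: Vec<f64>, // second-moment estimate
    t: i32,      // step counter used for bias correction
}

impl AdamSketch {
    fn new(n: usize, lr: f64, decay: Option<DecaySketch>) -> Self {
        // Assumed PyTorch defaults: betas = (0.9, 0.999), eps = 1e-8.
        Self { lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, decay, m: vec![0.0; n], v: vec![0.0; n], t: 0 }
    }

    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let mut g = grads[i];
            match self.decay {
                // Standard: the decay term passes through the adaptive scaling below.
                Some(DecaySketch::Standard(wd)) => g += wd * params[i],
                // Decoupled: shrink the parameter directly, bypassing the moments.
                Some(DecaySketch::Decoupled(wd)) => params[i] *= 1.0 - self.lr * wd,
                None => {}
            }
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}

fn main() {
    // One step from theta = 1 with a zero loss gradient, lr = 0.1, wd = 0.5.
    for (name, decay) in [
        ("none", None),
        ("standard", Some(DecaySketch::Standard(0.5))),
        ("decoupled", Some(DecaySketch::Decoupled(0.5))),
    ] {
        let mut opt = AdamSketch::new(1, 0.1, decay);
        let mut p = vec![1.0];
        opt.step(&mut p, &[0.0]);
        println!("{name:>9}: {:.6}", p[0]);
    }
}
```

With a zero loss gradient, the decoupled variant shrinks the parameter by exactly `lr * wd`, while in the standard variant the decay term is divided by Adam's adaptive denominator, so the first step moves the parameter by roughly `lr` whatever the value of `wd`. Avoiding this interaction with the adaptive scaling is what the decoupled formulation is for.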

Pseudo-second-order methods:

  • LBFGS

This is not implemented to be equivalent to the PyTorch version, but is checked on the 2D Rosenbrock function.
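The Rosenbrock check itself is not shown here. As a hypothetical sketch, the 2D Rosenbrock function (assuming the conventional constants a = 1 and b = 100, which give a global minimum of 0 at (1, 1)) and its analytic gradient can be written as below, together with a central-difference check of the gradient. The function names are illustrative; any gradient-based optimiser, LBFGS included, would consume something like `rosenbrock_grad`.

```rust
/// 2D Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2 (assumed a = 1, b = 100).
fn rosenbrock(x: f64, y: f64) -> f64 {
    (1.0 - x).powi(2) + 100.0 * (y - x * x).powi(2)
}

/// Analytic gradient: df/dx = -2(1 - x) - 400 x (y - x^2), df/dy = 200 (y - x^2).
fn rosenbrock_grad(x: f64, y: f64) -> (f64, f64) {
    let r = y - x * x;
    (-2.0 * (1.0 - x) - 400.0 * x * r, 200.0 * r)
}

fn main() {
    // Global minimum: f(1, 1) = 0 with zero gradient.
    println!("f(1, 1) = {}", rosenbrock(1.0, 1.0));
    println!("grad(1, 1) = {:?}", rosenbrock_grad(1.0, 1.0));

    // Central-difference check of the analytic gradient at the classic start point (-1.2, 1.0).
    let (x, y, h) = (-1.2, 1.0, 1e-6);
    let (gx, gy) = rosenbrock_grad(x, y);
    let fd_x = (rosenbrock(x + h, y) - rosenbrock(x - h, y)) / (2.0 * h);
    let fd_y = (rosenbrock(x, y + h) - rosenbrock(x, y - h)) / (2.0 * h);
    println!("analytic = ({gx:.4}, {gy:.4}), finite difference = ({fd_x:.4}, {fd_y:.4})");
}
```

The narrow curved valley of this function makes it a common stress test for quasi-Newton methods, since plain gradient descent progresses very slowly along the valley floor.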

Examples

There is an MNIST toy program, along with a simple example of AdaGrad. Although the parameters of each method are not tuned (all are defaults, with a user-supplied learning rate), the following converges quite nicely:

cargo r -r --example mnist mlp --optim r-adam --epochs 2000 --learning-rate 0.025

For even faster training, use the CUDA backend:

cargo r -r --features cuda --example mnist mlp --optim r-adam --epochs 2000 --learning-rate 0.025

Usage

cargo add --git https://github.com/KGrewal1/optimisers.git candle-optimisers

Documentation

Documentation is available on the Rust docs site: https://docs.rs/candle-optimisers

Dependencies

~9–19MB
~325K SLoC