#rl #obs #border


Reinforcement learning library

4 releases

new 0.0.6 Sep 19, 2023
0.0.5 Jan 22, 2022
0.0.4 Jul 18, 2021
0.0.3 May 30, 2021

#486 in Science


102 downloads per month
Used in 7 crates


1.5K SLoC

Core components for reinforcement learning.

Observation and action

The [Obs] and [Act] traits abstract the observations and actions of environments. Both traits can hold two or more samples at once, which makes it possible to implement vectorized environments.
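As a rough illustration of observation and action types that carry several samples, the sketch below defines simplified stand-ins for the two traits. The `len` method and the `VecObs` and `VecAct` types are assumptions made for this example only; the crate's actual trait definitions may differ.

```rust
use std::fmt::Debug;

// Hypothetical, simplified stand-ins for the crate's `Obs` and `Act` traits.
trait Obs: Clone + Debug {
    /// Number of samples held by this observation object (assumed method).
    fn len(&self) -> usize;
}

trait Act: Clone + Debug {
    /// Number of samples held by this action object (assumed method).
    fn len(&self) -> usize;
}

/// Observations from several sub-environments, one row per sample.
#[derive(Clone, Debug)]
struct VecObs(Vec<[f32; 2]>);

/// One discrete action per sub-environment.
#[derive(Clone, Debug)]
struct VecAct(Vec<i64>);

impl Obs for VecObs {
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl Act for VecAct {
    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    // Three sub-environments stepped together.
    let obs = VecObs(vec![[0.0, 1.0], [0.5, -0.5], [1.0, 0.0]]);
    let act = VecAct(vec![0, 1, 0]);
    // A vectorized environment needs one action per observation.
    assert_eq!(obs.len(), act.len());
    println!("{} samples: {:?} / {:?}", obs.len(), obs, act);
}
```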


Environment

The [Env] trait is an abstraction of environments. It has four associated types: Config, Obs, Act and Info. Obs and Act are the concrete observation and action types of the environment, and they must implement the [Obs] and [Act] traits, respectively. An environment that implements [Env] produces a [Step<E: Env>] object at every interaction step via the [Env::step()] method.

Info stores additional information produced at each step of the interaction between an agent and the environment. It may be empty (a zero-sized struct). Config holds the configuration of the environment and is used to build it.
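The relationship between the four associated types and the step object can be sketched as follows. This is a simplified model written for illustration: the `build` and `reset` methods, the fields of `Step`, and the toy `CounterEnv` are assumptions, not the crate's actual signatures.

```rust
use std::fmt::Debug;

// Hypothetical marker traits standing in for the crate's `Obs` and `Act`.
trait Obs: Clone + Debug {}
trait Act: Clone + Debug {}

/// Simplified stand-in for `Step<E: Env>`; the field names are assumptions.
struct Step<E: Env> {
    obs: E::Obs,
    reward: f32,
    is_done: bool,
    info: E::Info,
}

/// Simplified stand-in for the `Env` trait with its four associated types.
trait Env: Sized {
    type Config;
    type Obs: Obs;
    type Act: Act;
    type Info;

    fn build(config: &Self::Config) -> Self;
    fn reset(&mut self) -> Self::Obs;
    fn step(&mut self, act: &Self::Act) -> Step<Self>;
}

// A toy environment: a counter that terminates once it reaches a target.
#[derive(Clone, Debug)]
struct CounterObs(i32);
#[derive(Clone, Debug)]
struct CounterAct(i32);
impl Obs for CounterObs {}
impl Act for CounterAct {}

struct CounterConfig {
    target: i32,
}

/// Zero-sized Info: this environment reports nothing extra.
struct NoInfo;

struct CounterEnv {
    state: i32,
    target: i32,
}

impl Env for CounterEnv {
    type Config = CounterConfig;
    type Obs = CounterObs;
    type Act = CounterAct;
    type Info = NoInfo;

    fn build(config: &Self::Config) -> Self {
        CounterEnv { state: 0, target: config.target }
    }

    fn reset(&mut self) -> CounterObs {
        self.state = 0;
        CounterObs(self.state)
    }

    fn step(&mut self, act: &CounterAct) -> Step<Self> {
        self.state += act.0;
        let is_done = self.state >= self.target;
        Step {
            obs: CounterObs(self.state),
            reward: if is_done { 1.0 } else { 0.0 },
            is_done,
            info: NoInfo,
        }
    }
}

fn main() {
    let mut env = CounterEnv::build(&CounterConfig { target: 2 });
    env.reset();
    let step = env.step(&CounterAct(1));
    // The Info type carries no data and occupies no memory.
    assert_eq!(std::mem::size_of_val(&step.info), 0);
    println!("obs={:?} reward={} done={}", step.obs, step.reward, step.is_done);
}
```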


Policy

[Policy<E: Env>] represents a policy from which actions are sampled for environment E. [Policy::sample()] takes an E::Obs and emits an E::Act. The policy may be probabilistic or deterministic.
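The difference between a deterministic and a probabilistic policy can be sketched with a simplified `Policy` trait. Everything below is an assumption for illustration: the trait signature is reduced to a single method, `LineWorld`, `TowardOrigin` and `EpsilonGreedy` are hypothetical, and a small linear congruential generator replaces a real random number crate so that the example has no dependencies.

```rust
// Hypothetical, reduced versions of the crate's traits.
trait Env {
    type Obs;
    type Act;
}

trait Policy<E: Env> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

/// Toy environment: a position on a line, with actions -1 (left) and +1 (right).
struct LineWorld;
impl Env for LineWorld {
    type Obs = f32;
    type Act = i8;
}

/// Deterministic policy: always move toward the origin.
struct TowardOrigin;
impl Policy<LineWorld> for TowardOrigin {
    fn sample(&mut self, obs: &f32) -> i8 {
        if *obs > 0.0 { -1 } else { 1 }
    }
}

/// Probabilistic policy: with probability `epsilon`, act uniformly at random.
struct EpsilonGreedy {
    inner: TowardOrigin,
    epsilon: f32,
    rng_state: u64,
}

impl EpsilonGreedy {
    /// Pseudo-random value in [0, 1) from a linear congruential generator.
    fn next_uniform(&mut self) -> f32 {
        self.rng_state = self
            .rng_state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Keep the top 24 bits so the value is exactly representable as f32.
        ((self.rng_state >> 40) as f32) / ((1u64 << 24) as f32)
    }
}

impl Policy<LineWorld> for EpsilonGreedy {
    fn sample(&mut self, obs: &f32) -> i8 {
        if self.next_uniform() < self.epsilon {
            if self.next_uniform() < 0.5 { -1 } else { 1 }
        } else {
            self.inner.sample(obs)
        }
    }
}

fn main() {
    let mut deterministic = TowardOrigin;
    let mut exploring = EpsilonGreedy { inner: TowardOrigin, epsilon: 0.3, rng_state: 42 };
    let obs = 2.5_f32;
    println!("deterministic action: {}", deterministic.sample(&obs));
    let actions: Vec<i8> = (0..10).map(|_| exploring.sample(&obs)).collect();
    println!("epsilon-greedy actions: {:?}", actions);
}
```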


Agent

In this crate, [Agent<E: Env, R: ReplayBufferBase>] is defined as a trainable [Policy<E: Env>]. An agent is in either training or evaluation mode. In training mode, the agent's policy might be probabilistic for exploration, while in evaluation mode it might be deterministic.

The [Agent::opt()] method performs a single optimization step. What counts as an optimization step depends on the agent; one step may consist of multiple stochastic gradient steps. Training samples are taken from R: ReplayBufferBase.

This trait also has methods for saving the trained policy to, and loading it from, a given directory.
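A minimal sketch of an agent with the two modes and an optimization step made of several gradient steps is shown below. The trait signatures are simplified assumptions, `LinearAgent` and `PairBuffer` are hypothetical, the save and load methods are omitted, and a fixed alternating perturbation stands in for random exploration noise.

```rust
// Hypothetical, simplified traits; the crate's real signatures may differ.
trait Env {
    type Obs;
    type Act;
}

trait ReplayBufferBase {
    type Batch;
    fn batch(&self, size: usize) -> Option<Self::Batch>;
}

trait Policy<E: Env> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

trait Agent<E: Env, R: ReplayBufferBase>: Policy<E> {
    fn train(&mut self);
    fn eval(&mut self);
    fn opt(&mut self, buffer: &R);
}

struct ToyEnv;
impl Env for ToyEnv {
    type Obs = f32;
    type Act = f32;
}

/// Toy buffer holding (input, target) pairs.
struct PairBuffer(Vec<(f32, f32)>);
impl ReplayBufferBase for PairBuffer {
    type Batch = Vec<(f32, f32)>;
    fn batch(&self, size: usize) -> Option<Self::Batch> {
        if self.0.len() < size { None } else { Some(self.0[..size].to_vec()) }
    }
}

/// Agent with a single weight `w`; its greedy action is `w * obs`.
struct LinearAgent {
    w: f32,
    lr: f32,
    grad_steps: usize,
    batch_size: usize,
    training: bool,
    flip: bool,
}

impl Policy<ToyEnv> for LinearAgent {
    fn sample(&mut self, obs: &f32) -> f32 {
        let greedy = self.w * obs;
        if self.training {
            // Alternating +/-0.1 offset: a stand-in for exploration noise.
            self.flip = !self.flip;
            greedy + if self.flip { 0.1 } else { -0.1 }
        } else {
            greedy
        }
    }
}

impl Agent<ToyEnv, PairBuffer> for LinearAgent {
    fn train(&mut self) {
        self.training = true;
    }

    fn eval(&mut self) {
        self.training = false;
    }

    /// One optimization step = `grad_steps` gradient steps on the squared error.
    fn opt(&mut self, buffer: &PairBuffer) {
        if let Some(batch) = buffer.batch(self.batch_size) {
            for _ in 0..self.grad_steps {
                let grad = batch
                    .iter()
                    .map(|(x, y)| 2.0 * (self.w * x - y) * x)
                    .sum::<f32>()
                    / batch.len() as f32;
                self.w -= self.lr * grad;
            }
        }
    }
}

fn main() {
    let buffer = PairBuffer(vec![(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]);
    let mut agent = LinearAgent {
        w: 0.0, lr: 0.05, grad_steps: 4, batch_size: 3, training: false, flip: false,
    };
    agent.train();
    agent.opt(&buffer);
    agent.eval();
    println!("w after one opt step: {}", agent.w);
    println!("evaluation action for obs=1.0: {}", agent.sample(&1.0));
}
```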

Replay buffer

The ReplayBufferBase trait is an abstraction of replay buffers. It has two associated types for handling samples: PushedItem and Batch. PushedItem is the type of samples pushed to the buffer. These samples might be generated from [Step<E: Env>]; the [StepProcessorBase<E: Env>] trait provides the interface for converting a [Step<E: Env>] into a PushedItem.

Batch is the type of samples drawn from the buffer for training agents. The user implements the [Agent::opt()] method so that it consumes Batch objects to perform an optimization step.
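The flow from a step, through a step processor, into a replay buffer and back out as a batch might look like the sketch below. The method names (`push`, `batch`, `reset`, `process`), the fields of `Step`, and the `Transition`, `RingBuffer` and `ToyProcessor` types are assumptions chosen for this example. A real buffer would normally sample at random; this one returns the first stored items to stay dependency-free.

```rust
// Hypothetical, simplified versions of the crate's traits.
trait Env {
    type Obs;
    type Act;
}

/// Assumed layout: the action taken and the observation that followed it.
struct Step<E: Env> {
    obs: E::Obs,
    act: E::Act,
    reward: f32,
    is_done: bool,
}

trait ReplayBufferBase {
    type PushedItem;
    type Batch;
    fn push(&mut self, item: Self::PushedItem);
    fn batch(&self, size: usize) -> Option<Self::Batch>;
}

trait StepProcessorBase<E: Env> {
    type Output;
    fn reset(&mut self, init_obs: E::Obs);
    fn process(&mut self, step: Step<E>) -> Self::Output;
}

struct ToyEnv;
impl Env for ToyEnv {
    type Obs = i32;
    type Act = i32;
}

#[derive(Clone, Debug, PartialEq)]
struct Transition {
    obs: i32,
    act: i32,
    reward: f32,
    next_obs: i32,
}

/// Remembers the previous observation so each step becomes a full transition.
struct ToyProcessor {
    prev_obs: Option<i32>,
}

impl StepProcessorBase<ToyEnv> for ToyProcessor {
    type Output = Transition;

    fn reset(&mut self, init_obs: i32) {
        self.prev_obs = Some(init_obs);
    }

    fn process(&mut self, step: Step<ToyEnv>) -> Transition {
        let obs = self.prev_obs.expect("reset() must be called before process()");
        // After a terminal step the caller has to reset the processor.
        self.prev_obs = if step.is_done { None } else { Some(step.obs) };
        Transition { obs, act: step.act, reward: step.reward, next_obs: step.obs }
    }
}

/// Fixed-capacity buffer that overwrites its oldest entry when full.
struct RingBuffer {
    items: Vec<Transition>,
    capacity: usize,
    next: usize,
}

impl ReplayBufferBase for RingBuffer {
    type PushedItem = Transition;
    type Batch = Vec<Transition>;

    fn push(&mut self, item: Transition) {
        if self.items.len() < self.capacity {
            self.items.push(item);
        } else {
            self.items[self.next] = item;
        }
        self.next = (self.next + 1) % self.capacity;
    }

    fn batch(&self, size: usize) -> Option<Vec<Transition>> {
        if size > self.items.len() {
            return None;
        }
        Some(self.items[..size].to_vec())
    }
}

fn main() {
    let mut processor = ToyProcessor { prev_obs: None };
    let mut buffer = RingBuffer { items: Vec::new(), capacity: 2, next: 0 };
    processor.reset(0);
    for t in 1..=3 {
        let step = Step::<ToyEnv> { obs: t, act: 1, reward: 0.5, is_done: false };
        buffer.push(processor.process(step));
    }
    println!("{:?}", buffer.batch(2));
}
```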

A reference implementation

SimpleReplayBuffer<O, A> implements ReplayBufferBase. It has two type parameters, O and A, which are the representations of observations and actions in the replay buffer. O and A must implement SubBatch, which provides the functionality of storing samples, like Vec<T>, for observations and actions. The associated types PushedItem and Batch are the same type, StdBatch, representing sets of (o_t, r_t, a_t, o_t+1).

SimpleStepProcessor<E, O, A> can be used with SimpleReplayBuffer<O, A>. It converts E::Obs and E::Act into SubBatch objects of the respective types and generates a StdBatch. The conversion relies on the trait bounds O: From<E::Obs> and A: From<E::Act>.
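The role of the `From` bounds can be illustrated with a small sketch. The `SubBatch` trait here is reduced to a single assumed `len` method, `StdBatch` is modeled with assumed field names, and `process_step`, `GridEnv`, `ObsBatch` and `ActBatch` are hypothetical names introduced for this example rather than parts of the crate.

```rust
// Hypothetical, reduced versions of the crate's traits.
trait Env {
    type Obs;
    type Act;
}

/// Stand-in for `SubBatch`; only an assumed `len` method is modeled here.
trait SubBatch {
    fn len(&self) -> usize;
}

// Environment-side types.
struct GridObs {
    x: u8,
    y: u8,
}

enum GridAct {
    Left,
    Right,
}

struct GridEnv;
impl Env for GridEnv {
    type Obs = GridObs;
    type Act = GridAct;
}

// Buffer-side types: columnar storage for observations and actions.
struct ObsBatch(Vec<[f32; 2]>);
struct ActBatch(Vec<i64>);

impl SubBatch for ObsBatch {
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl SubBatch for ActBatch {
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl From<GridObs> for ObsBatch {
    fn from(o: GridObs) -> Self {
        ObsBatch(vec![[o.x as f32, o.y as f32]])
    }
}

impl From<GridAct> for ActBatch {
    fn from(a: GridAct) -> Self {
        ActBatch(vec![match a {
            GridAct::Left => 0,
            GridAct::Right => 1,
        }])
    }
}

/// Simplified model of a batch of (o_t, r_t, a_t, o_t+1) tuples.
struct StdBatch<O: SubBatch, A: SubBatch> {
    obs: O,
    act: A,
    reward: Vec<f32>,
    next_obs: O,
}

/// Converts environment-side values into buffer-side sub-batches.
/// The `From` bounds are what make the `.into()` calls compile.
fn process_step<E, O, A>(obs: E::Obs, act: E::Act, reward: f32, next_obs: E::Obs) -> StdBatch<O, A>
where
    E: Env,
    O: SubBatch + From<E::Obs>,
    A: SubBatch + From<E::Act>,
{
    StdBatch {
        obs: obs.into(),
        act: act.into(),
        reward: vec![reward],
        next_obs: next_obs.into(),
    }
}

fn main() {
    let first = process_step::<GridEnv, ObsBatch, ActBatch>(
        GridObs { x: 1, y: 2 }, GridAct::Right, 0.5, GridObs { x: 2, y: 2 },
    );
    let second = process_step::<GridEnv, ObsBatch, ActBatch>(
        GridObs { x: 2, y: 2 }, GridAct::Left, 0.0, GridObs { x: 1, y: 2 },
    );
    println!("obs={:?} act={:?} reward={:?}", first.obs.0, first.act.0, first.reward);
    println!("next_obs={:?} second act={:?}", first.next_obs.0, second.act.0);
    assert_eq!(first.obs.len(), first.act.len());
}
```

Keeping the conversion in `From` implementations means the same processing function works for any environment whose observation and action types can be turned into the buffer's storage types.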


Trainer

Trainer manages the training loop and related objects. A Trainer object is built from the configurations of [Env], ReplayBufferBase, StepProcessorBase and some training parameters. The Trainer::train method then starts the training loop with a given Agent and Recorder.
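The overall shape of such a training loop can be sketched as follows. This is a heavily simplified model, not the crate's Trainer: the traits are collapsed to a few methods on plain `f32` values, the buffer is a `Vec`, the environment is passed in instead of being built from a configuration, and `TrainerConfig` with its `max_env_steps`, `opt_interval` and `warmup` fields is a hypothetical set of training parameters.

```rust
// All traits and types below are hypothetical simplifications.
type Transition = (f32, f32, f32, f32); // (obs, act, reward, next_obs)

trait Env {
    fn reset(&mut self) -> f32;
    /// Returns (next_obs, reward, is_done).
    fn step(&mut self, act: f32) -> (f32, f32, bool);
}

trait Agent {
    fn sample(&mut self, obs: f32) -> f32;
    fn opt(&mut self, buffer: &[Transition]);
}

trait Recorder {
    fn record(&mut self, opt_step: usize, key: &str, value: f32);
}

struct TrainerConfig {
    max_env_steps: usize,
    opt_interval: usize,
    warmup: usize,
}

struct Trainer {
    config: TrainerConfig,
}

impl Trainer {
    /// Runs the loop and returns the number of optimization steps performed.
    fn train<E: Env, A: Agent, R: Recorder>(&self, env: &mut E, agent: &mut A, rec: &mut R) -> usize {
        let mut buffer: Vec<Transition> = Vec::new();
        let mut obs = env.reset();
        let mut opt_steps = 0;
        for t in 1..=self.config.max_env_steps {
            // 1. Interact with the environment and store the transition.
            let act = agent.sample(obs);
            let (next_obs, reward, is_done) = env.step(act);
            buffer.push((obs, act, reward, next_obs));
            obs = if is_done { env.reset() } else { next_obs };
            // 2. Optimize every `opt_interval` steps once warmup is over.
            if t >= self.config.warmup && t % self.config.opt_interval == 0 {
                agent.opt(&buffer);
                opt_steps += 1;
                rec.record(opt_steps, "reward", reward);
            }
        }
        opt_steps
    }
}

// Toy implementations used to exercise the loop.
struct CountEnv {
    t: u32,
}

impl Env for CountEnv {
    fn reset(&mut self) -> f32 {
        self.t = 0;
        0.0
    }

    fn step(&mut self, _act: f32) -> (f32, f32, bool) {
        self.t += 1;
        (self.t as f32, 1.0, self.t % 3 == 0)
    }
}

struct CountingAgent {
    opt_calls: usize,
    last_buffer_len: usize,
}

impl Agent for CountingAgent {
    fn sample(&mut self, obs: f32) -> f32 {
        -obs
    }

    fn opt(&mut self, buffer: &[Transition]) {
        self.opt_calls += 1;
        self.last_buffer_len = buffer.len();
    }
}

struct VecRecorder(Vec<(usize, String, f32)>);

impl Recorder for VecRecorder {
    fn record(&mut self, opt_step: usize, key: &str, value: f32) {
        self.0.push((opt_step, key.to_string(), value));
    }
}

fn main() {
    let trainer = Trainer {
        config: TrainerConfig { max_env_steps: 10, opt_interval: 2, warmup: 4 },
    };
    let mut env = CountEnv { t: 0 };
    let mut agent = CountingAgent { opt_calls: 0, last_buffer_len: 0 };
    let mut recorder = VecRecorder(Vec::new());
    let opt_steps = trainer.train(&mut env, &mut agent, &mut recorder);
    println!("optimization steps: {}, records: {}", opt_steps, recorder.0.len());
}
```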

