4 releases

0.0.6 Sep 19, 2023
0.0.5 Jan 22, 2022
0.0.4 Jul 18, 2021
0.0.3 May 30, 2021

#924 in Science

92 downloads per month
Used in 7 crates

MIT/Apache

68KB
1.5K SLoC

Core components for reinforcement learning.

Observation and action

The [Obs] and [Act] traits are abstractions of observations and actions in environments. These traits can hold two or more samples at once, which supports implementing vectorized environments.
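
A minimal sketch of how such observation and action types could be modeled, assuming hypothetical method names (the real traits' signatures may differ):

```rust
// Hypothetical sketch: batched observation/action types for a vectorized
// environment. Method names here are assumptions, not the crate's exact API.
use std::fmt::Debug;

pub trait Obs: Clone + Debug {
    /// Number of samples held in this value (1 for a single environment,
    /// more for a vectorized environment).
    fn len(&self) -> usize;
}

pub trait Act: Clone + Debug {
    fn len(&self) -> usize;
}

/// A batch of 1-D continuous observations.
#[derive(Clone, Debug)]
pub struct VecObs(pub Vec<Vec<f32>>);

impl Obs for VecObs {
    fn len(&self) -> usize {
        self.0.len()
    }
}

/// A batch of discrete actions.
#[derive(Clone, Debug)]
pub struct DiscreteAct(pub Vec<u32>);

impl Act for DiscreteAct {
    fn len(&self) -> usize {
        self.0.len()
    }
}

fn main() {
    // Two samples carried in one value, as a vectorized environment would produce.
    let obs = VecObs(vec![vec![0.0, 1.0], vec![0.5, -0.5]]);
    let act = DiscreteAct(vec![1, 0]);
    assert_eq!(obs.len(), 2);
    assert_eq!(act.len(), 2);
}
```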

Environment

The [Env] trait is an abstraction of environments. It has four associated types: Config, Obs, Act, and Info. Obs and Act are the concrete observation and action types of the environment and must implement the [Obs] and [Act] traits, respectively. An environment implementing [Env] generates a [Step<E: Env>] object at every interaction step via the [Env::step()] method.

Info stores additional information produced at each step of the interaction between the agent and the environment; it can be an empty (zero-sized) struct. Config represents the environment's configuration and is used to build the environment.
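
A hedged sketch of the shape such an environment trait could take; the field names, method signatures, and the reward/termination representation below are assumptions for illustration, not the crate's exact API:

```rust
// Hypothetical sketch of an Env-like trait with the four associated types
// described above, plus the Step object returned from each interaction.
pub struct Step<E: Env> {
    pub obs: E::Obs,       // observation after the step
    pub act: E::Act,       // action applied at this step
    pub reward: Vec<f32>,  // one reward per sample (vectorized environments)
    pub is_done: Vec<i8>,  // episode-termination flags
    pub info: E::Info,     // extra, possibly empty, per-step information
}

pub trait Env {
    /// Configuration used to build the environment.
    type Config: Clone;
    /// Observation type; must implement the crate's Obs trait.
    type Obs;
    /// Action type; must implement the crate's Act trait.
    type Act;
    /// Extra per-step information; may be a zero-sized struct.
    type Info;

    /// Builds an environment from its configuration.
    fn build(config: &Self::Config, seed: i64) -> Self
    where
        Self: Sized;

    /// Resets the environment and returns the initial observation.
    fn reset(&mut self) -> Self::Obs;

    /// Applies an action and returns a Step describing the transition.
    fn step(&mut self, act: &Self::Act) -> Step<Self>
    where
        Self: Sized;
}
```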

Policy

[Policy<E: Env>] represents a policy from which actions for environment E are sampled. [Policy::sample()] takes an E::Obs and emits an E::Act. The policy may be probabilistic or deterministic.
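
For example, a Policy-like trait and a trivial deterministic policy might look as follows; the stub Env trait and all names here are assumptions kept only to make the sketch self-contained:

```rust
// Hypothetical sketch: a policy maps an observation of environment E to an
// action of E. The stub Env trait carries only what the example needs.
pub trait Env {
    type Obs;
    type Act;
}

pub trait Policy<E: Env> {
    /// Samples an action for the current observation; the policy may be
    /// stochastic (sampling from a distribution) or deterministic.
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

// A toy environment with a scalar observation and a binary action.
struct ToyEnv;
impl Env for ToyEnv {
    type Obs = f32;
    type Act = u32;
}

// A deterministic threshold policy: emit action 1 if the observation is positive.
struct ThresholdPolicy;
impl Policy<ToyEnv> for ThresholdPolicy {
    fn sample(&mut self, obs: &f32) -> u32 {
        if *obs > 0.0 { 1 } else { 0 }
    }
}

fn main() {
    let mut pi = ThresholdPolicy;
    assert_eq!(pi.sample(&0.5), 1);
    assert_eq!(pi.sample(&-0.5), 0);
}
```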

Agent

In this crate, [Agent<E: Env, R: ReplayBufferBase>] is defined as a trainable [Policy<E: Env>]. An agent is in either training or evaluation mode. In training mode, the agent's policy might be probabilistic for exploration, while in evaluation mode it might be deterministic.

The [Agent::opt()] method performs a single optimization step. What an optimization step means depends on the agent; it might, for example, consist of multiple stochastic gradient updates. Samples for training are taken from R: ReplayBufferBase.

This trait also has methods for saving and loading the trained policy in a given directory.
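
Put together, an Agent-like trait could be sketched as below; the method names and signatures are assumptions (the stub Env, ReplayBufferBase, and Policy traits only keep the sketch self-contained), not the crate's exact API:

```rust
// Hypothetical sketch: an agent is a trainable policy with a train/eval
// switch, an optimization step, and save/load hooks.
use std::path::Path;

pub trait Env { type Obs; type Act; }
pub trait ReplayBufferBase { type Batch; }
pub trait Policy<E: Env> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
}

/// Metrics produced by an optimization step (e.g. losses), if any.
pub struct Record;

pub trait Agent<E: Env, R: ReplayBufferBase>: Policy<E> {
    /// Switches to training mode (possibly exploratory, stochastic policy).
    fn train(&mut self);
    /// Switches to evaluation mode (possibly deterministic policy).
    fn eval(&mut self);
    fn is_train(&self) -> bool;

    /// Performs a single optimization step using samples drawn from the
    /// replay buffer; internally this may run several gradient updates.
    fn opt(&mut self, buffer: &mut R) -> Option<Record>;

    /// Saves/loads the trained policy's parameters under the given directory.
    fn save<P: AsRef<Path>>(&self, path: P) -> std::io::Result<()>;
    fn load<P: AsRef<Path>>(&mut self, path: P) -> std::io::Result<()>;
}
```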

Replay buffer

The ReplayBufferBase trait is an abstraction of replay buffers. It has two associated types for handling samples: PushedItem and Batch. PushedItem is the type of samples pushed into the buffer; these samples might be generated from [Step<E: Env>]. The [StepProcessorBase<E: Env>] trait provides the interface for converting a [Step<E: Env>] into a PushedItem.

Batch is the type of samples taken from the buffer for training Agents. The user implements the [Agent::opt()] method so that it handles Batch objects when performing an optimization step.
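
A rough sketch of how these two abstractions could fit together; the signatures below are assumptions for illustration, not the crate's exact API:

```rust
// Hypothetical sketch: a replay buffer with separate types for items pushed
// in and batches sampled out, plus a step processor that converts raw steps
// into pushable items.
pub trait Env { type Obs; type Act; }

/// A raw transition as produced by the environment (simplified here to a
/// single, non-vectorized sample).
pub struct Step<E: Env> {
    pub obs: E::Obs,
    pub act: E::Act,
    pub reward: f32,
    pub is_done: bool,
}

pub trait ReplayBufferBase {
    /// Item pushed into the buffer (e.g. a single transition).
    type PushedItem;
    /// Mini-batch sampled from the buffer for training.
    type Batch;

    fn push(&mut self, item: Self::PushedItem);
    fn batch(&mut self, size: usize) -> Option<Self::Batch>;
}

pub trait StepProcessorBase<E: Env> {
    /// The item type produced for the replay buffer.
    type Output;

    /// Converts a raw Step (the processor may keep the previous observation
    /// internally to form complete transition tuples) into a pushable item.
    fn process(&mut self, step: Step<E>) -> Self::Output;
}
```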

A reference implementation

SimpleReplayBuffer<O, A> implements ReplayBufferBase. This type has two parameters, O and A, which are the representations of observations and actions in the replay buffer. O and A must implement SubBatch, which provides the ability to store samples, like a Vec<T>, for observations and actions. The associated types PushedItem and Batch are the same type, StdBatch, representing sets of (o_t, r_t, a_t, o_t+1).

SimpleStepProcessor<E, O, A> can be used with SimpleReplayBuffer<O, A>. It converts E::Obs and E::Act into SubBatches of the respective types and generates a StdBatch. The conversion relies on the trait bounds O: From<E::Obs> and A: From<E::Act>.
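
The conversion bounds can be illustrated with a small standalone example; all type names below are hypothetical, chosen only to show how From impls let a step processor convert with plain .into() calls:

```rust
// Hypothetical environment-side types.
#[derive(Clone)]
struct CartPoleObs([f32; 4]);
#[derive(Clone)]
struct CartPoleAct(u8);

// Hypothetical buffer-side representations (what SubBatch implementors might wrap).
struct ObsBatch(Vec<f32>);
struct ActBatch(Vec<i64>);

// The bounds O: From<E::Obs> and A: From<E::Act> amount to impls like these.
impl From<CartPoleObs> for ObsBatch {
    fn from(o: CartPoleObs) -> Self {
        ObsBatch(o.0.to_vec())
    }
}

impl From<CartPoleAct> for ActBatch {
    fn from(a: CartPoleAct) -> Self {
        ActBatch(vec![a.0 as i64])
    }
}

fn main() {
    // A step processor would perform conversions like these when turning a
    // raw environment step into elements of a batch.
    let obs: ObsBatch = CartPoleObs([0.0, 0.1, -0.2, 0.3]).into();
    let act: ActBatch = CartPoleAct(1).into();
    assert_eq!(obs.0.len(), 4);
    assert_eq!(act.0, vec![1]);
}
```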

Trainer

Trainer manages the training loop and related objects. A Trainer object is built from configurations of the [Env], ReplayBufferBase, and StepProcessorBase types and some training parameters. The Trainer::train method then runs the training loop with the given Agent and Recorder.
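
The loop a Trainer drives can be sketched roughly as follows; this is a simplified assumption of the control flow (the real Trainer also handles evaluation, recording, and model saving), with stub traits to keep the sketch self-contained:

```rust
// Hypothetical sketch of the interaction/optimization loop coordinated by a
// Trainer-like object.
pub trait Env {
    type Obs;
    type Act;
    fn reset(&mut self) -> Self::Obs;
    /// Returns (next observation, reward, episode-done flag).
    fn step(&mut self, act: &Self::Act) -> (Self::Obs, f32, bool);
}

pub trait ReplayBufferBase {
    type PushedItem;
    type Batch;
    fn push(&mut self, item: Self::PushedItem);
    fn batch(&mut self, size: usize) -> Option<Self::Batch>;
}

pub trait Agent<E: Env, R: ReplayBufferBase> {
    fn sample(&mut self, obs: &E::Obs) -> E::Act;
    fn opt(&mut self, buffer: &mut R);
}

/// Runs `max_steps` of environment interaction, pushing transitions into the
/// buffer and calling `opt` every `opt_interval` steps.
pub fn train<E, R, A>(
    env: &mut E,
    buffer: &mut R,
    agent: &mut A,
    max_steps: usize,
    opt_interval: usize,
) where
    E: Env,
    E::Obs: Clone,
    R: ReplayBufferBase<PushedItem = (E::Obs, E::Act, f32, bool, E::Obs)>,
    A: Agent<E, R>,
{
    let mut obs = env.reset();
    for t in 1..=max_steps {
        let act = agent.sample(&obs);
        let (next_obs, reward, done) = env.step(&act);
        buffer.push((obs.clone(), act, reward, done, next_obs.clone()));
        obs = if done { env.reset() } else { next_obs };
        if t % opt_interval == 0 {
            agent.opt(buffer);
        }
    }
}
```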

Dependencies

~4.5–6.5MB
~137K SLoC