Border is a reinforcement learning library written in Rust. It is currently under development.
The following command runs a random controller (policy) for 5 episodes in CartPole-v0:
$ cargo run --example random_cartpole
It renders the environment during the episodes and generates a CSV file in
examples/model containing the sequences of observations and rewards for each episode.
$ head -n3 examples/model/random_cartpole_eval.csv
0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
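A short sketch of how such a row can be read back in Rust. The column layout is an assumption based on the sample above: episode index, step index, reward, then the four CartPole observation components (cart position, cart velocity, pole angle, pole angular velocity).

```rust
fn main() {
    // One row from the evaluation CSV shown above.
    let row = "0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637";

    // Parse all fields as f64; the first two are really integer indices.
    let fields: Vec<f64> = row.split(',').map(|s| s.parse().unwrap()).collect();

    let episode = fields[0] as usize;
    let step = fields[1] as usize;
    let reward = fields[2];
    // Assumed order: [position, velocity, angle, angular velocity].
    let obs = &fields[3..];

    println!("episode {episode}, step {step}, reward {reward}, obs {obs:?}");
    assert_eq!(fields.len(), 7);
}
```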
The following command trains a DQN agent:
$ cargo run --example dqn_cartpole
After training, the trained agent runs for 5 episodes. The parameters of the trained Q-network (and the target network) are saved in
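For intuition, the exploration rule a DQN agent typically follows is epsilon-greedy action selection. The sketch below is a generic, std-only illustration of that rule, not Border's actual API; the random draw and random action are passed in to keep it deterministic.

```rust
// Epsilon-greedy: with probability epsilon take a random action,
// otherwise take the action with the highest Q-value.
fn epsilon_greedy(q_values: &[f64], epsilon: f64, uniform_sample: f64, random_action: usize) -> usize {
    if uniform_sample < epsilon {
        random_action // explore
    } else {
        // exploit: index of the largest Q-value
        q_values
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
            .map(|(i, _)| i)
            .unwrap()
    }
}

fn main() {
    let q = [0.1, 0.7, 0.3];
    // With epsilon = 0.0 the greedy action (index 1) is always chosen.
    assert_eq!(epsilon_greedy(&q, 0.0, 0.5, 2), 1);
    // With epsilon = 1.0 the provided random action is returned.
    assert_eq!(epsilon_greedy(&q, 1.0, 0.5, 2), 2);
}
```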
The following command trains a SAC agent on Pendulum-v0, an environment with a continuous action space:
$ cargo run --example sac_pendulum
The code defines an action filter that doubles the torque in the environment.
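The idea behind such a filter can be sketched in a few lines: before an action reaches the environment, the filter transforms it. The trait and type names below are hypothetical stand-ins for illustration, not Border's actual filter interface.

```rust
// Hypothetical action-filter interface: transforms an action
// before it is passed to the environment.
trait ActFilter {
    fn filt(&self, act: f64) -> f64;
}

// A filter that doubles the torque, as described above.
struct DoubleTorque;

impl ActFilter for DoubleTorque {
    fn filt(&self, act: f64) -> f64 {
        2.0 * act
    }
}

fn main() {
    let filter = DoubleTorque;
    assert_eq!(filter.filt(0.5), 1.0); // the agent's action is doubled
    assert_eq!(filter.filt(-1.0), -2.0);
}
```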
The following command trains a DQN agent on PongNoFrameskip-v4:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4
During training, the program saves the model parameters whenever the evaluation reward reaches a new maximum. The agent can be trained on other Atari games (e.g.,
SeaquestNoFrameskip-v4) by replacing the environment name in the above command.
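The "save on best evaluation reward" behavior amounts to tracking a running maximum and saving only when a new evaluation beats it. A minimal sketch, where the actual serialization step is replaced by recording the evaluation index:

```rust
fn main() {
    // Hypothetical evaluation rewards over the course of training
    // (Pong rewards range from -21 to 21).
    let eval_rewards = [-21.0, -18.5, -19.0, -12.0, -15.0];

    let mut best = f64::NEG_INFINITY;
    let mut saved_at = Vec::new();

    for (i, &r) in eval_rewards.iter().enumerate() {
        if r > best {
            best = r;
            saved_at.push(i); // the model would be serialized here
        }
    }

    // The model is saved at evaluations 0, 1, and 3.
    assert_eq!(saved_at, vec![0, 1, 3]);
    assert_eq!(best, -12.0);
}
```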
For Pong, you can download a pretrained agent from my Google Drive and see how it plays with the following command:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive
The pretrained agent will be saved locally in
(The code might be broken due to recent changes. It will be fixed in the future. The description below applies to an older version.)
The following command trains a DQN agent in a vectorized environment of Pong:
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv
The code demonstrates how to use vectorized environments, in which four environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples) on a
g3s.xlarge EC2 instance. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.
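The synchronous stepping pattern can be illustrated without any RL machinery: each environment advances by one step and the observations are returned as a single batch. The toy Counter environment below exists only for illustration and has nothing to do with Border's actual vectorized-environment types.

```rust
// A toy environment whose observation is a running counter.
struct Counter {
    state: i64,
}

impl Counter {
    fn step(&mut self, action: i64) -> i64 {
        self.state += action;
        self.state // observation
    }
}

// Step all environments in lockstep and collect a batch of observations.
fn step_all(envs: &mut [Counter], actions: &[i64]) -> Vec<i64> {
    envs.iter_mut()
        .zip(actions)
        .map(|(env, &a)| env.step(a))
        .collect()
}

fn main() {
    // Four environments, mirroring the example above.
    let mut envs: Vec<Counter> = (0..4).map(|i| Counter { state: i }).collect();
    let batch = step_all(&mut envs, &[1, 1, 1, 1]);
    assert_eq!(batch, vec![1, 2, 3, 4]); // one observation per environment
}
```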
After the training, you can see how the agent plays:
$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
- Environments which wrap gym using PyO3 and ndarray
- Interfaces for recording quantities during training or evaluation
- Support tensorboard using tensorboard-rs
- Vectorized environment using a tweaked atari_wrapper.py, adapted from the RL example in tch
- Agents based on tch
- More tests and documentation
- More environments
- More RL algorithms
Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).