Border is a reinforcement learning library in Rust.


Border is currently under development.


To run the examples, install Python (>=3.7) and gym. Border provides a wrapper for gym environments using PyO3.


Random policy on cartpole environment

The following command runs a random controller (policy) for 5 episodes in CartPole-v0:

$ cargo run --example random_cartpole

The example renders the environment during the episodes and writes a CSV file to examples/model that contains the sequences of observations and rewards from those episodes.

$ head -n3 examples/model/random_cartpole_eval.csv
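To make the idea of a random controller concrete, here is a minimal, dependency-free sketch. It is not Border's implementation: `XorShift64` and `RandomPolicy` are hypothetical names introduced for illustration, and the tiny generator only stands in for a proper RNG crate. The policy ignores the observation and picks one of CartPole's two discrete actions (push left or push right) uniformly at random.

```rust
/// Minimal xorshift64 generator, used only to keep this sketch dependency-free.
/// (Hypothetical helper; a real program would likely use an RNG crate.)
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Random policy over `n_actions` discrete actions (hypothetical type).
/// It never looks at the observation.
struct RandomPolicy {
    rng: XorShift64,
    n_actions: u64,
}

impl RandomPolicy {
    fn new(seed: u64, n_actions: u64) -> Self {
        assert!(n_actions > 0, "need at least one action");
        Self { rng: XorShift64::new(seed), n_actions }
    }

    /// Samples an action index in `0..n_actions`.
    /// The upper bits are used because they are of better quality in xorshift.
    fn sample(&mut self) -> u64 {
        (self.rng.next_u64() >> 32) % self.n_actions
    }
}
```

With `RandomPolicy::new(seed, 2)`, each call to `sample()` yields 0 or 1, and the same seed reproduces the same action sequence.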

Deep Q-network (DQN) on cartpole environment

The following command trains a DQN agent:

$ cargo run --example dqn_cartpole

After training, the trained agent runs for 5 episodes. The parameters of the trained Q-network (and the target network) are saved in examples/model/dqn_cartpole.
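The role of the target network mentioned above can be illustrated with the standard DQN learning target. The sketch below shows the textbook formulation, not necessarily how Border implements it; `td_target` and `sync_target` are hypothetical names, and plain `f32` slices stand in for network outputs and parameters.

```rust
/// One-step TD target used by standard DQN:
///   y = r                                   if the episode ended,
///   y = r + gamma * max_a Q_target(s', a)   otherwise.
/// `next_q_target` holds the target network's Q-values for the next state.
fn td_target(reward: f32, done: bool, gamma: f32, next_q_target: &[f32]) -> f32 {
    if done {
        return reward;
    }
    let max_q = next_q_target
        .iter()
        .cloned()
        .fold(f32::NEG_INFINITY, f32::max);
    reward + gamma * max_q
}

/// Hard update: copy the online network's parameters into the target network.
/// Panics if the two slices differ in length.
fn sync_target(online: &[f32], target: &mut [f32]) {
    target.copy_from_slice(online);
}
```

Because the target network is only refreshed occasionally, the targets `y` stay fixed between syncs, which is the usual reason for keeping (and saving) a second copy of the parameters.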

Soft actor-critic (SAC) on pendulum environment

The following command trains a SAC agent on Pendulum-v0, which has a continuous action space:

$ cargo run --example sac_pendulum

The code defines an action filter that doubles the torque in the environment.
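As a rough illustration of such a filter, the sketch below scales every action component by a constant. The trait `ActionFilter`, its method `filt`, and the `ScaleFilter` type are hypothetical and only show the idea; Border's actual filter trait and signatures may differ. Doubling is presumably useful here because Pendulum-v0 accepts torques in [-2, 2], while a tanh-squashed policy output lies in [-1, 1].

```rust
/// Hypothetical trait for transforming an agent's action before it is
/// passed to the environment. Not Border's actual API.
trait ActionFilter {
    fn filt(&self, act: Vec<f32>) -> Vec<f32>;
}

/// Scales every action component by a constant factor.
struct ScaleFilter {
    factor: f32,
}

impl ActionFilter for ScaleFilter {
    fn filt(&self, act: Vec<f32>) -> Vec<f32> {
        act.into_iter().map(|a| a * self.factor).collect()
    }
}

/// Convenience constructor for the doubling case described above.
fn doubling_filter() -> ScaleFilter {
    ScaleFilter { factor: 2.0 }
}
```

Keeping the scaling in a filter rather than in the policy lets the agent work in a normalized action range regardless of the environment's units.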

Atari games

The following command trains a DQN agent on PongNoFrameskip-v4:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4

During training, the program saves the model parameters whenever the evaluation reward reaches a new maximum. The agent can be trained on other Atari games (e.g., SeaquestNoFrameskip-v4) by replacing the name of the environment in the above command.
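The save-on-best behaviour can be sketched as follows. This is an illustration under assumptions, not the example's actual code: `BestTracker` and `checkpoint_steps` are hypothetical names, and the sketch assumes a save is triggered only when the evaluation reward strictly exceeds the previous best.

```rust
/// Tracks the best evaluation reward seen so far (hypothetical helper).
struct BestTracker {
    best: Option<f32>,
}

impl BestTracker {
    fn new() -> Self {
        Self { best: None }
    }

    /// Returns true when `eval_reward` strictly exceeds the previous best,
    /// i.e. when the caller should write a checkpoint.
    fn update(&mut self, eval_reward: f32) -> bool {
        let improved = match self.best {
            None => true,
            Some(b) => eval_reward > b,
        };
        if improved {
            self.best = Some(eval_reward);
        }
        improved
    }
}

/// Given a sequence of evaluation rewards, returns the indices of the
/// evaluations at which a checkpoint would be written.
fn checkpoint_steps(eval_rewards: &[f32]) -> Vec<usize> {
    let mut tracker = BestTracker::new();
    let mut steps = Vec::new();
    for (i, &r) in eval_rewards.iter().enumerate() {
        if tracker.update(r) {
            steps.push(i);
        }
    }
    steps
}
```

In a training loop, the caller would run an evaluation periodically and save the parameters only when `update` returns true, so the file on disk always holds the best agent found so far.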

For Pong, you can download a pretrained agent from my Google Drive and see how it plays with the following command:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive

The pretrained agent will be saved locally in $HOME/.border/model.

Vectorized environment for Atari games

The following command trains a DQN agent in a vectorized environment of Pong:

$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv

The code demonstrates how to use vectorized environments, in which 4 environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples, since each step yields one transition per environment) on a g3s.xlarge instance of EC2. Hyperparameter values, tuned specifically for Pong rather than for all Atari games, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.

After the training, you can see how the agent plays:

$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
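The synchronous stepping used in the vectorized example can be sketched with a toy environment. Everything here is hypothetical and independent of Border's API: `CounterEnv` is a stand-in environment that ends an episode every `horizon` steps, and `SyncVecEnv` steps all of its environments in lockstep, producing one transition per environment per step. This is why 2M steps with 4 environments correspond to 8M transition samples.

```rust
/// Toy environment (hypothetical): a counter that ends an episode
/// every `horizon` steps and then resets itself.
struct CounterEnv {
    t: u32,
    horizon: u32,
}

impl CounterEnv {
    fn new(horizon: u32) -> Self {
        Self { t: 0, horizon }
    }

    /// Advances one step and returns (observation, reward, done).
    fn step(&mut self, action: i32) -> (u32, f32, bool) {
        self.t += 1;
        let obs = self.t;
        let done = self.t >= self.horizon;
        if done {
            self.t = 0; // auto-reset so stepping can continue
        }
        (obs, action as f32, done)
    }
}

/// Steps several environments in lockstep (hypothetical type).
struct SyncVecEnv {
    envs: Vec<CounterEnv>,
}

impl SyncVecEnv {
    fn new(n_envs: usize, horizon: u32) -> Self {
        Self { envs: (0..n_envs).map(|_| CounterEnv::new(horizon)).collect() }
    }

    /// Applies one action per environment and returns one transition each.
    fn step(&mut self, actions: &[i32]) -> Vec<(u32, f32, bool)> {
        assert_eq!(actions.len(), self.envs.len(), "one action per environment");
        self.envs
            .iter_mut()
            .zip(actions)
            .map(|(env, &a)| env.step(a))
            .collect()
    }
}

/// Counts the transitions collected over `n_steps` vectorized steps.
fn count_transitions(n_envs: usize, n_steps: usize) -> usize {
    let mut venv = SyncVecEnv::new(n_envs, 10);
    let actions = vec![0; n_envs];
    let mut total = 0;
    for _ in 0..n_steps {
        total += venv.step(&actions).len();
    }
    total
}
```

For instance, `count_transitions(4, 1000)` collects 4000 transitions, the same 4-to-1 ratio as in the training run described above.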




Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).


