3 releases

0.0.7 Sep 1, 2024
0.0.6 Sep 19, 2023
0.0.5 Jan 29, 2022

#458 in Science


Used in 3 crates

MIT/Apache

145KB
3K SLoC

Asynchronous trainer with parallel sampling processes.

The code might look like the following.

type Env = TestEnv;
type ObsBatch = TestObsBatch;
type ActBatch = TestActBatch;
type ReplayBuffer = SimpleReplayBuffer<ObsBatch, ActBatch>;
type StepProcessor = SimpleStepProcessor<Env, ObsBatch, ActBatch>;

// Create a new agent by wrapping the existing agent in order to implement SyncModel.
struct TestAgent2(TestAgent);

impl border_core::Configurable<Env> for TestAgent2 {
    type Config = TestAgentConfig;

    fn build(config: Self::Config) -> Self {
        Self(TestAgent::build(config))
    }
}

impl border_core::Agent<Env, ReplayBuffer> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    fn train(&mut self) {
        self.0.train();
    }

    // For other methods ...
}

impl border_core::Policy<Env> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    // ...
}

impl border_async_trainer::SyncModel for TestAgent2 {
    // Self::ModelInfo should include the model parameters.
    type ModelInfo = usize;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        // Extracts the model parameters and returns them as Self::ModelInfo.
        // The first element of the tuple is the number of optimization steps.
        (0, 0)
    }

    fn sync_model(&mut self, _model_info: &Self::ModelInfo) {
        // Synchronizes the model with the parameters in _model_info.
    }
}

let agent_configs: Vec<_> = vec![agent_config()];
let env_config_train = env_config();
let env_config_eval = env_config();
let replay_buffer_config = SimpleReplayBufferConfig::default();
let step_proc_config = SimpleStepProcessorConfig::default();
let actor_man_config = ActorManagerConfig::default();
let async_trainer_config = AsyncTrainerConfig::default();
let mut recorder: Box<dyn AggregateRecorder> = Box::new(NullRecorder {});
let mut evaluator = DefaultEvaluator::<TestEnv, TestAgent2>::new(&env_config_eval, 0, 1).unwrap();

border_async_trainer::util::train_async::<_, _, _, StepProcessor>(
    &agent_config(),
    &agent_configs,
    &env_config_train,
    &env_config_eval,
    &step_proc_config,
    &replay_buffer_config,
    &actor_man_config,
    &async_trainer_config,
    &mut recorder,
    &mut evaluator,
);

The training process consists of the following two components:

  • ActorManager manages Actors, each of which runs a thread in which an Agent interacts with an Env and collects samples. Those samples are sent to the replay buffer in AsyncTrainer.
  • AsyncTrainer is responsible for training an agent. It also runs a thread that pushes samples received from ActorManager into a replay buffer.
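
The flow of samples between the two components can be pictured with a minimal sketch that uses only the standard library. It does not use the crate's actual types: `Sample` and `collect_samples` are hypothetical names introduced for illustration, and a plain `Vec` stands in for the replay buffer.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a transition sampled by an actor.
#[derive(Debug, Clone, PartialEq)]
struct Sample {
    actor_id: usize,
    step: usize,
}

/// Spawns `n_actors` sampling threads and gathers everything they send
/// into a `Vec` that plays the role of the replay buffer.
fn collect_samples(n_actors: usize, steps_per_actor: usize) -> Vec<Sample> {
    let (tx, rx) = mpsc::channel::<Sample>();

    // Actor side: one thread per actor, each with its own sender.
    let handles: Vec<_> = (0..n_actors)
        .map(|actor_id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for step in 0..steps_per_actor {
                    // A real actor would step the environment with its agent here.
                    tx.send(Sample { actor_id, step }).unwrap();
                }
            })
        })
        .collect();

    // Drop the original sender so the receiving loop ends once all actors finish.
    drop(tx);

    // Trainer side: push every received sample into the buffer.
    let mut replay_buffer = Vec::new();
    for sample in rx {
        replay_buffer.push(sample);
    }

    for handle in handles {
        handle.join().unwrap();
    }
    replay_buffer
}
```

Calling `collect_samples(3, 4)` yields twelve samples, four from each actor. The order across actors is not deterministic, since the threads run concurrently.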

The Agent must implement the SyncModel trait so that the model of the agent in each Actor can be synchronized with the trained agent in AsyncTrainer. The trait exports and imports the model's information as SyncModel::ModelInfo.
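
As a rough illustration of exporting and importing model information, the sketch below implements a trait with the same two methods for a toy agent whose parameters are a `Vec<f32>`. The trait is redeclared locally so that the block is self-contained; the real `border_async_trainer::SyncModel` may carry additional bounds. `LinearAgent` and `opt_step` are hypothetical names.

```rust
/// Local stand-in with the same two methods as the trait used in the
/// example above; the trait in the crate may differ in its bounds.
trait SyncModel {
    type ModelInfo;
    fn model_info(&self) -> (usize, Self::ModelInfo);
    fn sync_model(&mut self, model_info: &Self::ModelInfo);
}

/// Hypothetical agent whose whole model is a weight vector.
struct LinearAgent {
    weights: Vec<f32>,
    n_opt_steps: usize,
}

impl LinearAgent {
    fn new(dim: usize) -> Self {
        Self { weights: vec![0.0; dim], n_opt_steps: 0 }
    }

    /// Fake optimization step: shifts every weight by `lr`.
    fn opt_step(&mut self, lr: f32) {
        for w in self.weights.iter_mut() {
            *w += lr;
        }
        self.n_opt_steps += 1;
    }
}

impl SyncModel for LinearAgent {
    // ModelInfo carries the model parameters themselves.
    type ModelInfo = Vec<f32>;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        // Number of optimization steps, then a copy of the parameters.
        (self.n_opt_steps, self.weights.clone())
    }

    fn sync_model(&mut self, model_info: &Self::ModelInfo) {
        // Overwrite the local parameters with the received ones.
        self.weights.clone_from(model_info);
    }
}
```

In this sketch, a trainer-side agent would call `model_info()` after some optimization steps, and each actor-side agent would pass the returned parameters to `sync_model()`.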

The Agent in AsyncTrainer is responsible for training, typically on a GPU, while the Agents in the Actors managed by ActorManager are responsible for sampling on CPUs.

Both AsyncTrainer and ActorManager run on the same machine and communicate through channels.
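
One way the model information could travel from the trainer back to an actor over a channel is sketched below. This is an assumption about the general pattern, not the crate's actual protocol: the actor drains the channel and keeps only the entry with the highest optimization step count, mirroring the first element of the tuple returned by `model_info`. `latest_model` and `run_sync_demo` are hypothetical helpers.

```rust
use std::sync::mpsc;
use std::thread;

/// Model parameters, as in the `SyncModel::ModelInfo` of a toy agent.
type ModelInfo = Vec<f32>;

/// Drains all pending messages without blocking and returns the newest
/// model info, if any is newer than `current_step`.
fn latest_model(
    rx: &mpsc::Receiver<(usize, ModelInfo)>,
    current_step: usize,
) -> Option<(usize, ModelInfo)> {
    let mut best_step = current_step;
    let mut latest = None;
    for (step, info) in rx.try_iter() {
        if step > best_step {
            best_step = step;
            latest = Some((step, info));
        }
    }
    latest
}

/// Runs a trainer thread that publishes its parameters after each fake
/// optimization step, then lets an actor pick up the newest version.
fn run_sync_demo(n_opt_steps: usize) -> (usize, ModelInfo) {
    let (tx, rx) = mpsc::channel();

    let trainer = thread::spawn(move || {
        let mut params = vec![0.0f32; 2];
        for step in 1..=n_opt_steps {
            // Fake optimization step.
            for p in params.iter_mut() {
                *p += 1.0;
            }
            tx.send((step, params.clone())).unwrap();
        }
    });
    trainer.join().unwrap();

    // Actor side: start from step 0 and adopt the newest published model.
    let mut actor_state = (0usize, vec![0.0f32; 2]);
    if let Some(newer) = latest_model(&rx, actor_state.0) {
        actor_state = newer;
    }
    actor_state
}
```

Skipping stale entries keeps an actor from repeatedly loading parameters that a later message has already superseded.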
