3 releases
0.0.7 | Sep 1, 2024
0.0.6 | Sep 19, 2023
0.0.5 | Jan 29, 2022
#458 in Science
Used in 3 crates
145KB
3K SLoC
Asynchronous trainer with parallel sampling processes.
Code that uses this crate might look like the following.
type Env = TestEnv;
type ObsBatch = TestObsBatch;
type ActBatch = TestActBatch;
type ReplayBuffer = SimpleReplayBuffer<ObsBatch, ActBatch>;
type StepProcessor = SimpleStepProcessor<Env, ObsBatch, ActBatch>;

// Create a new agent by wrapping the existing agent in order to implement SyncModel.
struct TestAgent2(TestAgent);

impl border_core::Configurable<Env> for TestAgent2 {
    type Config = TestAgentConfig;

    fn build(config: Self::Config) -> Self {
        Self(TestAgent::build(config))
    }
}

impl border_core::Agent<Env, ReplayBuffer> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    fn train(&mut self) {
        self.0.train();
    }

    // For other methods ...
}

impl border_core::Policy<Env> for TestAgent2 {
    // Boilerplate code to delegate the method calls to the inner agent.
    // ...
}

impl border_async_trainer::SyncModel for TestAgent2 {
    // Self::ModelInfo should include the model parameters.
    type ModelInfo = usize;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        // Extracts the model parameters and returns them as Self::ModelInfo.
        // The first element of the tuple is the number of optimization steps.
        (0, 0)
    }

    fn sync_model(&mut self, _model_info: &Self::ModelInfo) {
        // Implements synchronization of the model based on _model_info.
    }
}

let agent_configs: Vec<_> = vec![agent_config()];
let env_config_train = env_config();
let env_config_eval = env_config();
let replay_buffer_config = SimpleReplayBufferConfig::default();
let step_proc_config = SimpleStepProcessorConfig::default();
let actor_man_config = ActorManagerConfig::default();
let async_trainer_config = AsyncTrainerConfig::default();
let mut recorder: Box<dyn AggregateRecorder> = Box::new(NullRecorder {});
let mut evaluator = DefaultEvaluator::<TestEnv, TestAgent2>::new(&env_config_eval, 0, 1).unwrap();

border_async_trainer::util::train_async::<_, _, _, StepProcessor>(
    &agent_config(),
    &agent_configs,
    &env_config_train,
    &env_config_eval,
    &step_proc_config,
    &replay_buffer_config,
    &actor_man_config,
    &async_trainer_config,
    &mut recorder,
    &mut evaluator,
);
The training process consists of the following two components:
ActorManager manages Actors, each of which runs a thread in which an Agent interacts with an Env and collects samples. Those samples are sent to the replay buffer in AsyncTrainer.
AsyncTrainer is responsible for training an agent. It also runs a thread that pushes samples from ActorManager into a replay buffer.
The Agent must implement the SyncModel trait in order to synchronize the model of the agent in Actor with the trained agent in AsyncTrainer. The trait can export and import the information of the model as SyncModel::ModelInfo.
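The export/import round trip can be sketched without the crate itself. The block below is a minimal illustration, not the crate's implementation: the SyncModel trait is redeclared locally with the same method signatures as in the example above, and ToyAgent, train_step, and sync_demo are hypothetical names introduced only for this sketch.

```rust
// Local stand-in that mirrors the method signatures of `SyncModel` shown in
// the example above. The real trait lives in `border_async_trainer`.
trait SyncModel {
    type ModelInfo;

    // Returns (number of optimization steps, exported model information).
    fn model_info(&self) -> (usize, Self::ModelInfo);

    // Overwrites the local model with the given model information.
    fn sync_model(&mut self, model_info: &Self::ModelInfo);
}

// Hypothetical toy agent whose "model" is just a weight vector.
#[derive(Debug, Clone, PartialEq)]
struct ToyAgent {
    weights: Vec<f32>,
    opt_steps: usize,
}

impl ToyAgent {
    fn new(n_weights: usize) -> Self {
        Self { weights: vec![0.0; n_weights], opt_steps: 0 }
    }

    // Stand-in for one optimization step: nudges every weight by 0.5.
    fn train_step(&mut self) {
        for w in self.weights.iter_mut() {
            *w += 0.5;
        }
        self.opt_steps += 1;
    }
}

impl SyncModel for ToyAgent {
    // The model information is the full set of parameters.
    type ModelInfo = Vec<f32>;

    fn model_info(&self) -> (usize, Self::ModelInfo) {
        (self.opt_steps, self.weights.clone())
    }

    fn sync_model(&mut self, model_info: &Self::ModelInfo) {
        self.weights = model_info.clone();
    }
}

// Trains a "learner" copy twice, then copies its parameters into an "actor" copy.
fn sync_demo() -> (usize, Vec<f32>) {
    let mut learner = ToyAgent::new(2);
    let mut actor = ToyAgent::new(2);

    learner.train_step();
    learner.train_step();

    let (steps, info) = learner.model_info();
    actor.sync_model(&info);

    (steps, actor.weights)
}
```

In this sketch the step count returned by model_info is only reported back to the caller; how the crate itself uses that count is not shown here.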
The Agent in AsyncTrainer is responsible for training, typically with a GPU, while the Agents in the Actors managed by ActorManager are responsible for sampling using the CPU.
Both AsyncTrainer and ActorManager run on the same machine and communicate through channels.
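The thread-and-channel layout described above can be illustrated with the standard library alone. This is a rough sketch under stated assumptions, not the crate's code: it uses std::sync::mpsc (the channel type the crate actually uses is not stated here), and Sample and run_actors_and_trainer are hypothetical names. Several actor threads send samples over one channel, and a trainer-side thread drains the channel into a replay buffer.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sample type: which actor produced it and at which step.
#[derive(Debug)]
struct Sample {
    actor_id: usize,
    step: usize,
}

// Spawns `n_actors` sampling threads and one receiving thread that pushes
// every incoming sample into a replay buffer (a plain Vec in this sketch).
fn run_actors_and_trainer(n_actors: usize, steps_per_actor: usize) -> Vec<Sample> {
    let (tx, rx) = mpsc::channel::<Sample>();

    // Actor side: each thread owns a clone of the sender.
    let actors: Vec<_> = (0..n_actors)
        .map(|actor_id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for step in 0..steps_per_actor {
                    tx.send(Sample { actor_id, step })
                        .expect("receiver was dropped");
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver loop ends once all actors finish.
    drop(tx);

    // Trainer side: a thread that moves samples from the channel into the buffer.
    let trainer = thread::spawn(move || {
        let mut replay_buffer = Vec::new();
        for sample in rx {
            replay_buffer.push(sample);
        }
        replay_buffer
    });

    for actor in actors {
        actor.join().expect("actor thread panicked");
    }
    trainer.join().expect("trainer thread panicked")
}
```

Calling run_actors_and_trainer(3, 4) returns a buffer of 12 samples. The order in which samples from different actors arrive is not deterministic.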
Dependencies: ~13–21MB, ~276K SLoC