tensorforce.agents package
Submodules
tensorforce.agents.agent module
class tensorforce.agents.agent.Agent(states_spec, actions_spec, batched_observe=1000, scope='base_agent')
Bases: object
Basic reinforcement learning agent. An agent encapsulates the execution logic of a particular reinforcement learning algorithm and defines the external interface to the environment.
The agent hence acts as an intermediate layer between the environment and the backend execution (value function or policy updates).
act(states, deterministic=False)
Return action(s) for the given state(s). State preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple), or a dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
Returns: The scalar value of the action, or a dict of multiple actions, that the agent wants to execute.
close()
static from_spec(spec, kwargs)
Creates an agent from a specification dict.
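A spec-driven factory such as from_spec typically looks up a class by a type name and merges in extra constructor arguments. The sketch below illustrates that pattern only; the registry (`AGENT_REGISTRY`), the `'type'` key, the helper `agent_from_spec`, and the stand-in classes are all hypothetical and not Tensorforce's implementation.

```python
from typing import Any, Dict, Optional


class SketchAgent:
    """Hypothetical stand-in for an agent class; stores its constructor arguments."""

    def __init__(self, states_spec: Dict[str, Any], actions_spec: Dict[str, Any], **config: Any) -> None:
        self.states_spec = states_spec
        self.actions_spec = actions_spec
        self.config = config


class SketchDQNAgent(SketchAgent):
    pass


# Hypothetical mapping from spec 'type' strings to agent classes.
AGENT_REGISTRY = {"sketch_agent": SketchAgent, "sketch_dqn_agent": SketchDQNAgent}


def agent_from_spec(spec: Dict[str, Any], kwargs: Optional[Dict[str, Any]] = None) -> SketchAgent:
    """Build an agent from a spec dict; entries in kwargs override spec entries."""
    config = dict(spec)              # copy so the caller's dict is not mutated
    agent_type = config.pop("type")  # assumed key naming the agent class
    if agent_type not in AGENT_REGISTRY:
        raise ValueError(f"Unknown agent type: {agent_type!r}")
    config.update(kwargs or {})
    return AGENT_REGISTRY[agent_type](**config)


agent = agent_from_spec(
    {"type": "sketch_dqn_agent", "batch_size": 32},
    kwargs={"states_spec": {"shape": (4,), "type": "float"},
            "actions_spec": {"type": "int", "num_actions": 2}},
)
print(type(agent).__name__, agent.config)  # SketchDQNAgent {'batch_size': 32}
```

Keeping environment-dependent arguments (such as the state and action specs) in kwargs lets the same spec dict be reused across environments.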
initialize_model()
Creates the model for the respective agent based on the specifications given by the user. This is a separate call after constructing the agent because the agent constructor first has to perform a number of checks on the specs, sometimes adjusting them, e.g. by converting them to a dict.
last_observation()
observe(terminal, reward)
Observe experience from the environment to learn from. Optionally preprocesses rewards. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
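The observe docstring asks child classes to call super to obtain the processed reward. A minimal sketch of that pattern follows; the class names are hypothetical, and reward clipping stands in for whatever reward preprocessing is configured (an assumption for illustration).

```python
from typing import List, Tuple


class BaseSketchAgent:
    """Hypothetical base agent whose observe() preprocesses the reward."""

    def __init__(self, clip: float = 1.0) -> None:
        self.clip = clip

    def observe(self, terminal: bool, reward: float) -> Tuple[bool, float]:
        # Assumed preprocessing step: clip the reward to [-clip, clip].
        reward = max(-self.clip, min(self.clip, reward))
        return terminal, reward


class RecordingSketchAgent(BaseSketchAgent):
    """Child agent that stores processed experience for later updates."""

    def __init__(self, clip: float = 1.0) -> None:
        super().__init__(clip)
        self.buffer: List[Tuple[bool, float]] = []

    def observe(self, terminal: bool, reward: float) -> Tuple[bool, float]:
        # Call the parent first so the child only ever sees processed rewards.
        terminal, reward = super().observe(terminal, reward)
        self.buffer.append((terminal, reward))
        return terminal, reward


agent = RecordingSketchAgent(clip=1.0)
agent.observe(terminal=False, reward=5.0)
agent.observe(terminal=True, reward=-0.5)
print(agent.buffer)  # [(False, 1.0), (True, -0.5)]
```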
static process_action_spec(actions_spec)
static process_state_spec(states_spec)
reset()
Reset the agent to its initial state at the start of an episode. Updates the internal episode and timestep counters and internal states, and resets preprocessors.
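Together, reset, act, and observe define the interaction loop, which runs outside the agent. The sketch below shows that loop with a hypothetical toy environment (`CountdownEnv`) and a random stand-in agent (`RandomSketchAgent`) that reuses the method names documented here; neither class is part of Tensorforce.

```python
import random
from typing import Tuple


class CountdownEnv:
    """Hypothetical toy environment: each episode lasts `length` steps; action 1 earns reward 1."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.steps_left = length

    def reset(self) -> int:
        self.steps_left = self.length
        return self.steps_left  # the state is the number of remaining steps

    def execute(self, action: int) -> Tuple[int, bool, float]:
        self.steps_left -= 1
        reward = 1.0 if action == 1 else 0.0
        return self.steps_left, self.steps_left == 0, reward


class RandomSketchAgent:
    """Hypothetical stand-in exposing reset/act/observe; acts uniformly at random."""

    def __init__(self, num_actions: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.num_actions = num_actions
        self.episode = 0
        self.timestep = 0
        self.total_reward = 0.0

    def reset(self) -> None:
        self.episode += 1  # episode bookkeeping at episode start

    def act(self, states: int) -> int:
        self.timestep += 1
        return self.rng.randrange(self.num_actions)

    def observe(self, terminal: bool, reward: float) -> None:
        # A learning agent would store or learn from the experience here.
        self.total_reward += reward


env, agent = CountdownEnv(length=5), RandomSketchAgent(num_actions=2)
for _ in range(3):  # run three episodes
    state = env.reset()
    agent.reset()
    terminal = False
    while not terminal:
        action = agent.act(state)
        state, terminal, reward = env.execute(action)
        agent.observe(terminal=terminal, reward=reward)

print(agent.episode, agent.timestep)  # 3 15
```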
restore_model(directory=None, file=None)
Restore the TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or full path if directory is not given.
save_model(directory=None, append_timestep=True)
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – If true, appends the current timestep to the checkpoint file name. In that case the load path must include the timestep suffix. For example, if stored to models/ with this option set, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name exactly. If this option is turned off, each save overwrites the file specified in the path, and the model can always be loaded from this path.
Returns: Checkpoint path where the model was saved.
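The append_timestep naming rule can be made concrete. The helper below is a hypothetical sketch (not part of Tensorforce) that reproduces the documented form models/model.ckpt-X; the base file name model.ckpt is taken from the example above.

```python
import os
from typing import Optional


def checkpoint_path(directory: str, timestep: Optional[int], append_timestep: bool = True,
                    base_name: str = "model.ckpt") -> str:
    """Return the file path a save would produce under the documented naming scheme."""
    path = os.path.join(directory, base_name)
    if append_timestep:
        if timestep is None:
            raise ValueError("append_timestep=True requires a timestep")
        # Each save gets its own file, so the load path must include the suffix.
        return f"{path}-{timestep}"
    # Without the suffix every save overwrites the same file.
    return path


print(checkpoint_path("models", 1500))                         # models/model.ckpt-1500 (POSIX)
print(checkpoint_path("models", 1500, append_timestep=False))  # models/model.ckpt (POSIX)
```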
should_stop()
tensorforce.agents.batch_agent module
class tensorforce.agents.batch_agent.BatchAgent(states_spec, actions_spec, batched_observe=1000, summary_spec=None, network_spec=None, discount=0.99, device=None, session_config=None, scope='batch_agent', saver_spec=None, distributed_spec=None, optimizer=None, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=1000, keep_last_timestep=True)
Bases: tensorforce.agents.learning_agent.LearningAgent
The BatchAgent class implements a batch memory, which generally implies on-policy experience collection and updates.
observe(terminal, reward)
Adds an observation and performs an update if the necessary conditions are satisfied, i.e. if one batch of experience has been collected as defined by the batch size.
In particular, note that episode control happens outside of the agent, since the agent should be agnostic to how the training data is created.
Parameters: - terminal (bool) – Whether the episode is terminated.
- reward (float) – The scalar reward value.
reset_batch()
Cleans up after a batch has been processed (observed). Resets all batch information so that the agent is ready for new observation data. Batch information contains:
- observed states
- internal variables
- taken actions
- observed terminal signals and rewards
- total batch size
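The batch logic described for observe and reset_batch amounts to buffering experience until batch_size observations have arrived, updating once, then clearing the buffer. A minimal sketch under stated assumptions: `BatchBufferSketch` and its `update_fn` callback (standing in for the model update) are hypothetical, internals are omitted, and state and action are passed to observe for brevity, whereas the documented agent records them during act.

```python
from typing import Any, Callable, Dict, List


class BatchBufferSketch:
    """Hypothetical batch buffer: update once batch_size observations are collected."""

    def __init__(self, batch_size: int, update_fn: Callable[[Dict[str, List[Any]]], None]) -> None:
        self.batch_size = batch_size
        self.update_fn = update_fn
        self.reset_batch()

    def reset_batch(self) -> None:
        # Clear all per-batch information, ready for new observation data.
        self.batch: Dict[str, List[Any]] = {"states": [], "actions": [], "terminals": [], "rewards": []}
        self.batch_count = 0

    def observe(self, state: Any, action: Any, terminal: bool, reward: float) -> None:
        self.batch["states"].append(state)
        self.batch["actions"].append(action)
        self.batch["terminals"].append(terminal)
        self.batch["rewards"].append(reward)
        self.batch_count += 1
        # Episode ends (terminal=True) do not trigger updates; only a full batch does.
        if self.batch_count == self.batch_size:
            self.update_fn(self.batch)  # on-policy update on exactly one batch
            self.reset_batch()


updates: List[int] = []
buffer = BatchBufferSketch(batch_size=4, update_fn=lambda b: updates.append(len(b["rewards"])))
for t in range(10):
    buffer.observe(state=t, action=0, terminal=(t % 3 == 2), reward=1.0)
print(updates, buffer.batch_count)  # [4, 4] 2
```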
tensorforce.agents.constant_agent module
Constant agent that always returns the same, fixed action. Useful for obtaining agents whose actions have specific shapes.
class tensorforce.agents.constant_agent.ConstantAgent(states_spec, actions_spec, batched_observe=1000, scope='constant', action_values=None)
Bases: tensorforce.agents.agent.Agent
Constant action agent for sanity checks. Returns a constant value at every step, useful to debug continuous problems.
initialize_model()
tensorforce.agents.ddqn_agent module
class tensorforce.agents.ddqn_agent.DDQNAgent(states_spec, actions_spec, batched_observe=1000, scope='ddqn', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=32, memory=None, first_update=10000, update_frequency=4, repeat_update=1, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None)
Bases: tensorforce.agents.memory_agent.MemoryAgent
Double DQN agent based on van Hasselt et al. (2015). A simple extension of DQN that improves stability.
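The difference between the standard and the double DQN target fits in a few lines: double DQN selects the next action with the online network but evaluates it with the target network. The sketch works on plain lists of Q-values for a single transition; the function names are illustrative only.

```python
from typing import Sequence


def dqn_target(reward: float, terminal: bool, next_q_target: Sequence[float],
               discount: float = 0.99) -> float:
    """Standard DQN: the target network both selects and evaluates the next action."""
    if terminal:
        return reward
    return reward + discount * max(next_q_target)


def double_dqn_target(reward: float, terminal: bool, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], discount: float = 0.99) -> float:
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if terminal:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + discount * next_q_target[best_action]


online, target = [1.0, 3.0], [2.5, 0.5]
print(dqn_target(1.0, False, target, discount=0.5))                 # 2.25
print(double_dqn_target(1.0, False, online, target, discount=0.5))  # 1.25
```

Decoupling action selection from action evaluation reduces the overestimation bias introduced by the max operator, which is where the stability gain comes from.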
initialize_model()
tensorforce.agents.dqfd_agent module
class tensorforce.agents.dqfd_agent.DQFDAgent(states_spec, actions_spec, batched_observe=1000, scope='dqfd', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=32, memory=None, first_update=10000, update_frequency=4, repeat_update=1, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None, expert_margin=0.5, supervised_weight=0.1, demo_memory_capacity=10000, demo_sampling_ratio=0.2)
Bases: tensorforce.agents.memory_agent.MemoryAgent
Deep Q-learning from demonstration (DQFD) agent (Hester et al., 2017). This agent uses DQN to pre-train from demonstration data via an additional supervised loss term.
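In the DQfD paper, the supervised loss term is a large-margin classification loss: for a demonstrated state-action pair (s, a_E), J_E = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where the margin l(a_E, a) is a positive constant for a != a_E and 0 otherwise. The sketch computes this for a single state and assumes that the expert_margin argument plays the role of that constant; verify the mapping against the implementation before relying on it.

```python
from typing import Sequence


def large_margin_loss(q_values: Sequence[float], expert_action: int,
                      expert_margin: float = 0.5) -> float:
    """DQfD-style supervised loss for one state (Hester et al., 2017).

    Adds expert_margin to every non-expert action's Q-value, takes the max,
    and subtracts the expert action's Q-value. The loss is zero only when the
    expert action beats every other action by at least the margin.
    """
    augmented = [q + (0.0 if a == expert_action else expert_margin)
                 for a, q in enumerate(q_values)]
    return max(augmented) - q_values[expert_action]


print(large_margin_loss([1.0, 2.0, 0.5], expert_action=1))   # 0.0 (expert ahead by the margin)
print(large_margin_loss([1.75, 2.0, 0.5], expert_action=1))  # 0.25
```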
import_demonstrations(demonstrations)
Imports demonstrations, i.e. expert observations. Note that for large numbers of observations, set_demonstrations is more appropriate, since it directly sets the memory contents to an array and expects a different layout.
Parameters: demonstrations – List of observation dicts.
initialize_model()
observe(reward, terminal)
Adds observations and performs updates by sampling from the memories according to the update rate. DQFD samples from both the online replay memory and the demo memory, with the fractions controlled by a hyperparameter p called the ‘expert sampling ratio’.
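Mixing the two memories according to the expert sampling ratio p can be sketched as drawing round(p * batch_size) transitions from the demo memory and the rest from the online replay memory. The rounding rule, uniform sampling with replacement, and the function `sample_mixed_batch` are assumptions for illustration, as is the reading that p corresponds to the demo_sampling_ratio constructor argument.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_mixed_batch(online_memory: Sequence[T], demo_memory: Sequence[T], batch_size: int,
                       demo_ratio: float, rng: random.Random) -> Tuple[List[T], List[T]]:
    """Draw a batch split between demo and online memory (uniform, with replacement)."""
    num_demo = round(demo_ratio * batch_size)
    num_online = batch_size - num_demo
    demo_part = [rng.choice(demo_memory) for _ in range(num_demo)]
    online_part = [rng.choice(online_memory) for _ in range(num_online)]
    return demo_part, online_part


rng = random.Random(42)
online = [("online", i) for i in range(100)]
demo = [("demo", i) for i in range(20)]
demo_part, online_part = sample_mixed_batch(online, demo, batch_size=32, demo_ratio=0.2, rng=rng)
print(len(demo_part), len(online_part))  # 6 26
```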
pretrain(steps)
Computes pretraining updates.
Parameters: steps – Number of updates to execute.
set_demonstrations(batch)
Set all demonstrations from batch data. Expects a dict whose values are arrays holding all states, actions, rewards, terminals, and internals, respectively.
Parameters: batch – Dict of arrays with the layout described above.
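import_demonstrations takes a list of observation dicts, while set_demonstrations expects a single dict of arrays. The sketch converts the former layout into the latter. The singular input keys follow the observation format documented for MemoryAgent.import_observations; the plural output keys ('states', 'actions', ...) are assumptions based on the wording above, not confirmed names.

```python
from typing import Any, Dict, List


def to_batch_layout(demonstrations: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a list of per-step observation dicts into one dict of per-field lists."""
    # Assumed key mapping: singular per-observation keys -> plural batch keys.
    key_map = {"state": "states", "action": "actions", "reward": "rewards",
               "terminal": "terminals", "internal": "internals"}
    batch: Dict[str, List[Any]] = {plural: [] for plural in key_map.values()}
    for obs in demonstrations:
        for singular, plural in key_map.items():
            batch[plural].append(obs[singular])
    return batch


demos = [
    {"state": [0.0, 1.0], "action": 1, "reward": 0.0, "terminal": False, "internal": []},
    {"state": [1.0, 1.0], "action": 0, "reward": 1.0, "terminal": True, "internal": []},
]
batch = to_batch_layout(demos)
print(batch["actions"], batch["terminals"])  # [1, 0] [False, True]
```

In practice each list would then be converted to an array (for example with numpy.asarray) before being handed over.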
tensorforce.agents.dqn_agent module
class tensorforce.agents.dqn_agent.DQNAgent(states_spec, actions_spec, batched_observe=None, scope='dqn', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=32, memory=None, first_update=10000, update_frequency=4, repeat_update=1, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)
Bases: tensorforce.agents.memory_agent.MemoryAgent
Deep Q-Network (DQN) agent, the pièce de résistance of deep reinforcement learning, as described by Mnih et al. (2015). Includes an option for double DQN (DDQN; van Hasselt et al., 2015).
DQN chooses one of a number of discrete actions by taking the action with the maximum Q-value from a value function that has one output neuron per available action. DQN uses a replay memory for experience replay.
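Picking an action from a network with one output per action is an argmax over the output vector; during training it is usually combined with an exploration scheme. The sketch uses epsilon-greedy exploration as one example of such a scheme (an assumption; exploration is presumably configured separately via explorations_spec) and mirrors the documented behavior that deterministic=True disables exploration. `select_action` is a hypothetical helper.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random,
                  deterministic: bool = False) -> int:
    """Greedy action over per-action Q-values, with optional epsilon-greedy exploration."""
    if not deterministic and rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # Exploit: index of the output neuron with the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
q = [0.1, 0.7, 0.3]
print(select_action(q, epsilon=0.1, rng=rng, deterministic=True))  # 1
```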
initialize_model()
tensorforce.agents.dqn_nstep_agent module
class tensorforce.agents.dqn_nstep_agent.DQNNstepAgent(states_spec, actions_spec, batched_observe=1000, scope='dqn-nstep', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=32, keep_last_timestep=True, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)
Bases: tensorforce.agents.batch_agent.BatchAgent
N-step Deep Q-Network (DQN) agent.
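An n-step DQN target bootstraps from a Q-value several steps ahead instead of one: it is the discounted sum of the intermediate rewards plus a discounted bootstrap value, cut off at episode ends. The sketch computes such targets for every step of one collected batch, with each step's horizon running to the end of the batch or episode. This is a generic illustration under that assumption; how DQNNstepAgent chooses n is not stated here, and `nstep_targets` is a hypothetical helper.

```python
from typing import List, Sequence


def nstep_targets(rewards: Sequence[float], terminals: Sequence[bool],
                  bootstrap_value: float, discount: float = 0.99) -> List[float]:
    """Discounted returns for each step of a batch, bootstrapped after the last step.

    Walks the batch backwards; an episode end (terminal=True) cuts off the
    bootstrap and all rewards that follow it.
    """
    targets = [0.0] * len(rewards)
    running = bootstrap_value  # e.g. max_a Q_target(s', a) for the state after the batch
    for t in reversed(range(len(rewards))):
        if terminals[t]:
            running = 0.0
        running = rewards[t] + discount * running
        targets[t] = running
    return targets


print(nstep_targets([1.0, 1.0, 1.0], [False, False, False], bootstrap_value=4.0, discount=0.5))
# [2.25, 2.5, 3.0]
print(nstep_targets([1.0, 1.0, 1.0], [False, True, False], bootstrap_value=4.0, discount=0.5))
# [1.5, 1.0, 3.0]
```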
initialize_model()
tensorforce.agents.memory_agent module
class tensorforce.agents.memory_agent.MemoryAgent(states_spec, actions_spec, batched_observe=1000, scope='memory_agent', summary_spec=None, network_spec=None, discount=0.99, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=1000, memory=None, first_update=10000, update_frequency=4, repeat_update=1)
Bases: tensorforce.agents.learning_agent.LearningAgent
The MemoryAgent class implements a replay memory from which it samples batches according to some sampling strategy to update the value function.
import_observations(observations)
Load an iterable of observation dicts into the replay memory.
Parameters: observations – An iterable in which each element is an observation. Each observation requires the keys ‘state’, ‘action’, ‘reward’, ‘terminal’, and ‘internal’. Use an empty list [] for ‘internal’ if internal state is irrelevant.
observe(terminal, reward)
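observe carries no description here, but the constructor arguments first_update, update_frequency, and repeat_update suggest an update schedule. One plausible reading, stated as an assumption rather than documented behavior: no updates before timestep first_update, then every update_frequency timesteps, repeat_update batches are sampled from memory with one update each. The hypothetical helper below only lists which timesteps would trigger updates under that reading.

```python
from typing import List


def update_timesteps(num_timesteps: int, first_update: int, update_frequency: int,
                     repeat_update: int) -> List[int]:
    """List the timesteps at which updates would run (with repeats) under the assumed schedule."""
    schedule: List[int] = []
    for timestep in range(1, num_timesteps + 1):
        if timestep >= first_update and timestep % update_frequency == 0:
            schedule.extend([timestep] * repeat_update)  # one entry per sampled batch/update
    return schedule


print(update_timesteps(20, first_update=10, update_frequency=4, repeat_update=2))
# [12, 12, 16, 16, 20, 20]
```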
tensorforce.agents.naf_agent module
class tensorforce.agents.naf_agent.NAFAgent(states_spec, actions_spec, batched_observe=1000, scope='naf', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=32, memory=None, first_update=10000, update_frequency=4, repeat_update=1, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)
Bases: tensorforce.agents.memory_agent.MemoryAgent
Normalized Advantage Functions (NAF) for continuous DQN: https://arxiv.org/abs/1603.00748
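In the NAF paper linked above (Gu et al., 2016), Q is restricted to be quadratic in the action so that the greedy action is available in closed form for continuous actions. Q is decomposed into a state value V(s) and an advantage centered on μ(s), where L(s) is a lower-triangular matrix with positive diagonal entries, so that P(s) is positive definite:

```latex
Q(s, a) = V(s) + A(s, a), \qquad
A(s, a) = -\tfrac{1}{2}\,\bigl(a - \mu(s)\bigr)^{\top} P(s)\,\bigl(a - \mu(s)\bigr), \qquad
P(s) = L(s)\,L(s)^{\top}
```

Because P(s) is positive definite, A(s, a) ≤ 0 with equality exactly at a = μ(s). Hence argmax_a Q(s, a) = μ(s) and max_a Q(s, a) = V(s), so the DQN-style target can use V(s') directly instead of maximizing over a continuous action space.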
initialize_model()
tensorforce.agents.ppo_agent module
class tensorforce.agents.ppo_agent.PPOAgent(states_spec, actions_spec, batched_observe=1000, scope='ppo', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=0.01, batch_size=1000, keep_last_timestep=True, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, step_optimizer=None, optimization_steps=10)
Bases: tensorforce.agents.batch_agent.BatchAgent
Proximal Policy Optimization agent (Schulman et al., 2017): https://openai-public.s3-us-west-2.amazonaws.com/blog/2017-07/ppo/ppo-arxiv.pdf
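The core of PPO, as described in the paper linked above, is the clipped surrogate objective. With likelihood ratio r = π_new(a|s) / π_old(a|s) and advantage estimate A, each sample contributes min(r * A, clip(r, 1 - ε, 1 + ε) * A). The sketch averages this over a batch of precomputed ratios and advantages; reading ε as the likelihood_ratio_clipping argument is an assumption.

```python
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      epsilon: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Taking the minimum gives a pessimistic bound: moving the ratio far
        # from 1 cannot increase the objective beyond the clipped value.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)


print(clipped_surrogate([1.5, 0.5], [1.0, -1.0], epsilon=0.25))  # 0.25
```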
initialize_model()
tensorforce.agents.random_agent module
class tensorforce.agents.random_agent.RandomAgent(states_spec, actions_spec, batched_observe=1000, scope='random')
Bases: tensorforce.agents.agent.Agent
Random agent, useful as a baseline and sanity check.
initialize_model()
tensorforce.agents.trpo_agent module
class tensorforce.agents.trpo_agent.TRPOAgent(states_spec, actions_spec, batched_observe=1000, scope='trpo', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=1000, keep_last_timestep=True, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, learning_rate=0.001, cg_max_iterations=20, cg_damping=0.001, cg_unroll_loop=False)
Bases: tensorforce.agents.batch_agent.BatchAgent
Trust Region Policy Optimization (Schulman et al., 2015) agent.
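TRPO obtains its update direction by approximately solving F x = g with conjugate gradient, where F is the Fisher information matrix (accessed only through matrix-vector products) and g is the policy gradient. The sketch is a plain conjugate-gradient solver on a small dense stand-in matrix. Reading cg_max_iterations as the iteration cap and cg_damping as a term added to the diagonal (F + damping * I, a common stabilization) is an assumption; this is not Tensorforce's optimizer.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector, max_iterations: int = 20,
                       damping: float = 1e-3, tol: float = 1e-10) -> Vector:
    """Approximately solve (A + damping * I) x = b for a symmetric positive-definite A.

    A is only accessed through matvec, mirroring how TRPO uses Fisher-vector
    products instead of materializing the Fisher matrix.
    """
    def damped_matvec(v: Vector) -> Vector:
        return [av + damping * vi for av, vi in zip(matvec(v), v)]

    x = [0.0] * len(b)
    r = list(b)           # residual b - (A + damping * I) x for x = 0
    p = list(r)           # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iterations):
        if rs_old < tol:  # squared residual norm small enough: stop early
            break
        ap = damped_matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


FISHER = [[4.0, 1.0], [1.0, 3.0]]  # toy symmetric positive-definite stand-in


def fisher_matvec(v: Vector) -> Vector:
    return [dot(row, v) for row in FISHER]


step = conjugate_gradient(fisher_matvec, [1.0, 2.0], max_iterations=20, damping=0.0)
print(step)  # approximately [1/11, 7/11]
```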
initialize_model()
tensorforce.agents.vpg_agent module
class tensorforce.agents.vpg_agent.VPGAgent(states_spec, actions_spec, batched_observe=1000, scope='vpg', summary_spec=None, network_spec=None, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, discount=0.99, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None, batch_size=1000, keep_last_timestep=True, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None)
Bases: tensorforce.agents.batch_agent.BatchAgent
Vanilla Policy Gradient agent as described by Sutton et al. (1999): https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation.pdf
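VPG weights the gradient of log π(a|s) by a return or advantage estimate. When a baseline is used, the gae_lambda argument suggests generalized advantage estimation (GAE, Schulman et al.), with TD errors δ_t = r_t + γ V(s_{t+1}) - V(s_t) and advantages A_t = Σ_k (γλ)^k δ_{t+k}. The sketch computes GAE advantages for one finished episode from given baseline values; whether and how the agent applies GAE is an assumption here, and `gae_advantages` is a hypothetical helper.

```python
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float], discount: float = 0.99,
                   gae_lambda: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one complete episode.

    values[t] is the baseline estimate V(s_t); the state after the final step is
    terminal, so its value is taken as 0.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + discount * next_value - values[t]  # TD error
        running = delta + discount * gae_lambda * running
        advantages[t] = running
    return advantages


print(gae_advantages([1.0, 1.0], [0.5, 0.5], discount=1.0, gae_lambda=0.5))  # [1.25, 0.5]
```

With gae_lambda = 1 this reduces to the Monte Carlo return minus the baseline; with gae_lambda = 0 it is the one-step TD error.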
initialize_model()
Module contents
The package namespace re-exports Agent, ConstantAgent, and RandomAgent as tensorforce.agents.Agent, tensorforce.agents.ConstantAgent, and tensorforce.agents.RandomAgent. Their documentation is identical to the submodule entries above.
class tensorforce.agents.LearningAgent(states_spec, actions_spec, batched_observe=1000, scope='dqn', summary_spec=None, network_spec=None, discount=0.99, device=None, session_config=None, saver_spec=None, distributed_spec=None, optimizer=None, variable_noise=None, states_preprocessing_spec=None, explorations_spec=None, reward_preprocessing_spec=None, distributions_spec=None, entropy_regularization=None)
Bases: tensorforce.agents.agent.Agent
An agent that learns by optimizing the parameters of its TensorFlow model.
The package namespace also re-exports BatchAgent, MemoryAgent, VPGAgent, TRPOAgent, PPOAgent, DQNAgent, DDQNAgent, DQNNstepAgent, DQFDAgent, and NAFAgent (as tensorforce.agents.BatchAgent, and so on), with documentation identical to the submodule entries above.