tensorforce.agents package¶
Submodules¶
tensorforce.agents.agent module¶
class tensorforce.agents.agent.Agent(states, actions, batched_observe=True, batching_capacity=1000)¶
Bases: object

Base class for TensorForce agents.

__init__(states, actions, batched_observe=True, batching_capacity=1000)¶
Initializes the agent.
Parameters:
- states – States specification, with the following attributes (required):
- actions – Actions specification, with the following attributes (required):
- batched_observe (bool) – Specifies whether calls to model.observe() are batched, for improved performance (default: true).
- batching_capacity (int) – Batching capacity of agent and model (default: 1000).
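As an illustration, a minimal pair of specification dicts might look as follows. The attribute names used here (type, shape, num_actions) are assumptions based on common TensorForce usage, since the attribute lists are not reproduced above:

```python
# Hypothetical states/actions specifications for an agent with a single
# 4-dimensional float state and a discrete action with two choices.
# Attribute names ('type', 'shape', 'num_actions') are assumptions here.
states_spec = dict(type='float', shape=(4,))
actions_spec = dict(type='int', num_actions=2)

# These dicts would be passed to the constructor, e.g.:
# agent = Agent(states=states_spec, actions=actions_spec)
```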
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

static from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

initialize_model()¶
Creates the model for the respective agent based on the specifications given by the user. This is a separate call after constructing the agent, because the agent constructor first has to perform a number of checks on the specs, sometimes adjusting them, e.g. by converting to a dict.

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
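Taken together, act and observe form the agent's interaction loop. The following is a minimal sketch with a hypothetical stand-in object that only mimics the interface described above, not a real TensorForce agent or environment:

```python
class StubAgent:
    """Stand-in with the act/observe interface described above (hypothetical)."""
    def act(self, states, deterministic=False, independent=False):
        return 0  # always returns action 0, for illustration only
    def observe(self, terminal, reward):
        pass  # a real agent would buffer/learn from this experience

agent = StubAgent()
state = [0.0, 0.0, 0.0, 0.0]
episode_reward = 0.0
for step in range(5):
    action = agent.act(states=state)    # one action per state
    terminal = (step == 4)              # a real environment step would go here
    reward = 1.0
    agent.observe(terminal=terminal, reward=reward)  # each act is followed by observe
    episode_reward += reward
```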
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
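The append_timestep naming scheme can be sketched as a small helper. This is illustrative only; the actual path is the return value of save_model:

```python
def checkpoint_path(directory, timestep, append_timestep=True):
    # Mirrors the naming described above: with append_timestep the file is
    # <directory>/model.ckpt-X (X = last timestep saved); without it, the
    # same file is always overwritten and can be reloaded under a fixed path.
    base = directory.rstrip('/') + '/model.ckpt'
    return base + '-' + str(timestep) if append_timestep else base
```

For example, checkpoint_path('models/', 100) yields 'models/model.ckpt-100', which is the exact name that must be passed when restoring.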
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.batch_agent module¶
tensorforce.agents.constant_agent module¶
class tensorforce.agents.constant_agent.ConstantAgent(states, actions, action_values, batched_observe=True, batching_capacity=1000, scope='constant', device=None, saver=None, summarizer=None, distributed=None)¶
Bases: tensorforce.agents.agent.Agent

Agent returning constant action values.

__init__(states, actions, action_values, batched_observe=True, batching_capacity=1000, scope='constant', device=None, saver=None, summarizer=None, distributed=None)¶
Initializes the constant agent.
Parameters:
- action_values (value, or dict of values) – Action values returned by the agent (required).
- scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none).
- saver – Saver specification, with the following attributes (default: none):
- summarizer – Summarizer specification, with the following attributes (default: none):
- distributed – Distributed specification, with the following attributes (default: none):
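Functionally, the constant agent's act reduces to returning the configured action_values regardless of the observed state. A rough sketch of that behavior (not the real model implementation):

```python
def constant_act(states, action_values):
    # Ignores the states entirely and returns the fixed action value(s),
    # matching the "constant action values" behavior described above.
    return action_values
```

Such an agent is mainly useful as a trivial baseline or for debugging an environment loop.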
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

initialize_model()¶

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.ddqn_agent module¶
tensorforce.agents.dqfd_agent module¶
class tensorforce.agents.dqfd_agent.DQFDAgent(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqfd', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None, expert_margin=0.5, supervised_weight=0.1, demo_memory_capacity=10000, demo_sampling_ratio=0.2)¶
Bases: tensorforce.agents.learning_agent.LearningAgent

Deep Q-learning from demonstration agent (Hester et al., 2017).

__init__(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqfd', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None, expert_margin=0.5, supervised_weight=0.1, demo_memory_capacity=10000, demo_sampling_ratio=0.2)¶
Initializes the DQFD agent.
Parameters:
- update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- huber_loss (float) – Huber loss clipping (default: none).
- expert_margin (float) – Enforced supervised margin between expert action Q-value and other Q-values (default: 0.5).
- supervised_weight (float) – Weight of the supervised loss term (default: 0.1).
- demo_memory_capacity (int) – Capacity of expert demonstration memory (default: 10000).
- demo_sampling_ratio (float) – Runtime sampling ratio of expert data (default: 0.2).
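The demo_sampling_ratio controls how much expert data is mixed into each update. One plausible reading (an assumption on my part, not confirmed by the parameter list above) is that the ratio means expert samples over total samples, which gives the following demonstration batch size per online batch:

```python
def demo_batch_size(batch_size, demo_sampling_ratio=0.2):
    # Assuming ratio = demo / (demo + online), the number of expert samples
    # mixed into an online batch of size batch_size would be:
    return round(demo_sampling_ratio * batch_size / (1.0 - demo_sampling_ratio))
```

Under this reading, the default ratio of 0.2 adds 8 expert samples to a batch of 32 online samples.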
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.
import_demonstrations(demonstrations)¶
Imports demonstrations, i.e. expert observations. Note that for large numbers of observations, set_demonstrations is more appropriate: it directly sets memory contents to an array and expects a different layout.
Parameters: demonstrations – List of observation dicts.
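A list of observation dicts for import_demonstrations might look like the sketch below. The exact field names (states, internals, actions, terminal, reward) are assumptions here, since the required layout is not spelled out above:

```python
# Hypothetical expert demonstrations: one observation per dict.
# Field names follow the act/observe interface but are not confirmed above.
demonstrations = [
    dict(states=[0.1, 0.0, 0.0, 0.0], internals=[], actions=1,
         terminal=False, reward=0.0),
    dict(states=[0.2, 0.0, 0.0, 0.0], internals=[], actions=0,
         terminal=True, reward=1.0),
]
# agent.import_demonstrations(demonstrations)
```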
import_experience(experiences)¶
Imports experiences.
Parameters: experiences –

initialize_model()¶

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
pretrain(steps)¶
Computes pre-train updates.
Parameters: steps – Number of updates to execute.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.dqn_agent module¶
class tensorforce.agents.dqn_agent.DQNAgent(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Bases: tensorforce.agents.learning_agent.LearningAgent

Deep Q-Network agent (Mnih et al., 2015).

__init__(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Initializes the DQN agent.
Parameters:
- update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
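To make double_q_model and huber_loss concrete, here is a sketch of the standard computations these parameters refer to (textbook definitions, not TensorForce's exact implementation):

```python
def q_target(q_next_online, q_next_target, reward, terminal, discount=0.99,
             double_q_model=False):
    # Standard (double) DQN target. With double_q_model, the online network
    # selects the argmax action and the target network evaluates it, which
    # reduces over-estimation; otherwise the target network does both.
    if terminal:
        return reward
    if double_q_model:
        a = max(range(len(q_next_online)), key=lambda i: q_next_online[i])
        return reward + discount * q_next_target[a]
    return reward + discount * max(q_next_target)

def huber(td_error, delta):
    # Huber loss with clipping threshold delta (the huber_loss parameter):
    # quadratic for |x| <= delta, linear beyond, which bounds the gradient.
    x = abs(td_error)
    return 0.5 * x * x if x <= delta else delta * (x - 0.5 * delta)
```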
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

import_experience(experiences)¶
Imports experiences.
Parameters: experiences –

initialize_model()¶

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.dqn_nstep_agent module¶
class tensorforce.agents.dqn_nstep_agent.DQNNstepAgent(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn-nstep', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Bases: tensorforce.agents.learning_agent.LearningAgent

DQN n-step agent.

__init__(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn-nstep', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Initializes the DQN n-step agent.
Parameters:
- update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
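The "n-step" in DQNNstepAgent refers to bootstrapping from an n-step return rather than a 1-step target. The standard computation, sketched here for context:

```python
def nstep_return(rewards, bootstrap_value, discount=0.99):
    # G = r_0 + d*r_1 + ... + d^(n-1)*r_{n-1} + d^n * bootstrap_value,
    # where bootstrap_value is the value estimate after the n-th step
    # (zero if the episode terminated within the n steps).
    value = bootstrap_value
    for r in reversed(rewards):
        value = r + discount * value
    return value
```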
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

import_experience(experiences)¶
Imports experiences.
Parameters: experiences –

initialize_model()¶

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.learning_agent module¶
class tensorforce.agents.learning_agent.LearningAgent(states, actions, network, update_mode, memory, optimizer, batched_observe=True, batching_capacity=1000, scope='learning-agent', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, discount=0.99, distributions=None, entropy_regularization=None)¶
Bases: tensorforce.agents.agent.Agent

Base class for learning agents, using as model a subclass of MemoryModel and DistributionModel.

__init__(states, actions, network, update_mode, memory, optimizer, batched_observe=True, batching_capacity=1000, scope='learning-agent', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, discount=0.99, distributions=None, entropy_regularization=None)¶
Initializes the learning agent.
Parameters:
- update_mode – Update mode specification, with the following attributes (required):
- memory (spec) – Memory specification, see core.memories module for more information (required).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (required).
- network (spec) – Network specification, usually a list of layer specifications, see core.networks module for more information (required).
- scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none).
- saver – Saver specification, with the following attributes (default: none):
- summarizer – Summarizer specification, with the following attributes (default: none):
- execution – Distributed specification, with the following attributes (default: none):
- variable_noise (float) – Standard deviation of variable noise (default: none).
- states_preprocessing (spec, or dict of specs) – States preprocessing specification, see core.preprocessors module for more information (default: none).
- actions_exploration (spec, or dict of specs) – Actions exploration specification, see core.explorations module for more information (default: none).
- reward_preprocessing (spec) – Reward preprocessing specification, see core.preprocessors module for more information (default: none).
- discount (float) – Discount factor for future rewards (default: 0.99).
- distributions (spec, or dict of specs) – Distributions specifications, see core.distributions module for more information (default: none).
- entropy_regularization (float) – Entropy regularization weight (default: none).
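For context on the entropy_regularization weight: it scales the entropy of the action distribution, which is added to the loss to discourage premature collapse onto a single action. The entropy term itself is the standard Shannon entropy:

```python
import math

def entropy(probs):
    # Shannon entropy H = -sum(p * log p) of a categorical action
    # distribution; entropy_regularization weights this term in the loss
    # to encourage exploration. Zero-probability actions contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)
```

A uniform distribution maximizes this term (log(n) for n actions), while a deterministic one drives it to zero.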
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

import_experience(experiences)¶
Imports experiences.
Parameters: experiences –

initialize_model()¶
Creates the model for the respective agent based on the specifications given by the user. This is a separate call after constructing the agent, because the agent constructor first has to perform a number of checks on the specs, sometimes adjusting them, e.g. by converting to a dict.

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
save_model(directory=None, append_timestep=True)¶
Save the TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files. Turn this off to be able to load the model from the same path as given here.
Parameters:
- directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the checkpoint timestep suffix. For example, if stored to models/ and set to true, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved; the load path must precisely match this file name. If turned off, the checkpoint will always overwrite the file specified in the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
set_normalized_actions(actions)¶

set_normalized_states(states)¶

should_stop()¶
tensorforce.agents.memory_agent module¶
tensorforce.agents.naf_agent module¶
class tensorforce.agents.naf_agent.NAFAgent(states, actions, network, batched_observe=True, batching_capacity=1000, scope='naf', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Bases: tensorforce.agents.learning_agent.LearningAgent

Normalized Advantage Function agent (Gu et al., 2016).

__init__(states, actions, network, batched_observe=True, batching_capacity=1000, scope='naf', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶
Initializes the NAF agent.
Parameters:
- update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
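For context on NAF (Gu et al., 2016): it decomposes Q(s, a) = V(s) + A(s, a) with a quadratic advantage term, so the greedy action in a continuous space is simply the network's mean output mu(s). A scalar-action sketch of that advantage term (illustrative, not the agent's actual TensorFlow graph):

```python
def naf_advantage(action, mu, p):
    # A(s, a) = -0.5 * (a - mu)^T P (a - mu); scalar-action case, where
    # p > 0 is the (1x1) positive-definite precision term. The advantage
    # is maximal (zero) exactly at a = mu, which makes argmax_a Q trivial.
    d = action - mu
    return -0.5 * d * p * d
```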
act(states, deterministic=False, independent=False, fetch_tensors=None)¶
Return action(s) for the given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters:
- states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of named tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute; additionally (fetched_tensors), an optional dict of the fetched named tensors.
close()¶

from_spec(spec, kwargs)¶
Creates an agent from a specification dict.

import_experience(experiences)¶
Imports experiences.
Parameters: experiences –

initialize_model()¶

last_observation()¶
observe(terminal, reward)¶
Observe experience from the environment to learn from; optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters:
- terminal (bool) – Boolean indicating whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
reset()¶
Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.

restore_model(directory=None, file=None)¶
Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters:
- directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
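The append_timestep flag only affects the checkpoint file name. A sketch of the resulting naming scheme (illustrative; the actual saver is TensorFlow's):

```python
import os

def checkpoint_path(directory, timestep, append_timestep=True,
                    base='model.ckpt'):
    """Return the checkpoint path save_model would produce under the
    naming scheme described above (illustrative sketch)."""
    path = os.path.join(directory, base)
    if append_timestep:
        # With the timestep suffix, the load path must match exactly.
        return '%s-%d' % (path, timestep)
    # Without the suffix, the same path is overwritten on every save.
    return path
```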
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
tensorforce.agents.ppo_agent module¶
-
class
tensorforce.agents.ppo_agent.
PPOAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ppo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=0.2, step_optimizer=None, subsampling_fraction=0.1, optimization_steps=50)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Proximal Policy Optimization agent (Schulman et al., 2017).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ppo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=0.2, step_optimizer=None, subsampling_fraction=0.1, optimization_steps=50)¶ Initializes the PPO agent.
Parameters: - update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – The PPO agent implicitly defines a multi-step subsampling optimizer.
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
- likelihood_ratio_clipping (float) – Likelihood ratio clipping for policy gradient (default: 0.2).
- step_optimizer (spec) – Step optimizer specification of implicit multi-step subsampling optimizer, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- subsampling_fraction (float) – Subsampling fraction of implicit subsampling optimizer (default: 0.1).
- optimization_steps (int) – Number of optimization steps for implicit multi-step optimizer (default: 50).
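The likelihood_ratio_clipping parameter corresponds to the epsilon of the PPO clipped surrogate objective of Schulman et al. (2017), L = min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the probability ratio of new to old policy and A the advantage. A per-sample sketch:

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for a single sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    `epsilon` corresponds to likelihood_ratio_clipping above;
    illustrative sketch, not the library's TensorFlow graph code."""
    clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped * advantage)
```

The clipping removes the incentive to move the ratio outside [1 − ε, 1 + ε], which is what allows PPO to take several optimization steps on the same batch (optimization_steps above).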
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
tensorforce.agents.random_agent module¶
-
class
tensorforce.agents.random_agent.
RandomAgent
(states, actions, batched_observe=True, batching_capacity=1000, scope='random', device=None, saver=None, summarizer=None, distributed=None)¶ Bases:
tensorforce.agents.agent.Agent
Agent returning random action values.
-
__init__
(states, actions, batched_observe=True, batching_capacity=1000, scope='random', device=None, saver=None, summarizer=None, distributed=None)¶ Initializes the random agent.
Parameters: - scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none)
- saver – Saver specification, with the following attributes (default: none):
- summarizer – Summarizer specification, with the following attributes (default: none):
- distributed – Distributed specification, with the following attributes (default: none):
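A random agent simply samples actions uniformly from the action specification. A sketch for simple 'int' and 'float' action specs (the spec field names here follow the conventions used in this documentation, but this is not the library implementation):

```python
import random

def sample_action(spec):
    """Uniformly sample one action from a simple action spec dict.

    Illustrative sketch: assumes 'int' specs carry num_actions and
    'float' specs carry optional min_value/max_value bounds."""
    if spec['type'] == 'int':
        return random.randrange(spec['num_actions'])
    if spec['type'] == 'float':
        low = spec.get('min_value', -1.0)
        high = spec.get('max_value', 1.0)
        return random.uniform(low, high)
    raise ValueError('unsupported action type: %s' % spec['type'])
```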
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
tensorforce.agents.trpo_agent module¶
-
class
tensorforce.agents.trpo_agent.
TRPOAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='trpo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, learning_rate=0.001, cg_max_iterations=20, cg_damping=0.001, cg_unroll_loop=False, ls_max_iterations=10, ls_accept_ratio=0.9, ls_unroll_loop=False)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Trust Region Policy Optimization agent (Schulman et al., 2015).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='trpo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, learning_rate=0.001, cg_max_iterations=20, cg_damping=0.001, cg_unroll_loop=False, ls_max_iterations=10, ls_accept_ratio=0.9, ls_unroll_loop=False)¶ Initializes the TRPO agent.
Parameters: - update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – The TRPO agent implicitly defines an optimized-step natural-gradient optimizer.
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
- likelihood_ratio_clipping (float) – Likelihood ratio clipping for policy gradient (default: none).
- learning_rate (float) – Learning rate of natural-gradient optimizer (default: 1e-3).
- cg_max_iterations (int) – Conjugate-gradient max iterations (default: 20).
- cg_damping (float) – Conjugate-gradient damping (default: 1e-3).
- cg_unroll_loop (bool) – Conjugate-gradient unroll loop (default: false).
- ls_max_iterations (int) – Line-search max iterations (default: 10).
- ls_accept_ratio (float) – Line-search accept ratio (default: 0.9).
- ls_unroll_loop (bool) – Line-search unroll loop (default: false).
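The cg_* parameters configure the conjugate-gradient solve at the heart of TRPO's natural-gradient step: the update direction solves (F + damping·I)x = g, where F is the Fisher information matrix, accessed only through matrix-vector products. A generic dense sketch of that solve (illustrative; the library implements this in the TensorFlow graph):

```python
def conjugate_gradient(matvec, b, max_iterations=20, damping=1e-3, tol=1e-10):
    """Solve (A + damping * I) x = b by conjugate gradient, where A is
    accessed only via the matrix-vector product `matvec` (as in TRPO,
    where A is the Fisher matrix). Illustrative sketch in pure Python."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - (A + damping*I) x, with x = 0 initially
    p = list(b)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iterations):
        ap = [av + damping * pv for av, pv in zip(matvec(p), p)]
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

The ls_* parameters then control the backtracking line search along the resulting direction, which enforces the trust-region constraint.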
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
tensorforce.agents.vpg_agent module¶
-
class
tensorforce.agents.vpg_agent.
VPGAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='vpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Vanilla policy gradient agent (Williams, 1992).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='vpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None)¶ Initializes the VPG agent.
Parameters: - update_mode – Update mode specification, with the following attributes:
- memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
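The gae_lambda parameter enables generalized advantage estimation, which mixes TD errors over horizons with exponential weight lambda. A sketch of GAE for a single terminal episode (Schulman et al., 2016 formulation; illustrative, not the library's graph code):

```python
def gae_advantages(rewards, values, discount=0.99, gae_lambda=0.95):
    """Generalized advantage estimation for one terminal episode.

    `values` holds state-value estimates V(s_0 .. s_{T-1}); the value
    after the terminal state is taken as 0. Illustrative sketch."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Backward recursion: A_t = delta_t + discount * lambda * A_{t+1}
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + discount * next_value - values[t]
        gae = delta + discount * gae_lambda * gae
        advantages[t] = gae
    return advantages
```

With gae_lambda = 0 this reduces to one-step TD errors; with gae_lambda = 1 it recovers full Monte Carlo advantages.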
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
Module contents¶
-
class
tensorforce.agents.
Agent
(states, actions, batched_observe=True, batching_capacity=1000)¶ Bases:
object
Base class for TensorForce agents.
-
__init__
(states, actions, batched_observe=True, batching_capacity=1000)¶ Initializes the agent.
Parameters: - states – States specification, with the following attributes (required):
- actions – Actions specification, with the following attributes (required):
- batched_observe (bool) – Specifies whether calls to model.observe() are batched, for improved performance (default: true).
- batching_capacity (int) – Batching capacity of agent and model (default: 1000).
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
static
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
initialize_model
()¶ Creates the model for the respective agent based on specifications given by user. This is a separate call after constructing the agent because the agent constructor has to perform a number of checks on the specs first, sometimes adjusting them e.g. by converting to a dict.
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
ConstantAgent
(states, actions, action_values, batched_observe=True, batching_capacity=1000, scope='constant', device=None, saver=None, summarizer=None, distributed=None)¶ Bases:
tensorforce.agents.agent.Agent
Agent returning constant action values.
-
__init__
(states, actions, action_values, batched_observe=True, batching_capacity=1000, scope='constant', device=None, saver=None, summarizer=None, distributed=None)¶ Initializes the constant agent.
Parameters: - action_values (value, or dict of values) – Action values returned by the agent (required).
- scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none)
- saver – Saver specification, with the following attributes (default: none):
- summarizer – Summarizer specification, with the following attributes (default: none):
- distributed – Distributed specification, with the following attributes (default: none):
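A constant agent ignores the states it receives and always returns the configured action_values, which is mainly useful as a sanity-check baseline. A minimal mock of that behaviour (illustrative, not the library class):

```python
class ConstantAgentMock:
    """Always returns the configured action values, ignoring states."""

    def __init__(self, action_values):
        self.action_values = action_values

    def act(self, states, deterministic=False, independent=False):
        # States are accepted for interface compatibility but unused.
        return self.action_values

agent = ConstantAgentMock(action_values=1)
```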
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file name if true. In that case, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last timestep saved, and the load path must match this file name precisely. If turned off, the checkpoint always overwrites the file specified by the path, and the model can always be loaded under this path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
RandomAgent
(states, actions, batched_observe=True, batching_capacity=1000, scope='random', device=None, saver=None, summarizer=None, distributed=None)¶ Bases:
tensorforce.agents.agent.Agent
Agent returning random action values.
-
__init__
(states, actions, batched_observe=True, batching_capacity=1000, scope='random', device=None, saver=None, summarizer=None, distributed=None)¶ Initializes the random agent.
Parameters: - scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none)
- saver – Saver specification, with the following attributes (default: none):
- summarizer – Summarizer specification, with the following attributes (default: none):
- distributed – Distributed specification, with the following attributes (default: none):
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, the action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward. Child classes should call super to get the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – boolean indicating if the episode terminated after the observation.
- reward (float) – scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
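The append_timestep naming convention can be made concrete with plain string formatting (paths and timestep are illustrative; the model.ckpt-X form is taken from the description above):

```python
directory = 'models/'
timestep = 4200

# append_timestep=True: each save produces a new, suffixed checkpoint file.
path_with_step = '{}model.ckpt-{}'.format(directory, timestep)

# append_timestep=False: every save overwrites the same file, so the model
# can always be loaded back under this fixed path.
path_plain = '{}model.ckpt'.format(directory)
```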
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
LearningAgent
(states, actions, network, update_mode, memory, optimizer, batched_observe=True, batching_capacity=1000, scope='learning-agent', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, discount=0.99, distributions=None, entropy_regularization=None)¶ Bases:
tensorforce.agents.agent.Agent
Base class for learning agents whose model is a subclass of MemoryModel and DistributionModel.
-
__init__
(states, actions, network, update_mode, memory, optimizer, batched_observe=True, batching_capacity=1000, scope='learning-agent', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, discount=0.99, distributions=None, entropy_regularization=None)¶ Initializes the learning agent.
Parameters: update_mode – Update mode specification, with the following attributes (required): Parameters: - memory (spec) – Memory specification, see core.memories module for more information (required).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (required).
- network (spec) – Network specification, usually a list of layer specifications, see core.networks module for more information (required).
- scope (str) – TensorFlow scope (default: name of agent).
- device – TensorFlow device (default: none)
- saver – Saver specification, with the following attributes (default: none):
Parameters: summarizer – Summarizer specification, with the following attributes (default: none): Parameters: execution – Distributed specification, with the following attributes (default: none): Parameters: - variable_noise (float) – Standard deviation of variable noise (default: none).
- states_preprocessing (spec, or dict of specs) – States preprocessing specification, see core.preprocessors module for more information (default: none)
- actions_exploration (spec, or dict of specs) – Actions exploration specification, see core.explorations module for more information (default: none).
- reward_preprocessing (spec) – Reward preprocessing specification, see core.preprocessors module for more information (default: none).
- discount (float) – Discount factor for future rewards (default: 0.99).
- distributions (spec / dict of specs) – Distributions specifications, see core.distributions module for more information (default: none).
- entropy_regularization (float) – Entropy regularization weight (default: none).
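Pulling the parameters above together, a learning-agent configuration might be assembled as a plain dict. This is a sketch only: the update_mode attributes shown (unit, batch_size, frequency) are an assumption, since the attribute list is not reproduced on this page, and building an agent from the dict requires TensorForce.

```python
batch_size = 64

learning_agent_kwargs = dict(
    states=dict(type='float', shape=(8,)),
    actions=dict(type='int', num_actions=4),
    network=[dict(type='dense', size=64), dict(type='dense', size=64)],
    # Assumed update_mode attributes; consult the update-mode spec for the real list.
    update_mode=dict(unit='timesteps', batch_size=batch_size, frequency=4),
    memory=dict(type='replay', include_next_states=True,
                capacity=1000 * batch_size),
    optimizer=dict(type='adam', learning_rate=1e-3),
    discount=0.99,
)
```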
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶ Creates the model for the respective agent based on specifications given by user. This is a separate call after constructing the agent because the agent constructor has to perform a number of checks on the specs first, sometimes adjusting them e.g. by converting to a dict.
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
DQFDAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqfd', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None, expert_margin=0.5, supervised_weight=0.1, demo_memory_capacity=10000, demo_sampling_ratio=0.2)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Deep Q-learning from demonstration agent (Hester et al., 2017).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqfd', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, huber_loss=None, expert_margin=0.5, supervised_weight=0.1, demo_memory_capacity=10000, demo_sampling_ratio=0.2)¶ Initializes the DQFD agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- huber_loss (float) – Huber loss clipping (default: none).
- expert_margin (float) – Enforced supervised margin between expert action Q-value and other Q-values (default: 0.5).
- supervised_weight (float) – Weight of supervised loss term (default: 0.1).
- demo_memory_capacity (int) – Capacity of expert demonstration memory (default: 10000).
- demo_sampling_ratio (float) – Runtime sampling ratio of expert data (default: 0.2).
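One plausible reading of demo_sampling_ratio is the expert share of each combined update batch; under that assumption (not confirmed by this page), the number of demonstration samples mixed into an online batch works out as:

```python
batch_size = 64            # online samples per update (illustrative)
demo_sampling_ratio = 0.2

# Solve demo / (batch + demo) = ratio for the demonstration sample count.
demo_batch_size = int(demo_sampling_ratio * batch_size / (1.0 - demo_sampling_ratio))

combined = batch_size + demo_batch_size
expert_share = demo_batch_size / combined   # recovers the configured ratio
```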
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_demonstrations
(demonstrations)¶ Imports demonstrations, i.e. expert observations. Note that for large numbers of observations, set_demonstrations is more appropriate, which directly sets memory contents to an array and expects a different layout.
Parameters: demonstrations – List of observation dicts
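The observation dicts might look like the following (the key names are an assumption based on the act/observe interface, not stated on this page):

```python
demonstrations = [
    dict(
        states=[0.1, 0.0, -0.2, 0.05],
        internals=[],       # recurrent internal states; empty for feed-forward nets
        actions=1,
        terminal=False,
        reward=1.0,
    ),
    dict(
        states=[0.12, -0.01, -0.18, 0.04],
        internals=[],
        actions=0,
        terminal=True,
        reward=0.0,
    ),
]

# With a constructed DQFD agent, one would then call:
#   agent.import_demonstrations(demonstrations=demonstrations)
```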
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
pretrain
(steps)¶ Computes pre-train updates.
Parameters: steps – Number of updates to execute.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
DQNAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Deep Q-Network agent (Mnih et al., 2015).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Initializes the DQN agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
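A DQN configuration covering the parameters above might be written as follows (values illustrative; the memory and optimizer dicts simply restate the documented defaults, with the default capacity of 1000*batch_size spelled out):

```python
batch_size = 32

dqn_kwargs = dict(
    states=dict(type='float', shape=(4,)),
    actions=dict(type='int', num_actions=2),
    network=[dict(type='dense', size=64)],
    memory=dict(type='replay', include_next_states=True,
                capacity=1000 * batch_size),           # documented default capacity
    optimizer=dict(type='adam', learning_rate=1e-3),   # documented default optimizer
    target_sync_frequency=10000,
    target_update_weight=1.0,
    double_q_model=True,    # enable double DQN
    huber_loss=1.0,         # clip the TD error
)
```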
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
DQNNstepAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn-nstep', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
DQN n-step agent.
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='dqn-nstep', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Initializes the DQN n-step agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
NAFAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='naf', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Normalized Advantage Function agent (Gu et al., 2016).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='naf', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, target_sync_frequency=10000, target_update_weight=1.0, double_q_model=False, huber_loss=None)¶ Initializes the NAF agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
- double_q_model (bool) – Specifies whether double DQN mode is used (default: false).
- huber_loss (float) – Huber loss clipping (default: none).
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
PPOAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ppo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=0.2, step_optimizer=None, subsampling_fraction=0.1, optimization_steps=50)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Proximal Policy Optimization agent (Schulman et al., 2017).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ppo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=0.2, step_optimizer=None, subsampling_fraction=0.1, optimization_steps=50)¶ Initializes the PPO agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – PPO agent implicitly defines a multi-step subsampling optimizer.
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
- likelihood_ratio_clipping (float) – Likelihood ratio clipping for policy gradient (default: 0.2).
- step_optimizer (spec) – Step optimizer specification of implicit multi-step subsampling optimizer, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- subsampling_fraction (float) – Subsampling fraction of implicit subsampling optimizer (default: 0.1).
- optimization_steps (int) – Number of optimization steps for implicit multi-step optimizer (default: 50).
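A PPO configuration exercising the parameters above might look like the following (a sketch; the baseline spec's type and sizes fields are an assumption, as the core.baselines formats are not reproduced on this page):

```python
ppo_kwargs = dict(
    states=dict(type='float', shape=(8,)),
    actions=dict(type='int', num_actions=4),
    network=[dict(type='dense', size=64), dict(type='dense', size=64)],
    step_optimizer=dict(type='adam', learning_rate=1e-3),  # documented default
    subsampling_fraction=0.1,
    optimization_steps=50,
    likelihood_ratio_clipping=0.2,
    baseline_mode='states',
    baseline=dict(type='mlp', sizes=[32, 32]),  # assumed baseline spec format
    gae_lambda=0.97,
)
```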
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration and sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar action value, or dict of multiple actions, that the agent wants to execute; optionally also a dict (fetched_tensors) with the fetched named tensors.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes the reward; child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; turn this off to be able to load the model from the same path argument given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – Appends the current timestep to the checkpoint file if true. If set to true, the load path must include the timestep suffix: for example, if saved to models/, the exported file will be of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If turned off, the checkpoint always overwrites the file specified by the path and the model can always be loaded under that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
TRPOAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='trpo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, learning_rate=0.001, cg_max_iterations=20, cg_damping=0.001, cg_unroll_loop=False, ls_max_iterations=10, ls_accept_ratio=0.9, ls_unroll_loop=False)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Trust Region Policy Optimization agent (Schulman et al., 2015).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='trpo', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None, likelihood_ratio_clipping=None, learning_rate=0.001, cg_max_iterations=20, cg_damping=0.001, cg_unroll_loop=False, ls_max_iterations=10, ls_accept_ratio=0.9, ls_unroll_loop=False)¶ Initializes the TRPO agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – The TRPO agent implicitly defines an optimized-step natural-gradient optimizer.
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
- likelihood_ratio_clipping (float) – Likelihood ratio clipping for policy gradient (default: none).
- learning_rate (float) – Learning rate of natural-gradient optimizer (default: 1e-3).
- cg_max_iterations (int) – Conjugate-gradient max iterations (default: 20).
- cg_damping (float) – Conjugate-gradient damping (default: 1e-3).
- cg_unroll_loop (bool) – Conjugate-gradient unroll loop (default: false).
- ls_max_iterations (int) – Line-search max iterations (default: 10).
- ls_accept_ratio (float) – Line-search accept ratio (default: 0.9).
- ls_unroll_loop (bool) – Line-search unroll loop (default: false).
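Putting the parameters above together, a minimal TRPOAgent configuration might be sketched as follows. The state/action shapes, layer sizes, and update_mode values are illustrative assumptions, not library defaults; only the constructor keyword names come from the signature above.

```python
# Illustrative specification dicts for a TRPOAgent (shapes and layer
# sizes are example choices, not library defaults).
states_spec = dict(type='float', shape=(4,))
actions_spec = dict(type='int', num_actions=2)
network_spec = [
    dict(type='dense', size=32),
    dict(type='dense', size=32)
]
update_mode = dict(unit='episodes', batch_size=10)

# The specs would then be passed to the constructor, e.g.:
# from tensorforce.agents import TRPOAgent
# agent = TRPOAgent(
#     states=states_spec, actions=actions_spec, network=network_spec,
#     update_mode=update_mode, discount=0.99,
#     learning_rate=1e-3, cg_max_iterations=20, ls_max_iterations=10
# )
```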
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the named tensors fetched.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes rewards. Child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
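The act/observe cycle described above can be sketched with a minimal stand-in agent. This is a mock that mirrors the interface contract, not the real TensorForce class; the environment step is elided.

```python
import random

class MockAgent:
    """Minimal stand-in mirroring the act()/observe() interface above."""

    def __init__(self, num_actions):
        self.num_actions = num_actions
        self.timestep = 0
        self.rewards = []

    def act(self, states, deterministic=False, independent=False):
        self.timestep += 1
        # Deterministic: no exploration or sampling; otherwise sample uniformly.
        return 0 if deterministic else random.randrange(self.num_actions)

    def observe(self, terminal, reward):
        # Record experience; a real agent would also trigger model updates here.
        self.rewards.append(reward)

agent = MockAgent(num_actions=2)
state = [0.0, 1.0]
for _ in range(5):
    action = agent.act(states=state)
    # ... the environment would execute the action and return reward/terminal ...
    agent.observe(terminal=False, reward=1.0)

print(agent.timestep, sum(agent.rewards))  # 5 5.0
```

Note that an `act()` call with `independent=True` would not be followed by `observe()`, and hence not be included in updates.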
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; disable this to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – If true, appends the current timestep to the checkpoint file name. The load path must then include the timestep suffix: for example, if saved to models/ with this option enabled, the exported file is of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If false, the checkpoint always overwrites the file specified in path, and the model can always be loaded from that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
VPGAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='vpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Vanilla policy gradient agent (Williams, 1992).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='vpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, baseline_mode=None, baseline=None, baseline_optimizer=None, gae_lambda=None)¶ Initializes the VPG agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’latest’, include_next_states=false, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- baseline_mode (str) – One of ‘states’, ‘network’ (default: none).
- baseline (spec) – Baseline specification, see core.baselines module for more information (default: none).
- baseline_optimizer (spec) – Baseline optimizer specification, see core.optimizers module for more information (default: none).
- gae_lambda (float) – Lambda factor for generalized advantage estimation (default: none).
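The baseline-related parameters above might be combined as follows for a VPGAgent with generalized advantage estimation. This is a sketch: the 'mlp' baseline type, 'multi_step' optimizer, layer sizes, and lambda value are illustrative assumptions, not defaults.

```python
# Illustrative baseline configuration for a VPGAgent with GAE
# (layer sizes and lambda value are example choices, not defaults).
baseline_mode = 'states'
baseline = dict(type='mlp', sizes=[32, 32])
baseline_optimizer = dict(
    type='multi_step',
    optimizer=dict(type='adam', learning_rate=1e-3),
    num_steps=5
)
gae_lambda = 0.97

# agent = VPGAgent(..., baseline_mode=baseline_mode, baseline=baseline,
#                  baseline_optimizer=baseline_optimizer, gae_lambda=gae_lambda)
```

With `baseline_mode='states'` the baseline receives the (preprocessed) states; with `'network'` it would receive the network output instead.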
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the named tensors fetched.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes rewards. Child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; disable this to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – If true, appends the current timestep to the checkpoint file name. The load path must then include the timestep suffix: for example, if saved to models/ with this option enabled, the exported file is of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If false, the checkpoint always overwrites the file specified in path, and the model can always be loaded from that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-
-
class
tensorforce.agents.
DDPGAgent
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ddpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, critic_network=None, critic_optimizer=None, target_sync_frequency=10000, target_update_weight=1.0)¶ Bases:
tensorforce.agents.learning_agent.LearningAgent
Deep Deterministic Policy Gradient agent (Lillicrap et al., 2015).
-
__init__
(states, actions, network, batched_observe=True, batching_capacity=1000, scope='ddpg', device=None, saver=None, summarizer=None, execution=None, variable_noise=None, states_preprocessing=None, actions_exploration=None, reward_preprocessing=None, update_mode=None, memory=None, optimizer=None, discount=0.99, distributions=None, entropy_regularization=None, critic_network=None, critic_optimizer=None, target_sync_frequency=10000, target_update_weight=1.0)¶ Initializes the DDPG agent.
Parameters: update_mode – Update mode specification, with the following attributes: Parameters: - memory (spec) – Memory specification, see core.memories module for more information (default: {type=’replay’, include_next_states=true, capacity=1000*batch_size}).
- optimizer (spec) – Optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- critic_network (spec) – Critic network specification, usually a list of layer specifications, see core.networks module for more information (default: network).
- critic_optimizer (spec) – Critic optimizer specification, see core.optimizers module for more information (default: {type=’adam’, learning_rate=1e-3}).
- target_sync_frequency (int) – Target network sync frequency (default: 10000).
- target_update_weight (float) – Target network update weight (default: 1.0).
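The critic and target-network parameters above might be combined as follows for a DDPGAgent. This is a sketch: the layer sizes are illustrative assumptions, while the target values shown match the stated defaults.

```python
# Illustrative critic configuration for a DDPGAgent (layer sizes are
# example choices; target values match the stated defaults).
critic_network = [
    dict(type='dense', size=64),
    dict(type='dense', size=64)
]
critic_optimizer = dict(type='adam', learning_rate=1e-3)
target_sync_frequency = 10000  # sync target networks every 10000 steps
target_update_weight = 1.0     # 1.0 corresponds to a hard target update

# agent = DDPGAgent(..., critic_network=critic_network,
#                   critic_optimizer=critic_optimizer,
#                   target_sync_frequency=target_sync_frequency,
#                   target_update_weight=target_update_weight)
```

A `target_update_weight` below 1.0 would soften the update, blending the target parameters toward the current ones instead of copying them outright.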
-
act
(states, deterministic=False, independent=False, fetch_tensors=None)¶ Return action(s) for given state(s). States preprocessing and exploration are applied if configured accordingly.
Parameters: - states (any) – One state (usually a value tuple) or dict of states if multiple states are expected.
- deterministic (bool) – If true, no exploration or sampling is applied.
- independent (bool) – If true, action is not followed by observe (and hence not included in updates).
- fetch_tensors (list) – Optional list of names of tensors to fetch.
Returns: Scalar value of the action, or dict of multiple actions the agent wants to execute. Optionally also returns fetched_tensors, a dict of the named tensors fetched.
-
close
()¶
-
from_spec
(spec, kwargs)¶ Creates an agent from a specification dict.
-
import_experience
(experiences)¶ Imports experiences.
Parameters: experiences –
-
initialize_model
()¶
-
last_observation
()¶
-
observe
(terminal, reward)¶ Observe experience from the environment to learn from. Optionally pre-processes rewards. Child classes should call super to obtain the processed reward, e.g. terminal, reward = super()…
Parameters: - terminal (bool) – Whether the episode terminated after the observation.
- reward (float) – Scalar reward that resulted from executing the action.
-
reset
()¶ Reset the agent to its initial state (e.g. on experiment start). Updates the Model’s internal episode and time step counter, internal states, and resets preprocessors.
-
restore_model
(directory=None, file=None)¶ Restore TensorFlow model. If no checkpoint file is given, the latest checkpoint is restored. If no checkpoint directory is given, the model’s default saver directory is used (unless file specifies the entire path).
Parameters: - directory – Optional checkpoint directory.
- file – Optional checkpoint file, or path if directory not given.
-
save_model
(directory=None, append_timestep=True)¶ Save TensorFlow model. If no checkpoint directory is given, the model’s default saver directory is used. Optionally appends the current timestep to prevent overwriting previous checkpoint files; disable this to be able to load the model from the same path argument as given here.
Parameters: - directory (str) – Optional checkpoint directory.
- append_timestep (bool) – If true, appends the current timestep to the checkpoint file name. The load path must then include the timestep suffix: for example, if saved to models/ with this option enabled, the exported file is of the form models/model.ckpt-X, where X is the last saved timestep, and the load path must match this file name exactly. If false, the checkpoint always overwrites the file specified in path, and the model can always be loaded from that path.
Returns: Checkpoint path where the model was saved.
-
set_normalized_actions
(actions)¶
-
set_normalized_states
(states)¶
-
should_stop
()¶
-