Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google’s TensorFlow framework and compatible with Python 3 (Python 2 support was dropped with version 0.5).

Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:

  • Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at the cost of faithfully reproducing every detail of the paper that introduced them.
  • Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
  • Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.

Installation

A stable version of Tensorforce is periodically updated on PyPI and installed as follows:

pip install tensorforce

To always use the latest version of Tensorforce, install the GitHub version instead:

git clone https://github.com/tensorforce/tensorforce.git
cd tensorforce
pip install -e .

Tensorforce is built on top of Google’s TensorFlow and requires that either tensorflow or tensorflow-gpu is installed, currently version 1.13.1. To include the correct version of TensorFlow with the installation of Tensorforce, add the extra tf for the normal CPU version or tf_gpu for the GPU version:

# PyPI version plus TensorFlow CPU version
pip install tensorforce[tf]

# GitHub version plus TensorFlow GPU version
pip install -e .[tf_gpu]

Some environments require additional packages, for which installation options are also available (mazeexp, gym, retro, vizdoom; or envs for all environments). However, some environments require other tools to be installed separately (see the environments documentation).

Getting started

Training

from tensorforce.agents import Agent
from tensorforce.environments import Environment

# Setup environment
# (Tensorforce or custom implementation, ideally using the Environment interface)
environment = Environment.create(environment='environment.json')

# Create and initialize agent
agent = Agent.create(agent='agent.json', environment=environment)
agent.initialize()

# Reset agent and environment at the beginning of a new episode
agent.reset()
states = environment.reset()
terminal = False

# Agent-environment interaction training loop
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)

# Close agent and environment
agent.close()
environment.close()
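The loop above covers a single episode. The sketch below repeats it over several episodes and checks the act/observe pairing described in the Agent interface. So that it runs without TensorFlow, it uses hypothetical stand-in classes (ToyEnvironment, ToyRandomAgent), not Tensorforce classes; only the method names mirror the snippet above.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in: episodes last a fixed number of steps, action 1 earns reward 1.0."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.remaining = 0

    def reset(self):
        self.remaining = self.episode_length
        return float(self.remaining)

    def execute(self, actions):
        self.remaining -= 1
        reward = 1.0 if actions == 1 else 0.0
        terminal = self.remaining == 0
        return float(self.remaining), terminal, reward


class ToyRandomAgent:
    """Hypothetical stand-in: picks 0 or 1 uniformly and enforces act -> observe pairing."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.pending_act = False
        self.num_observes = 0

    def reset(self):
        self.pending_act = False

    def act(self, states):
        assert not self.pending_act, 'act() must be followed by observe()'
        self.pending_act = True
        return self.rng.randint(0, 1)

    def observe(self, terminal, reward):
        assert self.pending_act, 'observe() must be preceded by act()'
        self.pending_act = False
        self.num_observes += 1


def train(agent, environment, num_episodes):
    """Run the single-episode loop from above num_episodes times, returning episode rewards."""
    episode_rewards = []
    for _ in range(num_episodes):
        agent.reset()
        states = environment.reset()
        terminal = False
        total = 0.0
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            agent.observe(terminal=terminal, reward=reward)
            total += reward
        episode_rewards.append(total)
    return episode_rewards


agent = ToyRandomAgent(seed=0)
environment = ToyEnvironment(episode_length=5)
rewards = train(agent, environment, num_episodes=3)
print(rewards)
```

With a real Tensorforce agent and environment, the same outer loop would wrap the calls shown in the training snippet; the Runner utility below takes care of this bookkeeping.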

Evaluation / application

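# Reset the environment at the beginning of an evaluation episode
states = environment.reset()
terminal = False
# Evaluation loop: evaluation=True implies independent, so no observe(...) follows
while not terminal:
    actions = agent.act(states=states, evaluation=True)
    states, terminal, reward = environment.execute(actions=actions)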

Runner utility

from tensorforce.execution import Runner

# Tensorforce runner utility
runner = Runner(agent='agent.json', environment='environment.json')

# Run training
runner.run(num_episodes=500)

# Close runner
runner.close()

Module specification

Agents are instantiated via Agent.create(agent=...), using any of the specification alternatives presented below (agent acts as the type argument). It is recommended to pass the application’s Environment implementation as the second argument, environment, from which the agent’s states, actions and max_episode_timesteps arguments are extracted automatically.

How to specify modules

Dictionary with module type and arguments

Agent.create(...
    network=dict(type='layered', layers=[dict(type='dense', size=32)]),
    memory=dict(type='replay', capacity=10000), ...
)

JSON specification file (plus additional arguments)

Agent.create(...
    network='network.json',
    memory=dict(type='memory.json', capacity=10000), ...
)
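A JSON specification file presumably stores the same nested dictionary as the dictionary form. The sketch below assumes exactly that and uses only Python's standard json module: it writes the network specification from the dictionary example to a network.json file in a temporary directory and reads it back.

```python
import json
import os
import tempfile

# Same network specification as in the dictionary example above; the assumption
# is that a JSON specification file contains this dictionary serialized as JSON.
network_spec = dict(type='layered', layers=[dict(type='dense', size=32)])

directory = tempfile.mkdtemp()
path = os.path.join(directory, 'network.json')
with open(path, 'w') as fp:
    json.dump(network_spec, fp, indent=2)

# Reading the file back yields an identical dictionary specification.
with open(path, 'r') as fp:
    loaded_spec = json.load(fp)
print(loaded_spec == network_spec)
```

The resulting path would then be passed as network=path to Agent.create(...).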

Module path (plus additional arguments)

Agent.create(...
    network='my_module.TestNetwork',
    memory=dict(type='tensorforce.core.memories.Replay', capacity=10000), ...
)

Callable or Type (plus additional arguments)

Agent.create(...
    network=TestNetwork,
    memory=dict(type=Replay, capacity=10000), ...
)

Default module: only arguments or first argument

Agent.create(...
    network=[dict(type='dense', size=32)],
    memory=dict(capacity=10000), ...
)

Static vs dynamic hyperparameters

Tensorforce distinguishes between two kinds of agent/module arguments of primitive type (bool/int/long/float): those which specify part of the TensorFlow model architecture, like the layer size, and those which specify a value within the architecture, like the learning rate. Whereas the former are statically defined as part of the agent initialization, the latter can be dynamically adjusted afterwards. These dynamic hyperparameters are marked as parameter in their type specification in the documentation, and can alternatively be assigned a parameter module instead of a constant value, for instance, to specify a decaying learning rate.
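A dynamic hyperparameter is, in effect, a function of training progress rather than a constant. The sketch below computes one such schedule, plain exponential decay, in pure Python. It is not Tensorforce's parameter module (whose exact type name and arguments are not reproduced here); the formula initial_value * decay_rate ** (step / decay_steps) is the common convention also used by TensorFlow's exponential decay, and the values are illustrative.

```python
def exponential_decay(initial_value, decay_rate, decay_steps, step):
    """Value of an exponentially decaying hyperparameter at a given step:
    initial_value * decay_rate ** (step / decay_steps)."""
    return initial_value * decay_rate ** (step / decay_steps)


# A learning rate that halves every 1000 timesteps (illustrative values)
for step in (0, 1000, 2000, 3000):
    print(step, exponential_decay(1e-3, decay_rate=0.5, decay_steps=1000, step=step))
```

By contrast, a static argument such as a layer size is fixed once, when the agent is created.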

run.py – Runner

Required arguments

#1: agent (string) – Agent (configuration JSON file, name, or library module)
#2: environment (string) – Environment (name, configuration JSON file, or library module)

Optional arguments

Agent arguments

--[n]etwork (string, default: not specified) – Network (configuration JSON file, name, or library module)

Environment arguments

--[l]evel (string, default: not specified) – Level or game id, like CartPole-v1, if supported
--[i]mport-modules (string, default: not specified) – Import comma-separated modules required for environment
--visualize (bool, default: false) – Visualize agent–environment interaction, if supported

Runner arguments

--[t]imesteps (int, default: not specified) – Number of timesteps
--[e]pisodes (int, default: not specified) – Number of episodes
--[m]ax-episode-timesteps (int, default: not specified) – Maximum number of timesteps per episode
--mean-horizon (int, default: 10) – Number of timesteps/episodes for mean reward computation
--e[v]aluation (bool, default: false) – Evaluation mode
--[s]ave-best-agent (bool, default: false) – Save best-performing agent

Logging arguments

--[r]epeat (int, default: 1) – Number of repetitions
--[p]ath (string, default: not specified) – Logging path, directory plus filename without extension

--seaborn (bool, default: false) – Use seaborn
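In the list above, a bracketed letter marks the one-letter alias of an option; for example, --[e]pisodes can also be given as -e. The sketch below rebuilds an equivalent parser with Python's argparse to make that convention concrete. It is a hypothetical reconstruction from the documented list, not run.py's source. Treating the bool options as on/off switches (store_true) is an assumption, as are the example values 'ppo' and 'gym'.

```python
import argparse

# Hypothetical reconstruction of the documented run.py interface, NOT its source.
parser = argparse.ArgumentParser(description='Tensorforce runner (sketch)')
# Required positional arguments
parser.add_argument('agent', help='Agent (configuration JSON file, name, or library module)')
parser.add_argument('environment', help='Environment (name, configuration JSON file, or library module)')
# Agent and environment arguments
parser.add_argument('-n', '--network', default=None)
parser.add_argument('-l', '--level', default=None)
parser.add_argument('-i', '--import-modules', default=None)
parser.add_argument('--visualize', action='store_true')
# Runner arguments
parser.add_argument('-t', '--timesteps', type=int, default=None)
parser.add_argument('-e', '--episodes', type=int, default=None)
parser.add_argument('-m', '--max-episode-timesteps', type=int, default=None)
parser.add_argument('--mean-horizon', type=int, default=10)
parser.add_argument('-v', '--evaluation', action='store_true')
parser.add_argument('-s', '--save-best-agent', action='store_true')
# Logging arguments
parser.add_argument('-r', '--repeat', type=int, default=1)
parser.add_argument('-p', '--path', default=None)
parser.add_argument('--seaborn', action='store_true')

args = parser.parse_args(['ppo', 'gym', '-l', 'CartPole-v1', '-e', '300', '-m', '500'])
print(args.agent, args.level, args.episodes, args.max_episode_timesteps)
```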

tune.py – Hyperparameter tuner

Required arguments

#1: environment (string) – Environment (name, configuration JSON file, or library module)

Optional arguments

--[l]evel (string, default: not specified) – Level or game id, like CartPole-v1, if supported
--[m]ax-repeats (int, default: 1) – Maximum number of repetitions
--[n]um-iterations (int, default: 1) – Number of BOHB iterations
--[d]irectory (string, default: “tuner”) – Output directory
--[r]estore (string, default: not specified) – Restore from given directory
--id (string, default: “worker”) – Unique worker id

Agent interface

class tensorforce.agents.Agent(states, actions, max_episode_timesteps=None, parallel_interactions=1, buffer_observe=True, seed=None, recorder=None)

Tensorforce agent interface.

act(states, parallel=0, deterministic=False, independent=False, evaluation=False, query=None, **kwargs)

Returns action(s) for the given state(s); needs to be followed by observe(...) unless independent is true.

Parameters:
  • states (dict[state]) – Dictionary containing state(s) to be acted on (required).
  • parallel (int) – Parallel execution index (default: 0).
  • deterministic (bool) – Whether to act deterministically, i.e. without exploration and sampling (default: false).
  • independent (bool) – Whether action is not remembered, and this call is thus not followed by observe (default: false).
  • evaluation (bool) – Whether the agent is currently evaluated, implies and overwrites deterministic and independent (default: false).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

Dictionary containing action(s), plus queried tensor values if requested.

Return type:

(dict[action], plus optional list[str])

close()

Closes the agent.

static create(agent=None, environment=None, **kwargs)

Creates an agent from a specification.

Parameters:
  • agent (specification) – JSON file, specification key, configuration dictionary, library module, or Agent subclass (default: Policy agent).
  • environment (Environment) – Environment which the agent is supposed to be trained on; environment-related arguments like state/action space specifications are extracted from it if given.
  • kwargs – Additional arguments.

get_available_summaries()

Returns the summary labels provided by the agent.

Returns: Available summary labels.
Return type: list[str]

get_output_tensors(function)

Returns the names of output tensors for the given function.

Parameters: function (str) – Function name (required).
Returns: Names of output tensors.
Return type: list[str]

get_query_tensors(function)

Returns the names of queryable tensors for the given function.

Parameters: function (str) – Function name (required).
Returns: Names of queryable tensors.
Return type: list[str]

initialize()

Initializes the agent.

observe(reward, terminal=False, parallel=0, query=None, **kwargs)

Observes reward and whether a terminal state is reached; needs to be preceded by act(...).

Parameters:
  • reward (float) – Reward (required).
  • terminal (bool | 0 | 1 | 2) – Whether a terminal state is reached (true or 1), or 2 if the episode was aborted (default: false).
  • parallel (int) – Parallel execution index (default: 0).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

Whether an update was performed, plus queried tensor values if requested.

Return type:

(bool, optional list[str])
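The terminal argument of observe(...) packs three cases into one value. The sketch below maps two hypothetical episode-end signals (a genuine terminal state, and an abort such as hitting a timestep limit) onto that encoding. The signal names are illustrative, and giving a genuine terminal state precedence when both are set is an assumption of this sketch.

```python
def terminal_value(reached_terminal_state, episode_aborted):
    """Encode how a timestep ended for observe(terminal=...).

    0 (false): the episode continues; 1 (true): a terminal state was reached;
    2: the episode was aborted (e.g. cut off by a timestep limit) without
    reaching a terminal state. A genuine terminal state takes precedence here.
    """
    if reached_terminal_state:
        return 1
    if episode_aborted:
        return 2
    return 0


# Hypothetical episode-end signals for three timesteps
for done, timeout in [(False, False), (False, True), (True, True)]:
    print(done, timeout, '->', terminal_value(done, timeout))
```

Inside a training loop, the result would be passed as agent.observe(terminal=terminal_value(done, timeout), reward=reward).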

reset()

Resets the agent to start a new episode.

restore(directory=None, filename=None)

Restores the agent.

Parameters:
  • directory (str) – Checkpoint directory (default: directory specified for TensorFlow saver).
  • filename (str) – Checkpoint filename (default: latest checkpoint in directory).

save(directory=None, filename=None, append_timestep=True)

Saves the current state of the agent.

Parameters:
  • directory (str) – Checkpoint directory (default: directory specified for TensorFlow saver).
  • filename (str) – Checkpoint filename (default: filename specified for TensorFlow saver).
  • append_timestep (bool) – Whether to append the current timestep to the checkpoint filename (default: true).
Returns:

Checkpoint path.

Return type:

str

Constant Agent

class tensorforce.agents.ConstantAgent(states, actions, max_episode_timesteps=None, action_values=None, name='agent', device=None, summarizer=None, seed=None, recorder=None)

Agent returning constant action values (specification key: constant).

Parameters:
  • states (specification) – States specification (required), arbitrarily nested dictionary of state descriptions (usually taken from Environment.states()) with the following attributes:
    • type ('bool' | 'int' | 'float') – state data type (default: 'float').
    • shape (int | iter[int]) – state shape (required).
    • num_states (int > 0) – number of discrete state values (required for type 'int').
    • min_value/max_value (float) – minimum/maximum state value (optional for type 'float').
  • actions (specification) – Actions specification (required), arbitrarily nested dictionary of action descriptions (usually taken from Environment.actions()) with the following attributes:
    • type ('bool' | 'int' | 'float') – action data type (required).
    • shape (int > 0 | iter[int > 0]) – action shape (default: ()).
    • num_actions (int > 0) – number of discrete action values (required for type 'int').
    • min_value/max_value (float) – minimum/maximum action value (optional for type 'float').
  • max_episode_timesteps (int > 0) – Maximum number of timesteps per episode (default: not given).
  • action_values (dict[value]) – Constant value per action (default: false for binary boolean actions, 0 for discrete integer actions, 0.0 for continuous actions).
  • seed (int) – Random seed to set for Python, NumPy and TensorFlow (default: none).
  • name (string) – Agent name, used e.g. for TensorFlow scopes (default: “agent”).
  • device (string) – Device name (default: TensorFlow default).
  • summarizer (specification) – TensorBoard summarizer configuration with the following attributes (default: no summarizer):
    • directory (path) – summarizer directory (required).
    • steps (int > 0, dict[int > 0]) – how frequently to record summaries, applies to "variables" and "act" if specified globally (default: always), otherwise specified per "variables"/"act" in timesteps and "observe"/"update" in updates (default: never).
    • flush (int > 0) – how frequently in seconds to flush the summary writer (default: 10).
    • labels ("all" | iter[string]) – all or list of summaries to record, from the following labels (default: only "graph"):
      • "graph": graph summary
      • "parameters": parameter scalars
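The attribute rules for a single state or action description can be checked mechanically. The sketch below is a hypothetical helper, not part of Tensorforce: it validates one flat description against the rules listed above (type defaults to 'float' for states but is required for actions, shape is required for states, and int types need num_states or num_actions). Nested dictionaries of descriptions are not handled.

```python
VALID_TYPES = ('bool', 'int', 'float')


def check_spec(spec, kind):
    """Return a list of problems with one 'states' or 'actions' description (empty if OK)."""
    problems = []
    if kind == 'states':
        value_type = spec.get('type', 'float')  # states default to 'float'
        count_key = 'num_states'
        if 'shape' not in spec:
            problems.append('missing shape')
    else:
        value_type = spec.get('type')  # actions require an explicit type
        count_key = 'num_actions'
    if value_type not in VALID_TYPES:
        problems.append('invalid type: {!r}'.format(value_type))
    if value_type == 'int' and spec.get(count_key, 0) <= 0:
        problems.append('missing or non-positive ' + count_key)
    return problems


states = dict(type='float', shape=(4,))
actions = dict(type='int', num_actions=2)
print(check_spec(states, 'states'), check_spec(actions, 'actions'))
```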

Random Agent

class tensorforce.agents.RandomAgent(states, actions, max_episode_timesteps=None, name='agent', device=None, summarizer=None, seed=None, recorder=None)

Agent returning random action values (specification key: random).

Parameters:
  • states (specification) – States specification (required), arbitrarily nested dictionary of state descriptions (usually taken from Environment.states()) with the following attributes:
    • type ('bool' | 'int' | 'float') – state data type (default: 'float').
    • shape (int | iter[int]) – state shape (required).
    • num_states (int > 0) – number of discrete state values (required for type 'int').
    • min_value/max_value (float) – minimum/maximum state value (optional for type 'float').
  • actions (specification) – Actions specification (required), arbitrarily nested dictionary of action descriptions (usually taken from Environment.actions()) with the following attributes:
    • type ('bool' | 'int' | 'float') – action data type (required).
    • shape (int > 0 | iter[int > 0]) – action shape (default: ()).
    • num_actions (int > 0) – number of discrete action values (required for type 'int').
    • min_value/max_value (float) – minimum/maximum action value (optional for type 'float').
  • max_episode_timesteps (int > 0) – Maximum number of timesteps per episode (default: not given).
  • seed (int) – Random seed to set for Python, NumPy and TensorFlow (default: none).
  • name (string) – Agent name, used e.g. for TensorFlow scopes (default: “agent”).
  • device (string) – Device name (default: TensorFlow default).
  • summarizer (specification) – TensorBoard summarizer configuration with the following attributes (default: no summarizer):
    • directory (path) – summarizer directory (required).
    • steps (int > 0, dict[int > 0]) – how frequently to record summaries, applies to "variables" and "act" if specified globally (default: always), otherwise specified per "variables"/"act" in timesteps and "observe"/"update" in updates (default: never).
    • flush (int > 0) – how frequently in seconds to flush the summary writer (default: 10).
    • labels ("all" | iter[string]) – all or list of summaries to record, from the following labels (default: only "graph"):
      • "graph": graph summary
      • "parameters": parameter scalars
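What "random action values" means depends on the action type. The sketch below draws one value per scalar action description in pure Python. The sampling choices are assumptions of this sketch and may differ from Tensorforce's own implementation, especially for unbounded floats: bool uses a fair coin, int is uniform over range(num_actions), a bounded float is uniform in [min_value, max_value], and an unbounded float is standard normal. The action names are hypothetical.

```python
import random


def sample_random_action(spec, rng):
    """Draw one random value for a flat, scalar action description."""
    action_type = spec['type']
    if action_type == 'bool':
        return rng.random() < 0.5
    if action_type == 'int':
        return rng.randrange(spec['num_actions'])
    if action_type == 'float':
        if 'min_value' in spec and 'max_value' in spec:
            return rng.uniform(spec['min_value'], spec['max_value'])
        return rng.gauss(0.0, 1.0)
    raise ValueError('unknown action type: {!r}'.format(action_type))


rng = random.Random(42)
actions = dict(
    jump=dict(type='bool'),
    move=dict(type='int', num_actions=4),
    throttle=dict(type='float', min_value=0.0, max_value=1.0),
)
sampled = {name: sample_random_action(spec, rng) for name, spec in actions.items()}
print(sampled)
```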

Tensorforce Policy Agent

class tensorforce.agents.PolicyAgent(states, actions, update, objective, reward_estimation, max_episode_timesteps=None, policy=None, network='auto', memory=None, optimizer='adam', baseline_policy=None, baseline_network=None, baseline_optimizer=None, baseline_objective=None, preprocessing=None, exploration=0.0, variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, name='agent', device=None, parallel_interactions=1, buffer_observe=True, seed=None, execution=None, saver=None, summarizer=None, recorder=None)

Policy Agent (specification key: policy).

Base class for a broad class of deep reinforcement learning agents, which act according to a policy parametrized by a neural network, leverage a memory module for periodic updates based on batches of experience, and optionally employ a baseline/critic/target policy for improved reward estimation.

Parameters:
  • states (specification) – States specification (required), arbitrarily nested dictionary of state descriptions (usually taken from Environment.states()) with the following attributes:
    • type ("bool" | "int" | "float") – state data type (default: "float").
    • shape (int | iter[int]) – state shape (required).
    • num_states (int > 0) – number of discrete state values (required for type "int").
    • min_value/max_value (float) – minimum/maximum state value (optional for type "float").
  • actions (specification) – Actions specification (required), arbitrarily nested dictionary of action descriptions (usually taken from Environment.actions()) with the following attributes:
    • type ("bool" | "int" | "float") – action data type (required).
    • shape (int > 0 | iter[int > 0]) – action shape (default: scalar).
    • num_actions (int > 0) – number of discrete action values (required for type "int").
    • min_value/max_value (float) – minimum/maximum action value (optional for type "float").
  • max_episode_timesteps (int > 0) – Maximum number of timesteps per episode (default: not given).
  • policy (specification) – Policy configuration, currently best to ignore and use the network argument instead.
  • network ("auto" | specification) – Policy network configuration, see networks (default: “auto”, automatically configured network).
  • memory (int | specification) – Memory configuration, see memories (default: replay memory with given or inferred capacity).
  • update (int | specification) – Model update configuration with the following attributes (required; an int value is interpreted as the batch size in timesteps):
    • unit ("timesteps" | "episodes") – unit for update attributes (required).
    • batch_size (parameter, long > 0) – size of update batch in number of units (required).
    • frequency ("never" | parameter, long > 0) – frequency of updates (default: batch_size).
    • start (parameter, long >= 2 * batch_size) – number of units before first update (default: 0).
  • optimizer (specification) – Optimizer configuration, see optimizers (default: Adam optimizer).
  • objective (specification) – Optimization objective configuration, see objectives (required).
  • reward_estimation (specification) – Reward estimation configuration with the following attributes (required):
    • horizon ("episode" | parameter, long >= 0) – Horizon of discounted-sum reward estimation (required).
    • discount (parameter, 0.0 <= float <= 1.0) – Discount factor for future rewards of discounted-sum reward estimation (default: 1.0).
    • estimate_horizon (false | "early" | "late") – Whether to estimate the value of horizon states, and if so, whether to estimate early when experience is stored, or late when it is retrieved (default: "late").
    • estimate_actions (bool) – Whether to estimate state-action values instead of state values (default: false).
    • estimate_terminal (bool) – Whether to estimate the value of terminal states (default: false).
    • estimate_advantage (bool) – Whether to estimate the advantage by subtracting the current estimate (default: false).
  • baseline_policy ("same" | "equal" | specification) – Baseline policy configuration, “same” refers to reusing the main policy as baseline, “equal” refers to using the same configuration as the main policy (default: none).
  • baseline_network ("same" | "equal" | specification) – Baseline network configuration, see networks, “same” refers to reusing the main network as part of the baseline policy, “equal” refers to using the same configuration as the main network (default: none).
  • baseline_optimizer ("same" | "equal" | specification) – Baseline optimizer configuration, see optimizers, “same” refers to reusing the main optimizer for the baseline, “equal” refers to using the same configuration as the main optimizer (default: none).
  • baseline_objective ("same" | "equal" | specification) – Baseline optimization objective configuration, see objectives, “same” refers to reusing the main objective for the baseline, “equal” refers to using the same configuration as the main objective (default: none).
  • preprocessing (dict[specification]) – Preprocessing as layer or list of layers, see preprocessing, specified per state-type or -name and for reward (default: none).
  • exploration (parameter | dict[parameter], float >= 0.0) – Exploration, global or per action, defined as the probability for uniformly random output in case of bool and int actions, and the standard deviation of Gaussian noise added to every output in case of float actions (default: 0.0).
  • variable_noise (parameter, float >= 0.0) – Standard deviation of Gaussian noise added to all trainable float variables (default: 0.0).
  • l2_regularization (parameter, float >= 0.0) – Scalar controlling L2 regularization (default: 0.0).
  • entropy_regularization (parameter, float >= 0.0) – Scalar controlling entropy regularization, to discourage the policy distribution being too “certain” / spiked (default: 0.0).
  • name (string) – Agent name, used e.g. for TensorFlow scopes (default: “agent”).
  • device (string) – Device name (default: TensorFlow default).
  • parallel_interactions (int > 0) – Maximum number of parallel interactions to support, for instance, to enable multiple parallel episodes, environments or (centrally controlled) agents within an environment (default: 1).
  • buffer_observe (bool | int > 0) – Maximum number of timesteps within an episode to buffer before executing internal observe operations, to reduce calls to TensorFlow for improved performance (default: max_episode_timesteps or 1000, unless summarizer specified).
  • seed (int) – Random seed to set for Python, NumPy and TensorFlow (default: none).
  • execution (specification) – TensorFlow execution configuration with the following attributes (default: standard): …
  • saver (specification) – TensorFlow saver configuration with the following attributes (default: no saver):
    • directory (path) – saver directory (required).
    • filename (string) – model filename (default: "model").
    • frequency (int > 0) – how frequently in seconds to save the model (default: 600 seconds).
    • load (bool | str) – whether to load the existing model, or which model filename to load (default: true).
    • max-checkpoints (int > 0) – maximum number of checkpoints to keep (default: 5).
  • summarizer (specification) – TensorBoard summarizer configuration with the following attributes (default: no summarizer):
    • directory (path) – summarizer directory (required).
    • frequency (int > 0, dict[int > 0]) – how frequently in timesteps to record summaries, applies to "variables" and "act" if specified globally (default: always), otherwise specified per "variables"/"act" in timesteps and "observe"/"update" in updates (default: never).
    • flush (int > 0) – how frequently in seconds to flush the summary writer (default: 10).
    • max-summaries (int > 0) – maximum number of summaries to keep (default: 5).
    • labels ("all" | iter[string]) – all or list of summaries to record, from the following labels (default: only "graph"):
      • "distributions" or "bernoulli", "categorical", "gaussian", "beta": distribution-specific parameters
      • "dropout": dropout zero fraction
      • "entropy": entropy of policy distribution
      • "graph": graph summary
      • "kl-divergence": KL-divergence of previous and updated policy distribution
      • "losses" or "loss", "objective-loss", "regularization-loss", "baseline-loss", "baseline-objective-loss", "baseline-regularization-loss": loss scalars
      • "parameters": parameter scalars
      • "relu": ReLU activation zero fraction
      • "rewards" or "timestep-reward", "episode-reward", "raw-reward", "processed-reward", "estimated-reward": reward scalars
      • "update-norm": update norm
      • "updates": update mean and variance scalars
      • "updates-full": update histograms
      • "variables": variable mean and variance scalars
      • "variables-full": variable histograms
  • recorder (specification) – Experience traces recorder configuration with the following attributes (default: no recorder):
    • directory (path) – recorder directory (required).
    • frequency (int > 0) – how frequently in episodes to record traces (default: every episode).
    • max-traces (int > 0) – maximum number of traces to keep (default: all).
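The reward_estimation attributes describe a discounted-sum return, optionally truncated at a horizon and completed with a value estimate. The sketch below computes this for one finished episode in pure Python. It assumes an int horizon h counts the rewards summed before bootstrapping with discount**h * values[t + h]; Tensorforce's exact off-by-one convention for horizon is not asserted here. The value of the terminal state is taken as zero, matching estimate_terminal=false, and the reward and value numbers are illustrative.

```python
def discounted_returns(rewards, discount, horizon=None, values=None):
    """Discounted-sum reward estimation for one finished episode.

    horizon=None plays the role of the "episode" horizon (sum to the end).
    An int horizon sums that many rewards and, if values is given, adds
    discount ** horizon * values[t + horizon] for the remaining return
    (in the spirit of estimate_horizon). Past the episode end nothing is added.
    """
    n = len(rewards)
    returns = []
    for t in range(n):
        end = n if horizon is None else min(t + horizon, n)
        total = 0.0
        for k, reward in enumerate(rewards[t:end]):
            total += discount ** k * reward
        if values is not None and end < n:
            total += discount ** (end - t) * values[end]
        returns.append(total)
    return returns


rewards = [1.0, 0.0, 2.0]
print(discounted_returns(rewards, discount=0.5))  # [1.5, 1.0, 2.0]
print(discounted_returns(rewards, discount=0.5, horizon=1, values=[3.0, 4.0, 5.0]))  # [3.0, 2.5, 2.0]
```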

Deep Q-Network

class tensorforce.agents.DeepQNetwork(states, actions, max_episode_timesteps=None, network='auto', memory=10000, batch_size=32, update_frequency=4, start_updating=None, learning_rate=0.0003, huber_loss=0.0, n_step=0, discount=0.99, estimate_terminal=False, target_sync_frequency=10000, target_update_weight=1.0, preprocessing=None, exploration=0.0, variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None, summarizer=None, recorder=None)

Deep Q-Network agent (specification key: dqn).
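DeepQNetwork accepts the same exploration argument as the Policy Agent, which is defined there as a probability of uniformly random output for bool/int actions and as a Gaussian noise standard deviation for float actions. For a Q-network that outputs the argmax action, the bool/int rule amounts to epsilon-greedy exploration. The pure-Python sketch below applies that definition to one scalar action. Clipping noisy floats to their bounds is an added assumption, and the greedy action value is hypothetical.

```python
import random


def explore(action, action_spec, exploration, rng):
    """Apply exploration to one scalar action.

    bool/int: with probability `exploration`, output a uniformly random value.
    float: add Gaussian noise with standard deviation `exploration`, then clip
    to min_value/max_value if both are given (clipping is an assumption).
    """
    action_type = action_spec['type']
    if action_type in ('bool', 'int'):
        if rng.random() < exploration:
            if action_type == 'bool':
                return rng.random() < 0.5
            return rng.randrange(action_spec['num_actions'])
        return action
    noisy = action + rng.gauss(0.0, exploration)
    if 'min_value' in action_spec and 'max_value' in action_spec:
        noisy = min(max(noisy, action_spec['min_value']), action_spec['max_value'])
    return noisy


rng = random.Random(0)
spec = dict(type='int', num_actions=4)
greedy_action = 2  # hypothetical argmax of the Q-values for the current state
chosen = [explore(greedy_action, spec, exploration=0.1, rng=rng) for _ in range(1000)]
print(sum(a != greedy_action for a in chosen), 'of 1000 actions differ from greedy')
```

Because a uniformly random action can coincide with the greedy one, somewhat fewer than exploration × 1000 actions differ from the greedy action.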

Vanilla Policy Gradient

class tensorforce.agents.VanillaPolicyGradient(states, actions, max_episode_timesteps, network='auto', batch_size=10, update_frequency=None, learning_rate=0.0003, discount=0.99, estimate_terminal=False, critic_network=None, critic_optimizer=None, preprocessing=None, exploration=0.0, variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None, summarizer=None, recorder=None)

Vanilla Policy Gradient agent (specification key: vpg).

Proximal Policy Optimization

class tensorforce.agents.ProximalPolicyOptimization(states, actions, max_episode_timesteps, network='auto', batch_size=10, update_frequency=None, learning_rate=0.0003, subsampling_fraction=0.33, optimization_steps=10, likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False, critic_network=None, critic_optimizer=None, preprocessing=None, exploration=0.0, variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None, summarizer=None, recorder=None)

Proximal Policy Optimization agent (specification key: ppo).
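The likelihood_ratio_clipping argument (default 0.2) refers to PPO's clipped surrogate objective. The pure-Python sketch below computes that objective per sample and averages it over a batch. It follows the standard PPO formula and is not Tensorforce's TensorFlow implementation. The ratios and advantages are illustrative numbers.

```python
def clipped_surrogate(ratio, advantage, clipping=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio is pi_new(a|s) / pi_old(a|s); clipping corresponds to
    likelihood_ratio_clipping.
    """
    clipped_ratio = min(max(ratio, 1.0 - clipping), 1.0 + clipping)
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_surrogate(ratios, advantages, clipping=0.2):
    """Batch objective: average of the per-sample clipped terms."""
    terms = [clipped_surrogate(r, a, clipping) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)


print(clipped_surrogate(1.5, 1.0))   # gain capped at (1 + 0.2) * 1.0
print(clipped_surrogate(0.5, -1.0))  # pessimistic term (1 - 0.2) * -1.0
print(mean_surrogate([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

Taking the minimum of the unclipped and clipped terms means that moving the ratio beyond 1 ± clipping can never increase the objective, which limits how far one update moves the policy.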

Trust-Region Policy Optimization

class tensorforce.agents.TrustRegionPolicyOptimization(states, actions, max_episode_timesteps, network='auto', batch_size=10, update_frequency=None, learning_rate=0.001, likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False, critic_network=None, critic_optimizer=None, preprocessing=None, exploration=0.0, variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, name='agent', device=None, parallel_interactions=1, seed=None, execution=None, saver=None, summarizer=None, recorder=None)

Trust Region Policy Optimization agent (specification key: trpo).

Distributions

class tensorforce.core.distributions.Bernoulli(name, action_spec, embedding_size, summary_labels=None)

Bernoulli distribution, for binary boolean actions (specification key: bernoulli).

Parameters:
  • name (string) – Distribution name (internal use).
  • action_spec (specification) – Action specification (internal use).
  • embedding_size (int > 0) – Embedding size (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).

class tensorforce.core.distributions.Beta(name, action_spec, embedding_size, summary_labels=None)

Beta distribution, for bounded continuous actions (specification key: beta).

Parameters:
  • name (string) – Distribution name (internal use).
  • action_spec (specification) – Action specification (internal use).
  • embedding_size (int > 0) – Embedding size (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).

class tensorforce.core.distributions.Categorical(name, action_spec, embedding_size, summary_labels=None)

Categorical distribution, for discrete integer actions (specification key: categorical).

Parameters:
  • name (string) – Distribution name (internal use).
  • action_spec (specification) – Action specification (internal use).
  • embedding_size (int > 0) – Embedding size (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).

class tensorforce.core.distributions.Gaussian(name, action_spec, embedding_size, summary_labels=None)

Gaussian distribution, for unbounded continuous actions (specification key: gaussian).

Parameters:
  • name (string) – Distribution name (internal use).
  • action_spec (specification) – Action specification (internal use).
  • embedding_size (int > 0) – Embedding size (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
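The four distributions pair with action types as stated in their descriptions: bool with Bernoulli, int with Categorical, bounded float with Beta, and unbounded float with Gaussian. The pure-Python sketch below encodes those pairings. It also shows how a Beta sample on [0, 1] can be rescaled to an action's bounds. Whether Tensorforce selects a distribution automatically in exactly this way is not asserted. Treating "bounded" as "both min_value and max_value given" is an assumption, and alpha and beta are hypothetical stand-ins for values a network would produce.

```python
import random


def distribution_for(action_spec):
    """Distribution key matching one action description, per the pairings above."""
    action_type = action_spec['type']
    if action_type == 'bool':
        return 'bernoulli'
    if action_type == 'int':
        return 'categorical'
    if action_type == 'float':
        bounded = 'min_value' in action_spec and 'max_value' in action_spec
        return 'beta' if bounded else 'gaussian'
    raise ValueError('unknown action type: {!r}'.format(action_type))


def sample_bounded(action_spec, alpha, beta, rng):
    """Draw from Beta(alpha, beta) on [0, 1] and rescale to [min_value, max_value]."""
    unit_sample = rng.betavariate(alpha, beta)
    low, high = action_spec['min_value'], action_spec['max_value']
    return low + (high - low) * unit_sample


throttle = dict(type='float', min_value=-2.0, max_value=2.0)
print(distribution_for(throttle))  # beta
print(sample_bounded(throttle, alpha=2.0, beta=2.0, rng=random.Random(0)))
```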

Layers

Default layer: Function (default argument: function)

Convolutional layers

class tensorforce.core.layers.Conv1d(name, size, window=3, stride=1, padding='same', bias=True, activation='relu', dropout=0.0, is_trainable=True, use_cudnn_on_gpu=True, input_spec=None, summary_labels=None, l2_regularization=None)

1-dimensional convolutional layer (specification key: conv1d).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • window (int > 0) – Window size (default: 3).
  • stride (int > 0) – Stride size (default: 1).
  • padding ('same' | 'valid') – Padding type, see TensorFlow docs (default: ‘same’).
  • bias (bool) – Whether to add a trainable bias variable (default: true).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: “relu”).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • use_cudnn_on_gpu (bool) – Whether to use cuDNN on GPU (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).

class tensorforce.core.layers.Conv2d(name, size, window=3, stride=1, padding='same', dilation=1, bias=True, activation='relu', dropout=0.0, is_trainable=True, use_cudnn_on_gpu=True, input_spec=None, summary_labels=None, l2_regularization=None)

2-dimensional convolutional layer (specification key: conv2d).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • window (int > 0 | (int > 0, int > 0)) – Window size (default: 3).
  • stride (int > 0 | (int > 0, int > 0)) – Stride size (default: 1).
  • padding ('same' | 'valid') – Padding type, see TensorFlow docs (default: ‘same’).
  • dilation (int > 0 | (int > 0, int > 0)) – Dilation value (default: 1).
  • bias (bool) – Whether to add a trainable bias variable (default: true).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: ‘relu’).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • use_cudnn_on_gpu (bool) – Whether to use cuDNN on GPU (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
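In two dimensions, window, stride and dilation may each be an int or a (height, width) pair. The sketch below assumes TensorFlow's output-shape rules. Under those rules, dilation enlarges the effective window to (window - 1) * dilation + 1. The helpers as_pair and conv2d_output_shape are hypothetical.

```python
import math


def as_pair(value):
    """Normalize an `int | (int, int)` argument to a (height, width) pair."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_shape(height, width, size, window=3, stride=1, padding='same', dilation=1):
    """(height, width, channels) after a 2-D convolution under TensorFlow's padding rules."""
    dims = []
    for n, w, s, d in zip((height, width), as_pair(window), as_pair(stride), as_pair(dilation)):
        effective_window = (w - 1) * d + 1  # dilation spreads the window out
        if padding == 'same':
            dims.append(math.ceil(n / s))
        elif padding == 'valid':
            dims.append(max(math.ceil((n - effective_window + 1) / s), 0))
        else:
            raise ValueError("padding must be 'same' or 'valid'")
    return (dims[0], dims[1], size)


print(conv2d_output_shape(84, 84, 32, window=8, stride=4, padding='valid'))   # (20, 20, 32)
print(conv2d_output_shape(84, 84, 16, window=3, padding='valid', dilation=2))  # (80, 80, 16)
print(conv2d_output_shape(84, 84, 16, window=(3, 5), stride=(1, 2)))           # (84, 42, 16)
```

Keras's own Conv2D documentation states that a dilation rate other than 1 is incompatible with a stride other than 1. Combining the two in one layer may therefore fail.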

Dense layers

class tensorforce.core.layers.Dense(name, size, bias=True, activation='relu', dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None)

Dense fully-connected layer (specification key: dense).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • bias (bool) – Whether to add a trainable bias variable (default: true).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: ‘relu’).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
class tensorforce.core.layers.Linear(name, size, bias=True, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None)

Linear layer (specification key: linear).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • bias (bool) – Whether to add a trainable bias variable (default: true).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
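Judging by the signatures, a linear layer is the affine part of a dense layer, without the activation and dropout. The sketch below illustrates that difference for a single input vector with fixed weights. In Tensorforce, the weights and bias are trainable TensorFlow variables. The helper names here are hypothetical.

```python
def affine(x, weights, bias=None):
    """x W (+ b) for one input vector; `weights` has len(x) rows and `size` columns."""
    size = len(weights[0])
    y = [sum(xi * row[j] for xi, row in zip(x, weights)) for j in range(size)]
    if bias is not None:
        y = [yj + bj for yj, bj in zip(y, bias)]
    return y


def relu(v):
    return [max(0.0, vi) for vi in v]


def linear_layer(x, weights, bias=None):
    # 'linear': affine transformation only
    return affine(x, weights, bias)


def dense_layer(x, weights, bias=None, activation=relu):
    # 'dense': affine transformation followed by the activation ('relu' by default)
    return activation(affine(x, weights, bias))


x = [1.0, -2.0]
weights = [[1.0, 0.5, -1.0], [2.0, 1.0, 0.0]]  # 2 inputs -> size 3
bias = [4.0, 0.0, 2.0]
print(linear_layer(x, weights, bias))  # [1.0, -1.5, 1.0]
print(dense_layer(x, weights, bias))   # [1.0, 0.0, 1.0]
```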

Embedding layers

class tensorforce.core.layers.Embedding(name, size, num_embeddings=None, partition_strategy='mod', max_norm=None, bias=False, activation='tanh', dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None)

Embedding layer (specification key: embedding).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • num_embeddings (int > 0) – If set, specifies the number of embeddings (default: none).
  • partition_strategy ('mod' | 'div') – Partitioning strategy, see TensorFlow docs (default: ‘mod’).
  • max_norm (float) – If set, embeddings are clipped if their L2-norm is larger (default: none).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: ‘tanh’).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for potential parent class.
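An embedding layer maps integer inputs to rows of a trainable table. The sketch below shows three steps:

  • the lookup;
  • the optional max_norm clipping, which rescales rows whose L2-norm exceeds the limit, as TensorFlow's embedding_lookup does;
  • the default tanh activation.

The function and the fixed table are illustrative assumptions. partition_strategy is omitted because it only matters for a partitioned table.

```python
import math


def embed(indices, table, max_norm=None, activation=math.tanh):
    """Look up one row of `table` per integer index, optionally clip it, then activate."""
    outputs = []
    for index in indices:
        row = list(table[index])
        if max_norm is not None:
            norm = math.sqrt(sum(v * v for v in row))
            if norm > max_norm:
                # Rescale the row so that its L2-norm equals max_norm
                row = [v * max_norm / norm for v in row]
        outputs.append([activation(v) for v in row] if activation is not None else row)
    return outputs


# num_embeddings = 3 rows, size = 2 columns (values are illustrative)
table = [[0.0, 0.0], [3.0, 4.0], [0.5, -0.5]]
print(embed([1, 2], table, max_norm=1.0, activation=None))  # [[0.6, 0.8], [0.5, -0.5]]
```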

Recurrent layers

class tensorforce.core.layers.Gru(name, size, return_final_state=True, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Gated recurrent unit layer (specification key: gru).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • return_final_state (bool) – Whether to return the final state instead of the per-step outputs (default: true).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras GRU layer, see TensorFlow docs.
class tensorforce.core.layers.Lstm(name, size, return_final_state=True, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Long short-term memory layer (specification key: lstm).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • return_final_state (bool) – Whether to return the final state instead of the per-step outputs (default: true).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras LSTM layer, see TensorFlow docs.
class tensorforce.core.layers.Rnn(name, cell, size, return_final_state=True, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Recurrent neural network layer (specification key: rnn).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • cell ('gru' | 'lstm') – The recurrent cell type (required).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • return_final_state (bool) – Whether to return the final state instead of the per-step outputs (default: true).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras RNN layer, see TensorFlow docs.
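The return_final_state flag decides whether a recurrent layer emits one output per sequence or one per step. To show the difference without TensorFlow, the sketch uses a hypothetical scalar tanh recurrence as a stand-in for a real GRU or LSTM cell. The specification dicts use only documented keys. Treating gru as shorthand for rnn with cell='gru' is an assumption based on the signatures.

```python
import math


def toy_recurrent_layer(sequence, w_x=0.5, w_h=0.9, return_final_state=True):
    """Scalar tanh recurrence standing in for a GRU/LSTM cell (illustration only)."""
    h = 0.0
    outputs = []
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)  # new state from current input and previous state
        outputs.append(h)
    # True: one value per sequence; False: one output per step
    return h if return_final_state else outputs


# Specifications using the documented keys (sizes are illustrative)
rnn_spec = dict(type='rnn', cell='gru', size=32, return_final_state=False)
gru_spec = dict(type='gru', size=32, return_final_state=False)  # assumed equivalent

steps = toy_recurrent_layer([1.0, 2.0, 3.0], return_final_state=False)
final = toy_recurrent_layer([1.0, 2.0, 3.0])
print(len(steps), final == steps[-1])  # 3 True
```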

Pooling layers

class tensorforce.core.layers.Flatten(name, input_spec=None, summary_labels=None)

Flatten layer (specification key: flatten).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Pooling(name, reduction, input_spec=None, summary_labels=None)

Pooling layer (global pooling) (specification key: pooling).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • reduction ('concat' | 'max' | 'mean' | 'product' | 'sum') – Pooling type (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Pool1d(name, reduction, window=2, stride=2, padding='same', input_spec=None, summary_labels=None)

1-dimensional pooling layer (local pooling) (specification key: pool1d).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • reduction ('average' | 'max') – Pooling type (required).
  • window (int > 0) – Window size (default: 2).
  • stride (int > 0) – Stride size (default: 2).
  • padding ('same' | 'valid') – Padding type, see TensorFlow docs (default: ‘same’).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Pool2d(name, reduction, window=2, stride=2, padding='same', input_spec=None, summary_labels=None)

2-dimensional pooling layer (local pooling) (specification key: pool2d).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • reduction ('average' | 'max') – Pooling type (required).
  • window (int > 0 | (int > 0, int > 0)) – Window size (default: 2).
  • stride (int > 0 | (int > 0, int > 0)) – Stride size (default: 2).
  • padding ('same' | 'valid') – Padding type, see TensorFlow docs (default: ‘same’).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
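Global pooling collapses all positions into one vector. Local pooling instead slides a window along the axis. The sketch below covers both for plain Python lists. It assumes TensorFlow's padding rules, under which padded positions are ignored, so 'average' divides by the number of real elements. The helper names are hypothetical, and the 'concat' reduction is not sketched.

```python
import math


def global_pooling(x, reduction):
    """Reduce a list of feature vectors (one per position) to a single vector."""
    columns = list(zip(*x))  # one tuple per feature, across all positions
    if reduction == 'max':
        return [max(c) for c in columns]
    if reduction == 'mean':
        return [sum(c) / len(c) for c in columns]
    if reduction == 'sum':
        return [sum(c) for c in columns]
    if reduction == 'product':
        result = []
        for c in columns:
            p = 1.0
            for v in c:
                p *= v
            result.append(p)
        return result
    raise ValueError('unsupported reduction: {}'.format(reduction))


def pool1d(x, reduction, window=2, stride=2, padding='same'):
    """Local 'max' or 'average' pooling over a 1-D list of scalars."""
    n = len(x)
    if padding == 'same':
        out_length = math.ceil(n / stride)
        pad_total = max((out_length - 1) * stride + window - n, 0)
        first = -(pad_total // 2)  # padding is split, with any extra at the end
    elif padding == 'valid':
        out_length = max(math.ceil((n - window + 1) / stride), 0)
        first = 0
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    result = []
    for i in range(out_length):
        start = first + i * stride
        # Padded positions are skipped, so 'average' divides by the real element count
        values = x[max(start, 0):min(start + window, n)]
        result.append(max(values) if reduction == 'max' else sum(values) / len(values))
    return result


print(global_pooling([[1.0, 2.0], [3.0, -1.0]], 'mean'))  # [2.0, 0.5]
print(pool1d([1.0, 3.0, 2.0, 5.0, 4.0], 'max'))          # [3.0, 5.0, 4.0]
print(pool1d([1.0, 3.0, 2.0, 5.0, 4.0], 'average'))      # [2.0, 3.5, 4.0]
```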

Normalization layers

class tensorforce.core.layers.ExponentialNormalization(name, decay=0.999, axes=None, input_spec=None, summary_labels=None)

Normalization layer based on the exponential moving average (specification key: exponential_normalization).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • decay (parameter, 0.0 <= float <= 1.0) – Decay rate (default: 0.999).
  • axes (iter[int >= 0]) – Normalization axes, excluding batch axis (default: all but last axis).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
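Exponential normalization keeps running estimates instead of per-batch statistics. The following is a generic sketch for a single scalar feature. It assumes the moving averages are updated before normalizing. The initial mean and variance and the epsilon constant are also assumptions, not documented Tensorforce values. ExponentialNormalizerSketch is a hypothetical class.

```python
import math


class ExponentialNormalizerSketch:
    """Normalizes a scalar feature with exponential moving averages of mean and variance."""

    def __init__(self, decay=0.999, epsilon=1e-8):
        self.decay = decay
        self.epsilon = epsilon  # assumed constant guarding against division by zero
        self.mean = 0.0         # assumed initial estimates
        self.variance = 1.0

    def __call__(self, batch):
        batch_mean = sum(batch) / len(batch)
        batch_variance = sum((x - batch_mean) ** 2 for x in batch) / len(batch)
        # Keep a `decay` share of the old estimate and mix in (1 - decay) of the new one
        self.mean = self.decay * self.mean + (1.0 - self.decay) * batch_mean
        self.variance = self.decay * self.variance + (1.0 - self.decay) * batch_variance
        return [(x - self.mean) / math.sqrt(self.variance + self.epsilon) for x in batch]


normalizer = ExponentialNormalizerSketch(decay=0.5)
normalized = normalizer([2.0, 2.0])
print(normalizer.mean, normalizer.variance)  # 1.0 0.5
```

With the default axes (all but the last), the statistics would be tracked separately for each entry of the last axis, i.e. per feature, rather than as a single scalar.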
class tensorforce.core.layers.InstanceNormalization(name, axes=None, input_spec=None, summary_labels=None)

Instance normalization layer (specification key: instance_normalization).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • axes (iter[int >= 0]) – Normalization axes, excluding batch axis (default: all).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
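Instance normalization, by contrast, keeps no running statistics. Each instance in the batch is normalized by its own mean and variance over the normalization axes. A minimal sketch follows. It treats each instance as a flat list, matching the default of normalizing over all non-batch axes, and uses an assumed epsilon constant.

```python
import math


def instance_normalize(instance, epsilon=1e-8):
    """Normalize one instance (a flat list) by its own mean and variance."""
    mean = sum(instance) / len(instance)
    variance = sum((x - mean) ** 2 for x in instance) / len(instance)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in instance]


batch = [[1.0, 2.0, 3.0], [101.0, 102.0, 103.0]]
normalized = [instance_normalize(instance) for instance in batch]
# Each instance is handled independently, so the constant offset of 100 disappears
print(normalized[0] == normalized[1])  # True
```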

Misc layers

class tensorforce.core.layers.Activation(name, nonlinearity, input_spec=None, summary_labels=None)

Activation layer (specification key: activation).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • nonlinearity ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Nonlinearity (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
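The sketch below spells out the nonlinearities accepted by the activation and nonlinearity arguments, using their textbook definitions. Two details are assumptions rather than documented behavior:

  • the 0.2 negative slope for 'leaky-relu', which is TensorFlow's default alpha;
  • the standard SELU constants.

'crelu' concatenates relu(x) and relu(-x), so it doubles the size of the last axis.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


ELEMENTWISE = {
    'none': lambda x: x,
    'relu': lambda x: max(0.0, x),
    'leaky-relu': lambda x: x if x >= 0.0 else 0.2 * x,  # assumed slope 0.2
    'elu': lambda x: x if x > 0.0 else math.exp(x) - 1.0,
    'selu': lambda x: 1.0507009873554805 * (x if x > 0.0 else 1.6732632423543772 * (math.exp(x) - 1.0)),
    'sigmoid': sigmoid,
    'softplus': lambda x: math.log1p(math.exp(x)),
    'softsign': lambda x: x / (1.0 + abs(x)),
    'swish': lambda x: x * sigmoid(x),
    'tanh': math.tanh,
}


def activate(v, nonlinearity):
    """Apply a nonlinearity to a vector (the last axis)."""
    if nonlinearity == 'softmax':
        return softmax(v)
    if nonlinearity == 'crelu':
        # Concatenated ReLU: relu(x) followed by relu(-x), doubling the vector length
        return [max(0.0, x) for x in v] + [max(0.0, -x) for x in v]
    return [ELEMENTWISE[nonlinearity](x) for x in v]


print(activate([1.0, -2.0], 'crelu'))  # [1.0, 0.0, 0.0, 2.0]
```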
class tensorforce.core.layers.Clipping(name, upper, lower=None, input_spec=None, summary_labels=None)

Clipping layer (specification key: clipping).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • upper (parameter, float) – Upper clipping value (required).
  • lower (parameter, float) – Lower clipping value (default: negative upper value).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
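A minimal sketch of the clipping layer's documented behavior, including the default lower bound of -upper. The clip function is hypothetical.

```python
def clip(values, upper, lower=None):
    """Clip every value into [lower, upper], with lower defaulting to -upper."""
    if lower is None:
        lower = -upper
    return [min(max(v, lower), upper) for v in values]


clipping_spec = dict(type='clipping', upper=1.0)  # documented keys; lower becomes -1.0
print(clip([-3.0, 0.5, 3.0], upper=1.0))             # [-1.0, 0.5, 1.0]
print(clip([-3.0, 0.5, 3.0], upper=1.0, lower=0.0))  # [0.0, 0.5, 1.0]
```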
class tensorforce.core.layers.Deltafier(name, concatenate=False, input_spec=None, summary_labels=None)

Deltafier layer computing the difference between the current and the previous input; can only be used as a preprocessing layer (specification key: deltafier).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • concatenate (False | int >= 0) – Whether to concatenate the deltas with the input instead of replacing the input, and if so, the concatenation axis (default: false).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
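A stateful sketch of the deltafier for flat vectors follows. Two details are not documented here and are assumed. First, the first delta of an episode is zero. Second, with concatenate set, the deltas are appended after the input. DeltafierSketch is a hypothetical stand-in, not the Tensorforce class.

```python
class DeltafierSketch:
    """Replaces (or extends) each input vector with its difference from the previous one."""

    def __init__(self, concatenate=False):
        self.concatenate = concatenate  # False, or an axis (only axis 0 for flat vectors)
        self.previous = None

    def reset(self):
        # New episode: forget the previous input
        self.previous = None

    def __call__(self, x):
        # Assumption: the first input of an episode produces zero deltas
        previous = self.previous if self.previous is not None else x
        delta = [a - b for a, b in zip(x, previous)]
        self.previous = list(x)
        if self.concatenate is False:
            return delta
        return list(x) + delta  # assumed order: input first, then deltas


deltafier = DeltafierSketch()
print(deltafier([1.0, 2.0]), deltafier([4.0, 1.0]))  # [0.0, 0.0] [3.0, -1.0]
```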
class tensorforce.core.layers.Dropout(name, rate, input_spec=None, summary_labels=None)

Dropout layer (specification key: dropout).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • rate (parameter, 0.0 <= float < 1.0) – Dropout rate (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
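TensorFlow's dropout is "inverted": kept elements are scaled by 1 / (1 - rate), so their expected value is unchanged. The sketch below shows that behavior. It assumes the layer is only active during training. The training flag and the function are hypothetical.

```python
import random


def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the survivors."""
    if not training or rate == 0.0:
        return list(values)  # identity outside of training
    keep = 1.0 - rate
    # Dividing by `keep` preserves the expected value of every element
    return [v / keep if rng.random() < keep else 0.0 for v in values]


sample = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
print(set(sample), sum(sample) / len(sample))  # values are 0.0 or 2.0; mean close to 1.0
```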
class tensorforce.core.layers.Image(name, height=None, width=None, grayscale=False, input_spec=None, summary_labels=None)

Image preprocessing layer (specification key: image).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • height (int) – Height of resized image (default: no resizing or relative to width).
  • width (int) – Width of resized image (default: no resizing or relative to height).
  • grayscale (bool | iter[float]) – Whether to convert the image to grayscale, optionally using the given weights (default: false).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
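For the grayscale option, a weighted sum over the color channels can be sketched as follows. The default weights below are the common ITU-R BT.601 luma coefficients. They are an assumption, not necessarily what Tensorforce uses when grayscale=True. Resizing is not shown, and to_grayscale is a hypothetical helper.

```python
def to_grayscale(image, weights=(0.299, 0.587, 0.114)):
    """Collapse an H x W x C image (nested lists) to H x W x 1 via a weighted channel sum."""
    return [[[sum(w * c for w, c in zip(weights, pixel))] for pixel in row] for row in image]


image_spec = dict(type='image', height=84, width=84, grayscale=True)  # documented keys
red_and_blue = [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]]  # a 1 x 2 RGB image
print(to_grayscale(red_and_blue))  # [[[0.299], [0.114]]]
```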
class tensorforce.core.layers.Sequence(name, length, axis=-1, concatenate=True, input_spec=None, summary_labels=None)

Sequence layer stacking the current and previous inputs; can only be used as a preprocessing layer (specification key: sequence).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • length (int > 0) – Number of inputs to concatenate (required).
  • axis (int >= 0) – Concatenation axis, excluding batch axis (default: last axis).
  • concatenate (bool) – Whether to concatenate inputs at given axis, otherwise introduce new sequence axis (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
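The sequence layer is commonly used for frame stacking. The sketch below keeps the last length inputs in a deque. Filling the buffer with copies of the first input at the start of an episode is an assumption. SequenceSketch is a hypothetical class. It ignores the axis argument by always joining flat vectors.

```python
from collections import deque


class SequenceSketch:
    """Stacks the last `length` input vectors, oldest first."""

    def __init__(self, length, concatenate=True):
        self.concatenate = concatenate
        self.buffer = deque(maxlen=length)

    def reset(self):
        self.buffer.clear()  # new episode

    def __call__(self, x):
        if not self.buffer:
            # Assumption: at episode start, the buffer is filled with copies of the first input
            self.buffer.extend([list(x)] * self.buffer.maxlen)
        else:
            self.buffer.append(list(x))
        if self.concatenate:
            # Join along the feature axis: length * features values
            return [v for frame in self.buffer for v in frame]
        # Otherwise introduce a new sequence axis: length x features
        return [list(frame) for frame in self.buffer]


stack = SequenceSketch(length=3)
print(stack([1.0]), stack([2.0]), stack([3.0]), stack([4.0]))
# [1.0, 1.0, 1.0] [1.0, 1.0, 2.0] [1.0, 2.0, 3.0] [2.0, 3.0, 4.0]
```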

Layers with internal states

class tensorforce.core.layers.InternalGru(name, size, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Internal state GRU cell layer (specification key: internal_gru).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • length (parameter, long > 0) – ???+1 (required).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras GRU layer, see TensorFlow docs.
class tensorforce.core.layers.InternalLstm(name, size, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Internal state LSTM cell layer (specification key: internal_lstm).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • length (parameter, long > 0) – ???+1 (required).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras LSTM layer, see TensorFlow docs.
class tensorforce.core.layers.InternalRnn(name, cell, size, length, bias=False, activation=None, dropout=0.0, is_trainable=True, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Internal state RNN cell layer (specification key: internal_rnn).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • cell ('gru' | 'lstm') – The recurrent cell type (required).
  • size (int >= 0) – Layer output size, 0 implies additionally removing the axis (required).
  • length (parameter, long > 0) – ???+1 (required).
  • bias (bool) – Whether to add a trainable bias variable (default: false).
  • activation ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Activation nonlinearity (default: none).
  • dropout (parameter, 0.0 <= float < 1.0) – Dropout rate (default: 0.0).
  • is_trainable (bool) – Whether layer variables are trainable (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for Keras RNN cell layer, see TensorFlow docs.

Special layers

class tensorforce.core.layers.Function(name, function, output_spec=None, input_spec=None, summary_labels=None, l2_regularization=None)

Custom TensorFlow function layer (specification key: function).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • function (lambda[x -> x]) – TensorFlow function (required).
  • output_spec (specification) – Output tensor specification containing type and/or shape information (default: same as input).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
class tensorforce.core.layers.Keras(name, layer, input_spec=None, summary_labels=None, l2_regularization=None, **kwargs)

Keras layer (specification key: keras).

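Parameters:
  • name (string) – Layer name (default: internally chosen).
  • layer (string) – Keras layer class name, see TensorFlow docs (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • kwargs – Additional arguments for the Keras layer, see TensorFlow docs.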
class tensorforce.core.layers.Register(name, tensor, input_spec=None, summary_labels=None)

Tensor registration layer, which is useful when defining more complex network architectures which do not follow the sequential layer-stack pattern, for instance, when handling multiple inputs (specification key: register).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • tensor (string) – Name under which tensor will be registered (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Retrieve(name, tensors, aggregation='concat', axis=0, input_spec=None, summary_labels=None)

Tensor retrieval layer, which is useful when defining more complex network architectures which do not follow the sequential layer-stack pattern, for instance, when handling multiple inputs (specification key: retrieve).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • tensors (iter[string]) – Names of global tensors to retrieve, for instance, state names or previously registered global tensor names (required).
  • aggregation ('concat' | 'product' | 'stack' | 'sum') – Aggregation type in case of multiple tensors (default: ‘concat’).
  • axis (int >= 0) – Aggregation axis, excluding batch axis (default: 0).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Reuse(name, layer, input_spec=None)

Reuse layer (specification key: reuse).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • layer (string) – Name of a previously defined layer (required).
  • input_spec (specification) – Input tensor specification (internal use).

Memories

Default memory: Replay with default argument capacity

class tensorforce.core.memories.Recent(name, capacity, values_spec, device=None, summary_labels=None)

Batching memory which always retrieves the most recent experiences (specification key: recent).

Parameters:
  • name (string) – Memory name (internal use).
  • capacity (int > 0) – Memory capacity, in experience timesteps (required).
  • values_spec (specification) – Values specification (internal use).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.memories.Replay(name, capacity, values_spec, device=None, summary_labels=None)

Replay memory which randomly retrieves experiences (specification key: replay).

Parameters:
  • name (string) – Memory name (internal use).
  • capacity (int > 0) – Memory capacity, in experience timesteps (required).
  • values_spec (specification) – Values specification (internal use).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
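Both memory types store up to capacity timesteps and differ in how they retrieve them. The sketch below uses one ring buffer with two retrieval methods. Sampling with replacement for the replay case is an assumption. RingMemory is a hypothetical class, not Tensorforce's TensorFlow implementation.

```python
import random


class RingMemory:
    """Fixed-capacity experience store; the oldest timesteps are overwritten first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.next_index = 0

    def store(self, experience):
        if len(self.buffer) < self.capacity:
            self.buffer.append(experience)
        else:
            self.buffer[self.next_index] = experience  # overwrite the oldest entry
        self.next_index = (self.next_index + 1) % self.capacity

    def retrieve_recent(self, n):
        """'recent'-style retrieval: the n most recently stored timesteps, oldest first."""
        n = min(n, len(self.buffer))
        size = len(self.buffer)
        return [self.buffer[(self.next_index - n + i) % size] for i in range(n)]

    def retrieve_random(self, n, rng=random):
        """'replay'-style retrieval: n timesteps sampled uniformly (with replacement)."""
        return [rng.choice(self.buffer) for _ in range(n)]


memory = RingMemory(capacity=3)
for timestep in range(5):
    memory.store(timestep)
print(memory.retrieve_recent(2))  # [3, 4]
print(memory.retrieve_random(4, rng=random.Random(0)))
```

A specification such as dict(type='replay', capacity=10000) would select the replay variant, using only the documented keys.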

Networks

Default network: LayeredNetwork with default argument layers

class tensorforce.core.networks.AutoNetwork(name, inputs_spec, size=64, depth=2, final_size=None, final_depth=1, internal_rnn=False, device=None, summary_labels=None, l2_regularization=None)

Network which is automatically configured based on its input tensors, offering high-level customization (specification key: auto).

Parameters:
  • name (string) – Network name (internal use).
  • inputs_spec (specification) – Input tensors specification (internal use).
  • size (int > 0) – Layer size, before concatenation if multiple states (default: 64).
  • depth (int > 0) – Number of layers per state, before concatenation if multiple states (default: 2).
  • final_size (int > 0) – Layer size after concatenation if multiple states (default: layer size).
  • final_depth (int > 0) – Number of layers after concatenation if multiple states (default: 1).
  • internal_rnn (false | parameter, long >= 0) – Whether to add an internal state LSTM cell as the last layer, and if so, the horizon of the LSTM (default: false).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
class tensorforce.core.networks.LayeredNetwork(name, layers, inputs_spec, device=None, summary_labels=None, l2_regularization=None)

Network consisting of Tensorforce layers, which can be specified as either a list of layer specifications in the case of a standard sequential layer-stack architecture, or as a list of lists of layer specifications in the case of a more complex architecture consisting of multiple sequential layer-stacks (specification key: custom or layered).

Parameters:
  • name (string) – Network name (internal use).
  • layers (iter[specification] | iter[iter[specification]]) – Layers configuration, see layers (required).
  • inputs_spec (specification) – Input tensors specification (internal use).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
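In the list-of-lists form, the retrieve and register layers connect the individual layer-stacks. The sketch below combines two hypothetical states, 'image' and 'features'. It also adds a hypothetical helper that checks every retrieved name is either a state or was registered by an earlier stack. Presumably, the output of the last stack serves as the network output.

```python
# Multi-input architecture as a list of layer stacks, using the documented specification keys.
# The state names 'image' and 'features' are hypothetical.
network_spec = [
    [
        dict(type='retrieve', tensors=['image']),
        dict(type='conv2d', size=32, window=3),
        dict(type='flatten'),
        dict(type='register', tensor='image-embedding'),
    ],
    [
        dict(type='retrieve', tensors=['features']),
        dict(type='dense', size=64),
        dict(type='register', tensor='features-embedding'),
    ],
    [
        dict(type='retrieve', tensors=['image-embedding', 'features-embedding'], aggregation='concat'),
        dict(type='dense', size=64),
    ],
]


def check_tensor_flow(stacks, state_names):
    """Hypothetical helper: each 'retrieve' may only ask for states or earlier registrations."""
    available = set(state_names)
    for stack in stacks:
        for layer in stack:
            if layer['type'] == 'retrieve':
                missing = [name for name in layer['tensors'] if name not in available]
                if missing:
                    raise ValueError('unknown tensors: {}'.format(missing))
            elif layer['type'] == 'register':
                available.add(layer['tensor'])
    return True


print(check_tensor_flow(network_spec, ['image', 'features']))  # True
```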

Objectives

class tensorforce.core.objectives.ActionValue(name, huber_loss=0.0, mean_over_actions=False, summary_labels=None)

State-action-value / Q-value objective, which minimizes the L2-distance between the state-action-value estimate and target reward value (specification key: action_value).

Parameters:
  • name (string) – Module name (internal use).
  • huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
  • mean_over_actions (bool) – Whether to compute objective for mean of state-action-values instead of per state-action-value (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
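For the huber_loss option, the usual Huber definition, as in TensorFlow's tf.losses.huber_loss, is quadratic inside the threshold and linear outside it. The sketch below compares it with a plain squared error. The 0.5 scaling of the squared error is an assumption, since the exact scaling used by Tensorforce is not documented here.

```python
def squared_loss(error):
    # Assumed scaling: half the squared difference between estimate and target
    return 0.5 * error ** 2


def huber_loss(error, threshold):
    """Quadratic for |error| <= threshold, linear beyond it (matching slopes at the boundary)."""
    if abs(error) <= threshold:
        return 0.5 * error ** 2
    return threshold * (abs(error) - 0.5 * threshold)


print(huber_loss(0.5, 1.0), huber_loss(3.0, 1.0), squared_loss(3.0))  # 0.125 2.5 4.5
```

The linear tail limits the gradient magnitude to the threshold, which makes value estimates less sensitive to occasional large errors.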
class tensorforce.core.objectives.Plus(name, objective1, objective2, summary_labels=None)

Additive combination of two objectives (specification key: plus).

Parameters:
  • name (string) – Module name (internal use).
  • objective1 (specification) – First objective configuration (required).
  • objective2 (specification) – Second objective configuration (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.objectives.PolicyGradient(name, ratio_based=False, clipping_value=0.0, mean_over_actions=False, summary_labels=None)

Policy gradient objective, which maximizes the log-likelihood or likelihood-ratio scaled by the target reward value (specification key: policy_gradient).

Parameters:
  • name (string) – Module name (internal use).
  • ratio_based (bool) – Whether to scale the likelihood-ratio instead of the log-likelihood (default: false).
  • clipping_value (parameter, float > 0.0) – Clipping threshold for the maximized value (default: no clipping).
  • mean_over_actions (bool) – Whether to compute objective for mean of likelihoods instead of per likelihood (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
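The two variants of the policy gradient objective can be sketched per action as below. The form of clipping is an assumption. It follows the PPO-style clipped surrogate: the likelihood ratio is clipped to [1 - clipping_value, 1 + clipping_value], and the smaller of the clipped and unclipped terms is kept. The function name and arguments are hypothetical.

```python
import math


def policy_gradient_objective(log_prob, old_log_prob, reward, ratio_based=False, clipping_value=0.0):
    """Per-action objective to be maximized (a loss would be its negation)."""
    if not ratio_based:
        # Log-likelihood scaled by the target reward value
        return log_prob * reward
    ratio = math.exp(log_prob - old_log_prob)  # likelihood ratio pi_new / pi_old
    scaled = ratio * reward
    if clipping_value > 0.0:
        # Assumed PPO-style clipping of the ratio, keeping the pessimistic term
        clipped_ratio = min(max(ratio, 1.0 - clipping_value), 1.0 + clipping_value)
        scaled = min(scaled, clipped_ratio * reward)
    return scaled


ratio_log = math.log(1.5)
print(policy_gradient_objective(ratio_log, 0.0, reward=2.0, ratio_based=True))  # about 3.0
print(policy_gradient_objective(ratio_log, 0.0, reward=2.0, ratio_based=True, clipping_value=0.2))  # about 2.4
```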
class tensorforce.core.objectives.StateValue(name, huber_loss=0.0, mean_over_actions=False, summary_labels=None)

State-value objective, which minimizes the L2-distance between the state-value estimate and target reward value (specification key: state_value).

Parameters:
  • name (string) – Module name (internal use).
  • huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
  • mean_over_actions (bool) – Whether to compute objective for mean of state-values instead of per state-value (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).

Optimizers

Default optimizer: MetaOptimizerWrapper

class tensorforce.core.optimizers.ClippingStep(name, optimizer, threshold, mode='global_norm', summary_labels=None)

Clipping-step meta optimizer, which clips the updates of the given optimizer (specification key: clipping_step).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • threshold (parameter, float > 0.0) – Clipping threshold (required).
  • mode ('global_norm' | 'norm' | 'value') – Clipping mode (default: ‘global_norm’).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
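The three clipping modes can be sketched on plain lists, with one update vector per variable. The mode names suggest they mirror TensorFlow's tf.clip_by_global_norm, tf.clip_by_norm and tf.clip_by_value, but this is assumed rather than stated above. clip_updates is a hypothetical helper.

```python
import math


def clip_updates(updates, threshold, mode='global_norm'):
    """Clip a list of update vectors (one per variable) according to `mode`."""
    if mode == 'value':
        # Element-wise clipping into [-threshold, threshold]
        return [[min(max(u, -threshold), threshold) for u in delta] for delta in updates]
    if mode == 'norm':
        # Each variable's update is rescaled independently if its L2-norm exceeds threshold
        result = []
        for delta in updates:
            norm = math.sqrt(sum(u * u for u in delta))
            scale = threshold / max(norm, threshold)
            result.append([u * scale for u in delta])
        return result
    if mode == 'global_norm':
        # All updates share one scale factor based on the norm over every element
        global_norm = math.sqrt(sum(u * u for delta in updates for u in delta))
        scale = threshold / max(global_norm, threshold)
        return [[u * scale for u in delta] for delta in updates]
    raise ValueError("mode must be 'global_norm', 'norm' or 'value'")


print(clip_updates([[3.0], [4.0]], threshold=1.0))  # roughly [[0.6], [0.8]]
```

Global-norm clipping preserves the direction of the combined update. Per-tensor norm clipping and value clipping can change it.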
class tensorforce.core.optimizers.Evolutionary(name, learning_rate, num_samples=1, unroll_loop=False, summary_labels=None)

Evolutionary optimizer, which samples random perturbations and applies them either as positive or negative update depending on their improvement of the loss (specification key: evolutionary).

Parameters:
  • name (string) – Module name (internal use).
  • learning_rate (parameter, float > 0.0) – Learning rate (required).
  • num_samples (parameter, int > 0) – Number of sampled perturbations (default: 1).
  • unroll_loop (bool) – Whether to unroll the sampling loop (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
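The sketch below is a toy version of the evolutionary update on a quadratic loss, following the description above. Each sampled perturbation is added with a positive sign if it lowers the loss and a negative sign otherwise. The signed perturbations are then averaged over num_samples. Two details are assumptions: the Gaussian perturbations scaled by learning_rate, and the averaging. evolutionary_step is hypothetical.

```python
import random


def evolutionary_step(params, loss_fn, learning_rate, num_samples=1, rng=random):
    """One update: try random perturbations, keep their sign if they help, flip it otherwise."""
    base_loss = loss_fn(params)
    update = [0.0] * len(params)
    for _ in range(num_samples):
        perturbation = [learning_rate * rng.gauss(0.0, 1.0) for _ in params]
        candidate = [p + d for p, d in zip(params, perturbation)]
        # +1 if the perturbation lowered the loss, -1 otherwise
        sign = 1.0 if loss_fn(candidate) < base_loss else -1.0
        update = [u + sign * d / num_samples for u, d in zip(update, perturbation)]
    return [p + u for p, u in zip(params, update)]


def quadratic(params):
    return sum(p * p for p in params)


rng = random.Random(42)
initial = [3.0, -2.0]
params = initial
for _ in range(200):
    params = evolutionary_step(params, quadratic, learning_rate=0.1, num_samples=4, rng=rng)
print(quadratic(initial), quadratic(params))  # 13.0, then typically a much smaller value
```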
class tensorforce.core.optimizers.GlobalOptimizer(name, optimizer, summary_labels=None)

Global meta optimizer, which applies the given optimizer to the local variables, then applies the update to a corresponding set of global variables, and subsequently updates the local variables to the value of the global variables; will likely change in the future (specification key: global_optimizer).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.optimizers.MetaOptimizerWrapper(name, optimizer, multi_step=1, subsampling_fraction=1.0, clipping_threshold=None, optimizing_iterations=0, summary_labels=None, **kwargs)

Meta optimizer wrapper, which applies the meta optimizers configured by the arguments below to the given optimizer (specification key: meta_optimizer_wrapper).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • multi_step (parameter, int > 0) – Number of optimization steps (default: single step).
  • subsampling_fraction (parameter, 0.0 < float <= 1.0) – Fraction of batch timesteps to subsample (default: no subsampling).
  • clipping_threshold (parameter, float > 0.0) – Clipping threshold (default: no clipping).
  • optimizing_iterations (parameter, int >= 0) – Maximum number of line search iterations (default: no optimizing).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.optimizers.MultiStep(name, optimizer, num_steps, unroll_loop=False, summary_labels=None)[source]

Multi-step meta optimizer, which applies the given optimizer a number of times (specification key: multi_step).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • num_steps (parameter, int > 0) – Number of optimization steps (required).
  • unroll_loop (bool) – Whether to unroll the repetition loop (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.optimizers.NaturalGradient(name, learning_rate, cg_max_iterations=10, cg_damping=0.001, cg_unroll_loop=False, summary_labels=None)[source]

Natural gradient optimizer (specification key: natural_gradient).

Parameters:
  • name (string) – Module name (internal use).
  • learning_rate (parameter, float > 0.0) – Learning rate as KL-divergence of distributions between optimization steps (required).
  • cg_max_iterations (int > 0) – Maximum number of conjugate gradient iterations (default: 10).
  • cg_damping (float > 0.0) – Conjugate gradient damping factor (default: 1e-3).
  • cg_unroll_loop (bool) – Whether to unroll the conjugate gradient loop (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
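The role of `cg_max_iterations` and `cg_damping` can be sketched with a plain-Python conjugate gradient solver for the damped system (F + damping · I) x = g, where F is a Fisher-like matrix given only through matrix-vector products. The final scaling, which makes the quadratic KL-divergence estimate 0.5 · dᵀFd equal to `learning_rate`, is assumed to follow the usual TRPO-style formulation. `conjugate_gradient` and `natural_gradient_step` are hypothetical helpers, and the sign of the step is left to the caller.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(fn_x: Callable[[Vector], Vector], b: Vector,
                       max_iterations: int = 10, damping: float = 1e-3,
                       tolerance: float = 1e-20) -> Vector:
    """Approximately solve (A + damping * I) x = b, where fn_x(v) returns A v."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - (A + damping I) x, with x = 0
    p = list(r)          # first search direction
    r_dot_r = dot(r, r)
    for _ in range(max_iterations):
        if r_dot_r <= tolerance:
            break
        ap = [a + damping * pi for a, pi in zip(fn_x(p), p)]  # damped product
        alpha = r_dot_r / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        new_r_dot_r = dot(r, r)
        p = [ri + (new_r_dot_r / r_dot_r) * pi for ri, pi in zip(r, p)]
        r_dot_r = new_r_dot_r
    return x


def natural_gradient_step(fn_x, gradient, learning_rate, **cg_kwargs):
    """Direction F^-1 g, scaled so that the quadratic KL estimate equals learning_rate."""
    direction = conjugate_gradient(fn_x, gradient, **cg_kwargs)
    # 0.5 * d^T F d == learning_rate  =>  d = sqrt(2 * learning_rate / (x^T F x)) * x
    scale = math.sqrt(2.0 * learning_rate / dot(direction, fn_x(direction)))
    return [scale * d for d in direction]


fisher = [[4.0, 1.0], [1.0, 3.0]]  # toy symmetric positive-definite "Fisher" matrix
fisher_vector_product = lambda v: [dot(row, v) for row in fisher]
step = natural_gradient_step(fisher_vector_product, [1.0, 2.0], learning_rate=0.01)
print(0.5 * dot(step, fisher_vector_product(step)))  # approximately 0.01
```

Damping keeps the system well conditioned when F is close to singular; in this 2×2 example, conjugate gradient converges in two iterations, well below the default limit of 10.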
class tensorforce.core.optimizers.OptimizingStep(name, optimizer, ls_max_iterations=10, ls_accept_ratio=0.9, ls_mode='exponential', ls_parameter=0.5, ls_unroll_loop=False, summary_labels=None)[source]

Optimizing-step meta optimizer, which applies line search to the given optimizer to find a better step size (specification key: optimizing_step).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • ls_max_iterations (parameter, int > 0) – Maximum number of line search iterations (default: 10).
  • ls_accept_ratio (parameter, float > 0.0) – Line search acceptance ratio (default: 0.9).
  • ls_mode ('exponential' | 'linear') – Line search mode, see line search solver (default: ‘exponential’).
  • ls_parameter (parameter, float > 0.0) – Line search parameter, see line search solver (default: 0.5).
  • ls_unroll_loop (bool) – Whether to unroll the line search loop (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
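As a rough picture of how `ls_max_iterations`, `ls_accept_ratio` and `ls_parameter` can interact, here is a generic backtracking line search in plain Python. It is an assumption, not Tensorforce's line search solver: a candidate step factor is accepted once the actual loss improvement reaches `accept_ratio` times the linearly predicted improvement, and in the 'exponential' mode the factor is multiplied by `parameter` after each rejection. The 'linear' mode is not shown, and `backtracking_line_search` is a hypothetical helper.

```python
from typing import Callable, List


def backtracking_line_search(
    loss_fn: Callable[[List[float]], float],
    x: List[float],
    step: List[float],
    expected_improvement: float,
    max_iterations: int = 10,
    accept_ratio: float = 0.9,
    parameter: float = 0.5,
) -> float:
    """Return the accepted step-size factor, or 0.0 if none is accepted."""
    base_loss = loss_fn(x)
    factor = 1.0
    for _ in range(max_iterations):
        candidate = [xi + factor * si for xi, si in zip(x, step)]
        improvement = base_loss - loss_fn(candidate)
        # Accept once the actual improvement is close enough to the linear prediction.
        if improvement >= accept_ratio * factor * expected_improvement:
            return factor
        factor *= parameter  # 'exponential' mode: shrink the factor geometrically
    return 0.0  # no acceptable step found: keep the variables unchanged


loss = lambda v: v[0] ** 2
x, gradient = [1.0], [2.0]
step = [-g for g in gradient]                               # plain gradient step
expected = -sum(g * s for g, s in zip(gradient, step))      # linear prediction: 4.0
print(backtracking_line_search(loss, x, step, expected))    # 0.0625
```

A high acceptance ratio such as 0.9 only accepts steps where the linear model of the loss is still accurate, which is why the factor shrinks several times in this example.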
class tensorforce.core.optimizers.Plus(name, optimizer1, optimizer2, summary_labels=None)[source]

Additive combination of two optimizers (specification key: plus).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer1 (specification) – First optimizer configuration (required).
  • optimizer2 (specification) – Second optimizer configuration (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.optimizers.SubsamplingStep(name, optimizer, fraction, summary_labels=None)[source]

Subsampling-step meta optimizer, which randomly samples a subset of batch instances before applying the given optimizer (specification key: subsampling_step).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer (specification) – Optimizer configuration (required).
  • fraction (parameter, 0.0 < float <= 1.0) – Fraction of batch timesteps to subsample (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.optimizers.Synchronization(name, sync_frequency=1, update_weight=1.0, summary_labels=None)[source]

Synchronization optimizer, which updates variables periodically to the value of a corresponding set of source variables (specification key: synchronization).

Parameters:
  • name (string) – Module name (internal use).
  • sync_frequency (parameter, int > 0) – Timestep interval between updates which also perform a synchronization step (default: every update).
  • update_weight (parameter, 0.0 < float <= 1.0) – Update weight (default: 1.0).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
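The effect of `sync_frequency` and `update_weight` can be sketched as a periodic soft update of target variables towards source variables. This plain-Python class is hypothetical. It assumes that a weight of 1.0 means a hard copy and that smaller weights interpolate, as in target-network updates, and it counts updates rather than timesteps for simplicity.

```python
from typing import List


class SynchronizationSketch:
    """Hypothetical periodic (soft) synchronization of target variables."""

    def __init__(self, sync_frequency: int = 1, update_weight: float = 1.0):
        if sync_frequency <= 0:
            raise ValueError("sync_frequency must be > 0")
        if not 0.0 < update_weight <= 1.0:
            raise ValueError("update_weight must be in (0.0, 1.0]")
        self.sync_frequency = sync_frequency
        self.update_weight = update_weight
        self.num_updates = 0

    def step(self, target: List[float], source: List[float]) -> List[float]:
        self.num_updates += 1
        if self.num_updates % self.sync_frequency != 0:
            return target  # this update does not synchronize
        w = self.update_weight
        # Move the target variables a fraction w of the way towards the source variables.
        return [(1.0 - w) * t + w * s for t, s in zip(target, source)]


sync = SynchronizationSketch(sync_frequency=2, update_weight=0.5)
target, source = [0.0], [1.0]
for _ in range(4):
    target = sync.step(target, source)
print(target)  # [0.75]: synchronized on the 2nd and 4th update only
```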
class tensorforce.core.optimizers.TFOptimizer(name, optimizer, learning_rate=0.0003, gradient_norm_clipping=1.0, summary_labels=None, **kwargs)[source]

TensorFlow optimizer (specification key: tf_optimizer, adadelta, adagrad, adam, gradient_descent, momentum, proximal_adagrad, proximal_gradient_descent, rmsprop).

Parameters:
  • name (string) – Module name (internal use).
  • optimizer ('adadelta' | 'adagrad' | 'adam' | 'gradient_descent' | 'momentum' | 'proximal_adagrad' | 'proximal_gradient_descent' | 'rmsprop') – TensorFlow optimizer name, see TensorFlow docs (required unless given by specification key).
  • learning_rate (parameter, float > 0.0) – Learning rate (default: 3e-4).
  • gradient_norm_clipping (parameter, float > 0.0) – Clip gradients jointly so that their global norm, the square root of the sum of their squared norms, does not exceed this value (default: 1.0).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • kwargs – Arguments for the TensorFlow optimizer, see TensorFlow docs.
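The meta optimizers above are configured by nesting specifications through their `optimizer` argument. The following sketch assumes the `dict(type=..., ...)` specification format, where `type` holds the specification key. `describe_optimizer` is a hypothetical helper that only walks the nesting and does not require Tensorforce.

```python
# Assumed specification format: 'type' names the specification key of each optimizer.
optimizer_spec = dict(
    type="subsampling_step",
    fraction=0.33,
    optimizer=dict(
        type="multi_step",
        num_steps=5,
        optimizer=dict(type="adam", learning_rate=3e-4),
    ),
)


def describe_optimizer(spec) -> list:
    """List the specification keys from the outermost to the innermost optimizer."""
    chain = []
    # Stop at the innermost optimizer; tf_optimizer takes a string name as 'optimizer'.
    while isinstance(spec, dict):
        chain.append(spec["type"])
        spec = spec.get("optimizer")
    return chain


print(" -> ".join(describe_optimizer(optimizer_spec)))  # subsampling_step -> multi_step -> adam
```

Read from the inside out, this configuration runs Adam for five steps on each randomly subsampled third of the batch.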

Parameters

Default parameter: Constant

class tensorforce.core.parameters.Constant(name, value, dtype, summary_labels=None)[source]

Constant hyperparameter.

Parameters:
  • name (string) – Module name (internal use).
  • value (dtype-dependent) – Constant hyperparameter value (required).
  • dtype ("bool" | "int" | "long" | "float") – Tensor type (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.parameters.Decaying(name, dtype, unit, decay, initial_value, decay_steps, increasing=False, inverse=False, scale=1.0, summary_labels=None, **kwargs)[source]

Decaying hyperparameter.

Parameters:
  • name (string) – Module name (internal use).
  • dtype ("bool" | "int" | "long" | "float") – Tensor type (required).
  • unit ("timesteps" | "episodes" | "updates") – Unit of decay schedule (required).
  • decay ("cosine" | "cosine_restarts" | "exponential" | "inverse_time" | "linear_cosine" | "linear_cosine_noisy" | "natural_exponential" | "polynomial") – Decay type, see TensorFlow docs (required).
  • initial_value (float) – Initial value (required).
  • decay_steps (long) – Number of decay steps (required).
  • increasing (bool) – Whether to subtract the decayed value from 1.0 (default: false).
  • inverse (bool) – Whether to take the inverse of the decayed value (default: false).
  • scale (float) – Scaling factor for (inverse) decayed value (default: 1.0).
  • summary_labels ("all" | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • kwargs – Additional arguments, which depend on the decay mechanism.
    Cosine decay:
    • alpha (float) – Minimum value as a fraction of initial_value (default: 0.0).
    Cosine decay with restarts:
    • t_mul (float) – Used to derive the number of iterations in the i-th period (default: 2.0).
    • m_mul (float) – Used to derive the initial value of the i-th period (default: 1.0).
    • alpha (float) – Minimum value as a fraction of initial_value (default: 0.0).
    Exponential decay:
    • decay_rate (float) – Decay rate (required).
    • staircase (bool) – Whether to apply decay in a discrete staircase, as opposed to continuous, fashion (default: false).
    Inverse time decay:
    • decay_rate (float) – Decay rate (required).
    • staircase (bool) – Whether to apply decay in a discrete staircase, as opposed to continuous, fashion (default: false).
    Linear cosine decay:
    • num_periods (float) – Number of periods in the cosine part of the decay (default: 0.5).
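    • alpha (float) – Offset added to the linear decay term before it is multiplied by the cosine term (default: 0.0).
    • beta (float) – Constant added to the combined decay factor, which is then multiplied by initial_value (default: 0.001).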
    Natural exponential decay:
    • decay_rate (float) – Decay rate (required).
    • staircase (bool) – Whether to apply decay in a discrete staircase, as opposed to continuous, fashion (default: false).
    Noisy linear cosine decay:
    • initial_variance (float) – Initial variance for the noise (default: 1.0).
    • variance_decay (float) – Decay for the noise's variance (default: 0.55).
    • num_periods (float) – Number of periods in the cosine part of the decay (default: 0.5).
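    • alpha (float) – Offset added to the linear decay term before it is multiplied by the cosine term (default: 0.0).
    • beta (float) – Constant added to the combined decay factor, which is then multiplied by initial_value (default: 0.001).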
    Polynomial decay:
    • final_value (float) – Final value (required).
    • power (float) – Power of polynomial (default: 1.0, thus linear).
    • cycle (bool) – Whether to cycle beyond decay_steps (default: false).
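Three of the decay types can be written out in plain Python to show how `initial_value`, `decay_steps` and the decay-specific arguments combine. This is a sketch, assuming the formulas of the corresponding TensorFlow 1.x functions (`exponential_decay`, `polynomial_decay` without `cycle`, `cosine_decay`). The order in which `increasing`, `inverse` and `scale` are applied afterwards is also an assumption, and `decayed_value` is a hypothetical helper.

```python
import math


def decayed_value(step, initial_value, decay_steps, decay,
                  increasing=False, inverse=False, scale=1.0, **kwargs):
    """Hypothetical re-implementation of a few decay schedules."""
    if decay == "exponential":
        exponent = step / decay_steps
        if kwargs.get("staircase", False):
            exponent = math.floor(exponent)  # discrete staircase
        value = initial_value * kwargs["decay_rate"] ** exponent
    elif decay == "polynomial":
        final_value = kwargs["final_value"]
        power = kwargs.get("power", 1.0)
        progress = min(step, decay_steps) / decay_steps  # no 'cycle' support here
        value = (initial_value - final_value) * (1.0 - progress) ** power + final_value
    elif decay == "cosine":
        alpha = kwargs.get("alpha", 0.0)
        progress = min(step, decay_steps) / decay_steps
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        value = initial_value * ((1.0 - alpha) * cosine + alpha)
    else:
        raise NotImplementedError(decay)
    if increasing:
        value = 1.0 - value  # subtract the decayed value from 1.0
    if inverse:
        value = 1.0 / value
    return scale * value


print(decayed_value(10, 1.0, 10, "exponential", decay_rate=0.5))                  # 0.5
print(decayed_value(5, 1.0, 10, "polynomial", final_value=0.0))                    # 0.5
print(decayed_value(20, 1.0, 10, "exponential", decay_rate=0.5, increasing=True))  # 0.75
```

The `increasing` flag turns a decaying schedule into one that rises towards 1.0, for example for a coefficient that should grow during training.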
class tensorforce.core.parameters.OrnsteinUhlenbeck(name, dtype, theta=0.15, sigma=0.3, mu=0.0, summary_labels=None)[source]

Ornstein-Uhlenbeck process.

Parameters:
  • name (string) – Module name (internal use).
  • dtype ("bool" | "int" | "long" | "float") – Tensor type (required).
  • theta (float > 0.0) – Mean reversion rate (default: 0.15).
  • sigma (float > 0.0) – Scale of the random noise (default: 0.3).
  • mu (float) – Mean to which the process reverts (default: 0.0).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
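A discretized Ornstein-Uhlenbeck process, as commonly used for temporally correlated exploration noise, can be sketched with one update per call, x ← x + theta · (mu − x) + sigma · N(0, 1). The unit time step and the start value of `mu` are assumptions, and `OrnsteinUhlenbeckNoise` is a hypothetical class, not Tensorforce's implementation.

```python
import random
from typing import Optional


class OrnsteinUhlenbeckNoise:
    """Discretized Ornstein-Uhlenbeck process with unit time step (hypothetical)."""

    def __init__(self, theta: float = 0.15, sigma: float = 0.3, mu: float = 0.0,
                 initial_value: Optional[float] = None, rng: Optional[random.Random] = None):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.value = mu if initial_value is None else initial_value
        self.rng = rng or random.Random()

    def sample(self) -> float:
        # Mean reversion towards mu plus Gaussian noise scaled by sigma.
        self.value += self.theta * (self.mu - self.value) + self.sigma * self.rng.gauss(0.0, 1.0)
        return self.value


noise = OrnsteinUhlenbeckNoise(rng=random.Random(42))
samples = [noise.sample() for _ in range(10000)]
print(sum(samples) / len(samples))  # close to mu = 0.0
```

Unlike independent Gaussian noise, consecutive samples are correlated, so exploration perturbations persist over several timesteps.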
class tensorforce.core.parameters.PiecewiseConstant(name, dtype, unit, boundaries, values, summary_labels=None)[source]

Piecewise-constant hyperparameter.

Parameters:
  • name (string) – Module name (internal use).
  • dtype ("bool" | "int" | "long" | "float") – Tensor type (required).
  • unit ("timesteps" | "episodes" | "updates") – Unit of interval boundaries (required).
  • boundaries (iter[long]) – Strictly increasing interval boundaries for constant segments (required).
  • values (iter[dtype-dependent]) – Interval values of constant segments, one more than the number of boundaries (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
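The relation between `boundaries` and `values` can be sketched with a binary search. The sketch assumes the semantics of TensorFlow's `piecewise_constant`: `values[0]` applies up to and including `boundaries[0]`, and `values[-1]` applies after the last boundary. `piecewise_constant` here is a hypothetical plain-Python helper.

```python
import bisect
from typing import Sequence


def piecewise_constant(step: int, boundaries: Sequence[int], values: Sequence[float]) -> float:
    """Hypothetical lookup of the constant segment that contains `step`."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # bisect_left assigns a step equal to a boundary to the segment on its left.
    return values[bisect.bisect_left(boundaries, step)]


boundaries, values = [10, 20], [1.0, 0.5, 0.1]
print([piecewise_constant(t, boundaries, values) for t in (0, 10, 11, 20, 21)])
# [1.0, 1.0, 0.5, 0.5, 0.1]
```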
class tensorforce.core.parameters.Random(name, dtype, distribution, shape=(), summary_labels=None, **kwargs)[source]

Random hyperparameter.

Parameters:
  • name (string) – Module name (internal use).
  • dtype ("bool" | "int" | "long" | "float") – Tensor type (required).
  • distribution ("normal" | "uniform") – Distribution type for random hyperparameter value (required).
  • shape (iter[int > 0]) – Tensor shape (default: scalar).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • kwargs – Additional arguments dependent on distribution type.
    Normal distribution:
    • mean (float) – Mean (default: 0.0).
    • stddev (float > 0.0) – Standard deviation (default: 1.0).
    Uniform distribution:
    • minval (int / float) – Lower bound (default: 0 / 0.0).
    • maxval (int / float > minval) – Upper bound (default: 1.0 for float, required for int).

Preprocessing

class tensorforce.core.layers.Activation(name, nonlinearity, input_spec=None, summary_labels=None)[source]

Activation layer (specification key: activation).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • nonlinearity ('crelu' | 'elu' | 'leaky-relu' | 'none' | 'relu' | 'selu' | 'sigmoid' | 'softmax' | 'softplus' | 'softsign' | 'swish' | 'tanh') – Nonlinearity (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Clipping(name, upper, lower=None, input_spec=None, summary_labels=None)[source]

Clipping layer (specification key: clipping).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • upper (parameter, float) – Upper clipping value (required).
  • lower (parameter, float) – Lower clipping value (default: negative upper value).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Deltafier(name, concatenate=False, input_spec=None, summary_labels=None)[source]

Deltafier layer computing the difference between the current and the previous input; can only be used as preprocessing layer (specification key: deltafier).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • concatenate (False | int >= 0) – Whether to concatenate the deltas with the input instead of replacing the input, and if so, the concatenation axis (default: false).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
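The deltafier's behavior can be sketched for 1-D inputs as a stateful function that remembers the previous input. This plain-Python class is hypothetical. It assumes a zero delta for the first input after a reset, and it reduces `concatenate` to a boolean that appends the deltas after the input along the last axis.

```python
from typing import List, Optional


class DeltafierSketch:
    """Hypothetical stateful difference between the current and the previous input."""

    def __init__(self, concatenate: bool = False):
        self.concatenate = concatenate
        self.previous: Optional[List[float]] = None

    def reset(self) -> None:
        self.previous = None  # e.g. at the start of a new episode

    def __call__(self, x: List[float]) -> List[float]:
        if self.previous is None:
            delta = [0.0] * len(x)  # assumption: no previous input means zero delta
        else:
            delta = [a - b for a, b in zip(x, self.previous)]
        self.previous = list(x)
        return list(x) + delta if self.concatenate else delta


deltafier = DeltafierSketch()
print(deltafier([1.0, 2.0]))  # [0.0, 0.0]
print(deltafier([1.5, 1.0]))  # [0.5, -1.0]
```

Because the layer depends on the previous input, it only makes sense on the sequential stream of states, which is why it is restricted to preprocessing.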
class tensorforce.core.layers.Dropout(name, rate, input_spec=None, summary_labels=None)[source]

Dropout layer (specification key: dropout).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • rate (parameter, 0.0 <= float < 1.0) – Dropout rate (required).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.ExponentialNormalization(name, decay=0.999, axes=None, input_spec=None, summary_labels=None)[source]

Normalization layer based on the exponential moving average (specification key: exponential_normalization).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • decay (parameter, 0.0 <= float <= 1.0) – Decay rate (default: 0.999).
  • axes (iter[int >= 0]) – Normalization axes, excluding batch axis (default: all but last axis).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
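The role of `decay` can be sketched with exponential moving averages of the batch mean and variance for a scalar feature. This sketch makes several assumptions. The moving averages are initialized to mean 0 and variance 1, they are updated on every call, and a small `epsilon` avoids division by zero. `ExponentialNormalizationSketch` is hypothetical, and when Tensorforce actually updates its statistics is not specified here.

```python
import math
from typing import List


class ExponentialNormalizationSketch:
    """Hypothetical normalization with exponentially decaying mean/variance estimates."""

    def __init__(self, decay: float = 0.999, epsilon: float = 1e-5):
        if not 0.0 <= decay <= 1.0:
            raise ValueError("decay must be in [0.0, 1.0]")
        self.decay = decay
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_variance = 1.0

    def __call__(self, batch: List[float]) -> List[float]:
        # Statistics over the batch axis.
        mean = sum(batch) / len(batch)
        variance = sum((x - mean) ** 2 for x in batch) / len(batch)
        # Exponential moving averages: a larger decay gives slower-moving estimates.
        self.moving_mean = self.decay * self.moving_mean + (1.0 - self.decay) * mean
        self.moving_variance = self.decay * self.moving_variance + (1.0 - self.decay) * variance
        std = math.sqrt(self.moving_variance + self.epsilon)
        return [(x - self.moving_mean) / std for x in batch]


norm = ExponentialNormalizationSketch(decay=0.0)  # decay 0.0: use only the current batch
print(norm([1.0, 2.0, 3.0]))  # centered around 0.0
```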
class tensorforce.core.layers.Image(name, height=None, width=None, grayscale=False, input_spec=None, summary_labels=None)[source]

Image preprocessing layer (specification key: image).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • height (int) – Height of resized image (default: no resizing or relative to width).
  • width (int) – Width of resized image (default: no resizing or relative to height).
  • grayscale (bool | iter[float]) – Turn into grayscale image, optionally using given weights (default: false).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.InstanceNormalization(name, axes=None, input_spec=None, summary_labels=None)[source]

Instance normalization layer (specification key: instance_normalization).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • axes (iter[int >= 0]) – Normalization axes, excluding batch axis (default: all).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
class tensorforce.core.layers.Sequence(name, length, axis=-1, concatenate=True, input_spec=None, summary_labels=None)[source]

Sequence layer stacking the current and previous inputs; can only be used as preprocessing layer (specification key: sequence).

Parameters:
  • name (string) – Layer name (default: internally chosen).
  • length (int > 0) – Number of inputs to concatenate (required).
  • axis (int >= 0) – Concatenation axis, excluding batch axis (default: last axis).
  • concatenate (bool) – Whether to concatenate inputs at given axis, otherwise introduce new sequence axis (default: true).
  • input_spec (specification) – Input tensor specification (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
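For 1-D inputs, stacking the current and previous inputs with `concatenate=True` can be sketched using a bounded deque. `SequenceSketch` is hypothetical. Padding the start of an episode by repeating the first input is an assumption, chosen so that the output size stays constant.

```python
from collections import deque
from typing import List


class SequenceSketch:
    """Hypothetical stacking of the last `length` inputs along the last axis."""

    def __init__(self, length: int):
        if length <= 0:
            raise ValueError("length must be > 0")
        self.length = length
        self.buffer: deque = deque(maxlen=length)

    def reset(self) -> None:
        self.buffer.clear()

    def __call__(self, x: List[float]) -> List[float]:
        if not self.buffer:
            # Assumption: pad the start of an episode by repeating the first input.
            self.buffer.extend([list(x)] * (self.length - 1))
        self.buffer.append(list(x))  # the oldest input drops out automatically
        # Concatenate the buffered inputs, oldest first.
        return [v for item in self.buffer for v in item]


sequence = SequenceSketch(length=3)
for observation in ([1.0], [2.0], [3.0], [4.0]):
    print(sequence(observation))
# [1.0, 1.0, 1.0], [1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
```

Frame stacking of this kind is a common way to give a feed-forward policy access to short-term dynamics such as velocities.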

Policies

Default policy: ParametrizedDistributions

class tensorforce.core.policies.ParametrizedDistributions(name, states_spec, actions_spec, network='auto', distributions=None, device=None, summary_labels=None, l2_regularization=None)[source]

Policy which parametrizes independent distributions per action conditioned on the output of a central states-processing neural network (supports both stochastic and action-value-based policy interfaces) (specification key: parametrized_distributions).

Parameters:
  • name (string) – Module name (internal use).
  • states_spec (specification) – States specification (internal use).
  • actions_spec (specification) – Actions specification (internal use).
  • network ('auto' | specification) – Policy network configuration, see networks (default: ‘auto’, automatically configured network).
  • distributions (dict[specification]) – Distributions configuration, see distributions, specified per action-type or -name (default: per action-type, Bernoulli distribution for binary boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).

Environment interface

class tensorforce.environments.Environment[source]

Tensorforce environment interface.

actions()[source]

Returns the action space specification.

Returns: Arbitrarily nested dictionary of action descriptions with the following attributes:
  • type ("bool" | "int" | "float") – action data type (required).
  • shape (int > 0 | iter[int > 0]) – action shape (default: scalar).
  • num_actions (int > 0) – number of discrete action values (required for type "int").
  • min_value/max_value (float) – minimum/maximum action value (optional for type "float").
Return type: specification
close()[source]

Closes the environment.

static create(environment, **kwargs)[source]

Creates an environment from a specification.

Parameters:
  • environment (specification) – JSON file, specification key, configuration dictionary, library module, or Environment subclass (required).
  • kwargs – Additional arguments.
execute(actions)[source]

Executes the given action(s) and advances the environment by one step.

Parameters: actions (dict[action]) – Dictionary containing action(s) to be executed (required).
Returns: Tuple containing the next state(s), the terminal indicator (whether a terminal state was reached, or 2 if the episode was aborted), and the observed reward.
Return type: (dict[state], bool | 0 | 1 | 2, float)
max_episode_timesteps()[source]

Returns the maximum number of timesteps per episode.

Returns: Maximum number of timesteps per episode.
Return type: int
reset()[source]

Resets the environment to start a new episode.

Returns: Dictionary containing initial state(s) and auxiliary information.
Return type: dict[state]
states()[source]

Returns the state space specification.

Returns: Arbitrarily nested dictionary of state descriptions with the following attributes:
  • type ("bool" | "int" | "float") – state data type (default: "float").
  • shape (int | iter[int]) – state shape (required).
  • num_states (int > 0) – number of discrete state values (required for type "int").
  • min_value/max_value (float) – minimum/maximum state value (optional for type "float").
Return type: specification
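The following toy environment shows how the interface methods fit together in an episode loop. It is a hypothetical sketch. In real use the class would subclass tensorforce.environments.Environment, but here it is kept standalone so that it runs without Tensorforce installed. The single (unnamed) state is returned as a plain list, and `execute` returns a boolean terminal flag.

```python
class CountdownEnvironment:
    """Hypothetical toy environment: reach zero by decrementing a counter."""

    def __init__(self, start: int = 5, max_timesteps: int = 20):
        self.start = start
        self.max_timesteps = max_timesteps
        self.counter = start

    def states(self):
        return dict(type="float", shape=(1,))

    def actions(self):
        return dict(type="int", num_actions=2)  # 0: wait, 1: decrement

    def max_episode_timesteps(self):
        return self.max_timesteps

    def reset(self):
        self.counter = self.start
        return [float(self.counter)]

    def execute(self, actions):
        self.counter -= int(actions)
        terminal = self.counter <= 0
        reward = 1.0 if terminal else -0.1  # small penalty per step until the goal
        return [float(self.counter)], terminal, reward

    def close(self):
        pass


env = CountdownEnvironment(start=3)
states = env.reset()
total_reward = 0.0
for _ in range(env.max_episode_timesteps()):
    states, terminal, reward = env.execute(actions=1)
    total_reward += reward
    if terminal:
        break
env.close()
print(states, total_reward)  # [0.0] and approximately 0.8
```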

Arcade Learning Environment

class tensorforce.environments.ArcadeLearningEnvironment(level, life_loss_terminal=False, life_loss_punishment=0.0, repeat_action_probability=0.0, visualize=False, frame_skip=1, seed=None)[source]

Arcade Learning Environment adapter (specification key: ale, arcade_learning_environment).

May require:

sudo apt-get install libsdl1.2-dev libsdl-gfx1.2-dev libsdl-image1.2-dev cmake

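git clone https://github.com/mgbellemare/Arcade-Learning-Environment.git
cd Arcade-Learning-Environment
mkdir build && cd build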
cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=ON ..
make -j 4

pip install git+https://github.com/mgbellemare/Arcade-Learning-Environment.git
Parameters:
  • level (string) – ALE rom file (required).
  • life_loss_terminal (bool) – Signals a terminal state on loss of life (default: false).
  • life_loss_punishment (float) – Reward/penalty on loss of life (negative values are a penalty) (default: 0.0).
  • repeat_action_probability (float) – Repeats last action with given probability (default: 0.0).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • frame_skip (int > 0) – Number of times to repeat an action without observing (default: 1).
  • seed (int) – Random seed (default: none).
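The `frame_skip` parameter, which also appears in the PyGame Learning Environment and ViZDoom adapters, can be sketched as repeating one action for several emulator frames while only the last frame is observed. Summing the rewards of the skipped frames and stopping early at a terminal frame are assumptions. `execute_with_frame_skip` and `make_toy_emulator` are hypothetical helpers.

```python
from typing import Any, Callable, Tuple

StepResult = Tuple[Any, bool, float]


def execute_with_frame_skip(step_fn: Callable[[Any], StepResult], action: Any,
                            frame_skip: int = 1) -> StepResult:
    """Repeat `action` frame_skip times; only the last frame's state is observed."""
    if frame_skip <= 0:
        raise ValueError("frame_skip must be > 0")
    total_reward = 0.0
    for _ in range(frame_skip):
        state, terminal, reward = step_fn(action)
        total_reward += reward  # assumption: rewards of skipped frames are summed
        if terminal:
            break  # do not keep acting after the episode has ended
    return state, terminal, total_reward


def make_toy_emulator(episode_frames: int) -> Callable[[Any], StepResult]:
    """Hypothetical emulator stub: reward 1.0 per frame, terminal after episode_frames."""
    frame = 0

    def step(action: Any) -> StepResult:
        nonlocal frame
        frame += 1
        return frame, frame >= episode_frames, 1.0

    return step


emulator = make_toy_emulator(episode_frames=3)
print(execute_with_frame_skip(emulator, action=0, frame_skip=2))  # (2, False, 2.0)
print(execute_with_frame_skip(emulator, action=0, frame_skip=4))  # (3, True, 1.0)
```

Skipping frames reduces the number of agent decisions per episode, which speeds up training on emulators whose consecutive frames are nearly identical.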

Maze Explorer

class tensorforce.environments.MazeExplorer(level, visualize=False)[source]

MazeExplorer environment adapter (specification key: mazeexp, maze_explorer).

May require:

sudo apt-get install freeglut3-dev

pip install mazeexp
Parameters:
  • level (int) – Game mode, see GitHub (required).
  • visualize (bool) – Whether to visualize interaction (default: false).

Open Sim

class tensorforce.environments.OpenSim(level, visualize=False, integrator_accuracy=5e-05)[source]

OpenSim environment adapter (specification key: osim, open_sim).

Parameters:
  • level ('Arm2D' | 'L2Run' | 'Prosthetics') – Environment id (required).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • integrator_accuracy (float) – Integrator accuracy (default: 5e-5).

OpenAI Gym

class tensorforce.environments.OpenAIGym(level, visualize=False, max_episode_timesteps=None, terminal_reward=0.0, reward_threshold=None, tags=None, monitor_directory=None, **kwargs)[source]

OpenAI Gym environment adapter (specification key: gym, openai_gym).

May require:

pip install gym[all]
Parameters:
  • level (string) – Gym id (required).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • max_episode_timesteps (false | int > 0) – Whether to terminate an episode after a while, and if so, maximum number of timesteps per episode (default: Gym default).
  • terminal_reward (float) – Additional reward for early termination, if otherwise indistinguishable from termination due to maximum number of timesteps (default: 0.0).
  • reward_threshold (float) – Gym environment argument, the reward threshold before the task is considered solved (default: Gym default).
  • tags (dict) – Gym environment argument, a set of arbitrary key-value tags on this environment, including simple property=True tags (default: Gym default).
  • monitor_directory (string) – Monitor output directory (default: none).
  • kwargs – Additional Gym environment arguments.

OpenAI Retro

class tensorforce.environments.OpenAIRetro(level, visualize=False, monitor_directory=None, **kwargs)[source]

OpenAI Retro environment adapter (specification key: retro, openai_retro).

May require:

pip install gym-retro
Parameters:
  • level (string) – Game id (required).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • monitor_directory (string) – Monitor output directory (default: none).
  • kwargs – Additional Retro environment arguments.

PyGame Learning Environment

class tensorforce.environments.PyGameLearningEnvironment(level, visualize=False, frame_skip=1, fps=30)[source]

PyGame Learning Environment environment adapter (specification key: ple, pygame_learning_environment).

May require:

sudo apt-get install git python3-dev python3-setuptools python3-numpy python3-opengl \
libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev libsmpeg-dev  libsdl1.2-dev \
libportmidi-dev libswscale-dev libavformat-dev libavcodec-dev libtiff5-dev libx11-6 \
libx11-dev fluid-soundfont-gm timgm6mb-soundfont xfonts-base xfonts-100dpi xfonts-75dpi \
xfonts-cyrillic fontconfig fonts-freefont-ttf libfreetype6-dev

pip install git+https://github.com/pygame/pygame.git

pip install git+https://github.com/ntasfi/PyGame-Learning-Environment.git
Parameters:
  • level (string | subclass of ple.games.base) – Game instance or name of class in ple.games, like ‘doom’, ‘flappybird’, ‘monsterkong’, ‘catcher’, ‘pixelcopter’, ‘pong’, ‘puckworld’, ‘raycastmaze’, ‘snake’, ‘waterworld’ (required).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • frame_skip (int > 0) – Number of times to repeat an action without observing (default: 1).
  • fps (int > 0) – Desired frames per second at which to run the game (default: 30).

ViZDoom

class tensorforce.environments.ViZDoom(level, visualize=False, include_variables=False, factored_action=False, frame_skip=12, seed=None)[source]

ViZDoom environment adapter (specification key: vizdoom).

Parameters:
  • level (string) – ViZDoom configuration file (required).
  • visualize (bool) – Whether to visualize interaction (default: false).
  • include_variables (bool) – Whether to include game variables in the state (default: false).
  • factored_action (bool) – Whether to use factored action representation (default: false).
  • frame_skip (int > 0) – Number of times to repeat an action without observing (default: 12).
  • seed (int) – Random seed (default: none).