Agent interface

Initialization and termination

static TensorforceAgent.create(agent='tensorforce', environment=None, **kwargs)

Creates an agent from a specification.

Parameters:
  • agent (specification | Agent class/object) – JSON file, specification key, configuration dictionary, library module, or Agent class/object (default: Policy agent).
  • environment (Environment object) – Environment on which the agent is supposed to be trained; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional arguments.
TensorforceAgent.close()

Closes the agent.

Main reinforcement learning interface

TensorforceAgent.act(states, internals=None, parallel=0, independent=False, deterministic=False, evaluation=False, query=None, **kwargs)

Returns action(s) for the given state(s); needs to be followed by observe(...) unless independent mode is set via independent/evaluation.

Parameters:
  • states (dict[state] | iter[dict[state]]) – Dictionary containing state(s) to be acted on (required).
  • internals (dict[internal] | iter[dict[internal]]) – Dictionary containing current internal agent state(s), either given by initial_internals() at the beginning of an episode or as return value of the preceding act(...) call (required if independent mode and agent has internal states).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • independent (bool) – Whether act is not part of the main agent-environment interaction, and this call is thus not followed by observe (default: false).
  • deterministic (bool) – If in independent mode, whether to act deterministically, i.e. without exploration and sampling (default: false).
  • evaluation (bool) – Whether the agent is currently evaluated, implies independent and deterministic (default: false).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

dict[action] | iter[dict[action]], plus dict[internal] | iter[dict[internal]] if the internals argument is given, plus optional list[str]: Dictionary containing action(s); dictionary containing next internal agent state(s) if in independent mode; queried tensor values if requested.

TensorforceAgent.observe(reward, terminal=False, parallel=0, query=None, **kwargs)

Observes reward and whether a terminal state is reached, needs to be preceded by act(...).

Parameters:
  • reward (float | iter[float]) – Reward (required).
  • terminal (bool | 0 | 1 | 2 | iter[..]) – Whether a terminal state is reached, or 2 if the episode was aborted (default: false).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

Whether an update was performed, plus queried tensor values if requested.

Return type:

(bool | int, optional list[str])

Required for evaluation at episode start

TensorforceAgent.initial_internals()

Returns the initial internal agent state(s), to be used at the beginning of an episode as internals argument for act(...) in independent mode.

Returns:Dictionary containing initial internal agent state(s).
Return type:dict[internal]

Loading and saving

static TensorforceAgent.load(directory=None, filename=None, format=None, environment=None, **kwargs)

Restores an agent from a specification directory/file.

Parameters:
  • directory (str) – Checkpoint directory (default: current directory “.”).
  • filename (str) – Checkpoint filename, with or without append and extension (default: “agent”).
  • format ("tensorflow" | "numpy" | "hdf5" | "pb-actonly") – File format, “pb-actonly” loads an act-only agent based on a Protobuf model (default: format matching directory and filename, required to be unambiguous).
  • environment (Environment object) – Environment on which the agent is supposed to be trained; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended unless “pb-actonly” format).
  • kwargs – Additional arguments, invalid for “pb-actonly” format.
TensorforceAgent.save(directory=None, filename=None, format='tensorflow', append=None)

Saves the agent to a checkpoint.

Parameters:
  • directory (str) – Checkpoint directory (default: directory specified for TensorFlow saver, otherwise current directory).
  • filename (str) – Checkpoint filename, without extension (default: filename specified for TensorFlow saver, otherwise name of agent).
  • format ("tensorflow" | "numpy" | "hdf5") – File format, “tensorflow” uses TensorFlow saver to store variables, graph meta information and an optimized Protobuf model with an act-only graph, whereas the others only store variables as NumPy/HDF5 file (default: TensorFlow format).
  • append ("timesteps" | "episodes" | "updates") – Append current timestep/episode/update to checkpoint filename (default: none).
Returns:

Checkpoint path.

Return type:

str

Get and assign variables

TensorforceAgent.get_variables()

Returns the names of all agent variables.

Returns:Names of variables.
Return type:list[str]
TensorforceAgent.get_variable(variable)

Returns the value of the variable with the given name.

Parameters:variable (string) – Variable name (required).
Returns:Variable value.
Return type:numpy-array
TensorforceAgent.assign_variable(variable, value)

Assigns the given value to the variable with the given name.

Parameters:
  • variable (string) – Variable name (required).
  • value (variable-compatible value) – Value to assign to variable (required).

Custom summaries

TensorforceAgent.summarize(summary, value, step=None)

Records a value for the given custom summary label (as specified via summarizer[custom]).

Parameters:
  • summary (string) – Custom summary label (required).
  • value (summary-compatible value) – Summary value to record (required).
  • step (int) – Summary recording step (default: current timestep).

Advanced functions for specialized use cases

TensorforceAgent.experience(states, actions, terminal, reward, internals=None, query=None, **kwargs)

Feed experience traces.

Parameters:
  • states (dict[array[state]]) – Dictionary containing arrays of states (required).
  • actions (dict[array[action]]) – Dictionary containing arrays of actions (required).
  • terminal (array[bool]) – Array of terminals (required).
  • reward (array[float]) – Array of rewards (required).
  • internals (dict[array[internal]]) – Dictionary containing arrays of internal agent states (default: no internal states).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
TensorforceAgent.update(query=None, **kwargs)

Perform an update.

Parameters:
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
TensorforceAgent.pretrain(directory, num_iterations, num_traces=1, num_updates=1)

Naive pretraining approach combining experience(...) and update(...), using experience traces obtained e.g. via the recorder argument.

Parameters:
  • directory (path) – Directory with experience traces, e.g. obtained via recorder; episode length has to be consistent with agent configuration (required).
  • num_iterations (int > 0) – Number of iterations consisting of loading new traces and performing multiple updates (required).
  • num_traces (int > 0) – Number of traces to load per iteration; has to at least satisfy the update batch size (default: 1).
  • num_updates (int > 0) – Number of updates per iteration (default: 1).

Others

TensorforceAgent.reset()

Resets all agent buffers and discards unfinished episodes.

TensorforceAgent.get_output_tensors(function)

Returns the names of output tensors for the given function.

Parameters:function (str) – Function name (required).
Returns:Names of output tensors.
Return type:list[str]
TensorforceAgent.get_available_summaries()

Returns the summary labels provided by the agent.

Returns:Available summary labels.
Return type:list[str]