Agent interface

Initialization and termination

static TensorforceAgent.create(agent='tensorforce', environment=None, **kwargs)

Creates an agent from a specification.

Parameters:
  • agent (specification | Agent class/object) – JSON file, specification key, configuration dictionary, library module, or Agent class/object (default: Policy agent).
  • environment (Environment object) – Environment on which the agent is supposed to be trained; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional arguments.
TensorforceAgent.close()

Closes the agent.

Main reinforcement learning interface

TensorforceAgent.act(states, internals=None, parallel=0, independent=False, deterministic=False, evaluation=False, query=None, **kwargs)

Returns action(s) for the given state(s); needs to be followed by observe(...) unless independent mode is set via independent/evaluation.

Parameters:
  • states (dict[state] | iter[dict[state]]) – Dictionary containing state(s) to be acted on (required).
  • internals (dict[internal] | iter[dict[internal]]) – Dictionary containing current internal agent state(s), either given by initial_internals() at the beginning of an episode or as return value of the preceding act(...) call (required if independent mode and agent has internal states).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • independent (bool) – Whether act is not part of the main agent-environment interaction, and this call is thus not followed by observe (default: false).
  • deterministic (bool) – If in independent mode, whether to act deterministically, i.e. without exploration and sampling (default: false).
  • evaluation (bool) – Whether the agent is currently evaluated, implies independent and deterministic (default: false).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

dict[action] | iter[dict[action]], plus dict[internal] | iter[dict[internal]] if the internals argument is given, plus optional list[str]: Dictionary containing action(s); dictionary containing next internal agent state(s) if in independent mode; queried tensor values if requested.

TensorforceAgent.observe(reward, terminal=False, parallel=0, query=None, **kwargs)

Observes reward and whether a terminal state is reached, needs to be preceded by act(...).

Parameters:
  • reward (float | iter[float]) – Reward (required).
  • terminal (bool | 0 | 1 | 2 | iter[..]) – Whether a terminal state is reached, or 2 if the episode was aborted (default: false).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns:

Whether an update was performed, plus queried tensor values if requested.

Return type:

(bool | int, optional list[str])

Required for evaluation at episode start

TensorforceAgent.initial_internals()

Returns the initial internal agent state(s), to be used at the beginning of an episode as internals argument for act(...) in independent mode.

Returns:Dictionary containing initial internal agent state(s).
Return type:dict[internal]

Loading and saving

static TensorforceAgent.load(directory=None, filename=None, format=None, environment=None, **kwargs)

Restores an agent from a specification directory/file.

Parameters:
  • directory (str) – Checkpoint directory (default: current directory “.”).
  • filename (str) – Checkpoint filename, with or without append and extension (default: “agent”).
  • format ("tensorflow" | "numpy" | "hdf5" | "pb-actonly") – File format, “pb-actonly” loads an act-only agent based on a Protobuf model (default: format matching directory and filename, required to be unambiguous).
  • environment (Environment object) – Environment on which the agent is supposed to be trained; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended unless “pb-actonly” format).
  • kwargs – Additional arguments, invalid for “pb-actonly” format.
TensorforceAgent.save(directory=None, filename=None, format='tensorflow', append=None)

Saves the agent to a checkpoint.

Parameters:
  • directory (str) – Checkpoint directory (default: directory specified for TensorFlow saver, otherwise current directory).
  • filename (str) – Checkpoint filename, without extension (default: filename specified for TensorFlow saver, otherwise name of agent).
  • format ("tensorflow" | "numpy" | "hdf5") – File format, “tensorflow” uses TensorFlow saver to store variables, graph meta information and an optimized Protobuf model with an act-only graph, whereas the others only store variables as NumPy/HDF5 file (default: TensorFlow format).
  • append ("timesteps" | "episodes" | "updates") – Append current timestep/episode/update to checkpoint filename (default: none).
Returns:

Checkpoint path.

Return type:

str

Get and assign variables

TensorforceAgent.get_variables()

Returns the names of all agent variables.

Returns:Names of variables.
Return type:list[str]
TensorforceAgent.get_variable(variable)

Returns the value of the variable with the given name.

Parameters:variable (string) – Variable name (required).
Returns:Variable value.
Return type:numpy-array
TensorforceAgent.assign_variable(variable, value)

Assigns the given value to the variable with the given name.

Parameters:
  • variable (string) – Variable name (required).
  • value (variable-compatible value) – Value to assign to variable (required).

Custom summaries

TensorforceAgent.summarize(summary, value, step=None)

Records a value for the given custom summary label (as specified via summarizer[custom]).

Parameters:
  • summary (string) – Custom summary label (required).
  • value (summary-compatible value) – Summary value to record (required).
  • step (int) – Summary recording step (default: current timestep).

Advanced functions for specialized use cases

TensorforceAgent.experience(states, actions, terminal, reward, internals=None, query=None, **kwargs)

Feed experience traces.

Parameters:
  • states (dict[array[state]]) – Dictionary containing arrays of states (required).
  • actions (dict[array[action]]) – Dictionary containing arrays of actions (required).
  • terminal (array[bool]) – Array of terminals (required).
  • reward (array[float]) – Array of rewards (required).
  • internals (dict[array[internal]]) – Dictionary containing arrays of internal agent states (default: no internal states).
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
TensorforceAgent.update(query=None, **kwargs)

Perform an update.

Parameters:
  • query (list[str]) – Names of tensors to retrieve (default: none).
  • kwargs – Additional input values, for instance, for dynamic hyperparameters.
TensorforceAgent.pretrain(directory, num_iterations, num_traces=1, num_updates=1)

Naive pretraining approach combining experience(...) and update(...), using experience traces obtained e.g. via the recorder argument.

Parameters:
  • directory (path) – Directory with experience traces, e.g. obtained via recorder; episode length has to be consistent with agent configuration (required).
  • num_iterations (int > 0) – Number of iterations consisting of loading new traces and performing multiple updates (required).
  • num_traces (int > 0) – Number of traces to load per iteration; has to at least satisfy the update batch size (default: 1).
  • num_updates (int > 0) – Number of updates per iteration (default: 1).

Others

TensorforceAgent.reset()

Resets all agent buffers and discards unfinished episodes.

TensorforceAgent.get_output_tensors(function)

Returns the names of output tensors for the given function.

Parameters:function (str) – Function name (required).
Returns:Names of output tensors.
Return type:list[str]
TensorforceAgent.get_available_summaries()

Returns the summary labels provided by the agent.

Returns:Available summary labels.
Return type:list[str]