Agent interface¶
Initialization and termination¶
- static TensorforceAgent.create(agent='tensorforce', environment=None, **kwargs)¶ Creates an agent from a specification.
Parameters: - agent (specification | Agent class/object) – JSON file, specification key, configuration dictionary, library module, or Agent class/object (default: Policy agent).
- environment (Environment object) – Environment which the agent is supposed to be trained on; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended).
- kwargs – Additional arguments.
- TensorforceAgent.close()¶ Closes the agent.
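The different `agent` argument forms accepted by `create(...)` can be illustrated as follows; the specification key `'ppo'` and the hyperparameters shown are assumptions for illustration, and the commented-out calls are a sketch requiring the tensorforce package:

```python
# Illustrative agent specifications for Agent.create (hypothetical values):
spec_key = 'ppo'            # specification key
spec_dict = {               # configuration dictionary
    'agent': 'ppo',
    'batch_size': 10,
    'learning_rate': 1e-3,
}

# Equivalent call forms (sketch; requires the tensorforce package and an
# environment object):
# agent = Agent.create(agent=spec_key, environment=environment, batch_size=10)
# agent = Agent.create(agent=spec_dict, environment=environment)
# agent = Agent.create(agent='agent.json', environment=environment)
```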
Main reinforcement learning interface¶
- TensorforceAgent.act(states, internals=None, parallel=0, independent=False, deterministic=False, evaluation=False, query=None, **kwargs)¶ Returns action(s) for the given state(s); needs to be followed by observe(...) unless independent mode is set via independent/evaluation.
Parameters: - states (dict[state] | iter[dict[state]]) – Dictionary containing state(s) to be acted on (required).
- internals (dict[internal] | iter[dict[internal]]) – Dictionary containing current internal agent state(s), either given by initial_internals() at the beginning of an episode or as return value of the preceding act(...) call (required if independent mode and agent has internal states).
- parallel (int | iter[int]) – Parallel execution index (default: 0).
- independent (bool) – Whether act is not part of the main agent-environment interaction, and this call is thus not followed by observe (default: false).
- deterministic (bool) – If in independent mode, whether to act deterministically, so no exploration and sampling (default: false).
- evaluation (bool) – Whether the agent is currently evaluated, implies independent and deterministic (default: false).
- query (list[str]) – Names of tensors to retrieve (default: none).
- kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns: dict[action] | iter[dict[action]], dict[internal] | iter[dict[internal]] if internals argument given, plus optional list[str]: Dictionary containing action(s), dictionary containing next internal agent state(s) if independent mode, plus queried tensor values if requested.
- TensorforceAgent.observe(reward, terminal=False, parallel=0, query=None, **kwargs)¶ Observes reward and whether a terminal state is reached; needs to be preceded by act(...).
Parameters: - reward (float | iter[float]) – Reward (required).
- terminal (bool | 0 | 1 | 2 | iter[..]) – Whether a terminal state is reached, or 2 if the episode was aborted (default: false).
- parallel (int | iter[int]) – Parallel execution index (default: 0).
- query (list[str]) – Names of tensors to retrieve (default: none).
- kwargs – Additional input values, for instance, for dynamic hyperparameters.
Returns: Whether an update was performed, plus queried tensor values if requested.
Return type: (bool | int, optional list[str])
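The act/observe protocol above can be sketched as a minimal interaction loop. The classes here are hypothetical stand-ins mirroring the interface (a real program would create a tensorforce Agent and Environment instead); the 5-step episode and unit reward are assumptions for illustration:

```python
class StubEnvironment:
    """Stand-in episodic environment: terminates after 5 steps, reward 1 per step."""

    def reset(self):
        self.step_count = 0
        return {'observation': 0.0}

    def execute(self, actions):
        self.step_count += 1
        terminal = self.step_count >= 5
        return {'observation': float(self.step_count)}, terminal, 1.0


class StubAgent:
    """Stand-in mirroring the act/observe interface described above."""

    def act(self, states, parallel=0):
        return {'move': 0}          # dummy action

    def observe(self, reward, terminal=False, parallel=0):
        return terminal             # pretend an update happens at episode end


environment = StubEnvironment()
agent = StubAgent()

states = environment.reset()
episode_reward = 0.0
terminal = False
while not terminal:
    actions = agent.act(states=states)                    # act(...)
    states, terminal, reward = environment.execute(actions=actions)
    episode_reward += reward
    agent.observe(reward=reward, terminal=terminal)       # ...followed by observe(...)
```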
Required for evaluation at episode start¶
- TensorforceAgent.initial_internals()¶ Returns the initial internal agent state(s), to be used at the beginning of an episode as internals argument for act(...) in independent mode.
Returns: Dictionary containing initial internal agent state(s).
Return type: dict[internal]
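Threading internals through independent-mode act calls looks as follows; the stub agent and its counter-valued `'lstm_state'` internal are hypothetical, standing in for a real agent with internal states:

```python
class StatefulStubAgent:
    """Stand-in agent with one internal state, mirroring the interface above."""

    def initial_internals(self):
        return {'lstm_state': 0}    # hypothetical initial internal state

    def act(self, states, internals, independent=True, deterministic=True):
        # In independent mode with internal states, act returns
        # (actions, next_internals), as described above.
        next_internals = {'lstm_state': internals['lstm_state'] + 1}
        return {'move': 0}, next_internals


agent = StatefulStubAgent()
internals = agent.initial_internals()     # at the beginning of the episode
for step in range(3):
    actions, internals = agent.act(       # feed internals back in each step
        states={'observation': float(step)},
        internals=internals,
        independent=True,
        deterministic=True,
    )
```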
Loading and saving¶
- static TensorforceAgent.load(directory=None, filename=None, format=None, environment=None, **kwargs)¶ Restores an agent from a specification directory/file.
Parameters: - directory (str) – Checkpoint directory (default: current directory “.”).
- filename (str) – Checkpoint filename, with or without append and extension (default: “agent”).
- format ("tensorflow" | "numpy" | "hdf5" | "pb-actonly") – File format; “pb-actonly” loads an act-only agent based on a Protobuf model (default: format matching directory and filename, required to be unambiguous).
- environment (Environment object) – Environment which the agent is supposed to be trained on; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended unless “pb-actonly” format).
- kwargs – Additional arguments, invalid for “pb-actonly” format.
- TensorforceAgent.save(directory=None, filename=None, format='tensorflow', append=None)¶ Saves the agent to a checkpoint.
Parameters: - directory (str) – Checkpoint directory (default: directory specified for TensorFlow saver, otherwise current directory).
- filename (str) – Checkpoint filename, without extension (default: filename specified for TensorFlow saver, otherwise name of agent).
- format ("tensorflow" | "numpy" | "hdf5") – File format; “tensorflow” uses the TensorFlow saver to store variables, graph meta information and an optimized Protobuf model with an act-only graph, whereas the others only store variables as a NumPy/HDF5 file (default: TensorFlow format).
- append ("timesteps" | "episodes" | "updates") – Append current timestep/episode/update to checkpoint filename (default: none).
Returns: Checkpoint path.
Return type: str
Get and assign variables¶
- TensorforceAgent.get_variables()¶ Returns the names of all agent variables.
Returns: Names of variables.
Return type: list[str]
- TensorforceAgent.get_variable(variable)¶ Returns the value of the variable with the given name.
Parameters: variable (string) – Variable name (required).
Returns: Variable value.
Return type: numpy-array
- TensorforceAgent.assign_variable(variable, value)¶ Assigns the given value to the variable with the given name.
Parameters: - variable (string) – Variable name (required).
- value (variable-compatible value) – Value to assign to variable (required).
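The variable accessors can be sketched with a dictionary-backed stand-in; the stub class and the variable name `'policy/weights'` are assumptions for illustration (real values would be NumPy arrays from the agent's graph):

```python
class VariableStubAgent:
    """Stand-in mirroring the variable accessors described above."""

    def __init__(self):
        self._variables = {'policy/weights': [0.0, 0.0]}

    def get_variables(self):
        return list(self._variables)        # names of all variables

    def get_variable(self, variable):
        return self._variables[variable]    # value of one variable by name

    def assign_variable(self, variable, value):
        self._variables[variable] = value   # overwrite the stored value


agent = VariableStubAgent()
names = agent.get_variables()
agent.assign_variable(variable='policy/weights', value=[1.0, 2.0])
```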
Custom summaries¶
- TensorforceAgent.summarize(summary, value, step=None)¶ Records a value for the given custom summary label (as specified via summarizer[custom]).
Parameters: - summary (string) – Custom summary label (required).
- value (summary-compatible value) – Summary value to record (required).
- step (int) – Summary recording step (default: current timestep).
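The default-step behavior can be sketched with a stand-in recorder; the stub class and the label `'custom-loss'` are hypothetical (real summaries are written by the configured summarizer):

```python
class SummaryStubAgent:
    """Stand-in mirroring summarize(summary, value, step=None)."""

    def __init__(self):
        self.timesteps = 0
        self.records = []

    def summarize(self, summary, value, step=None):
        # step defaults to the current timestep, as described above
        recorded_step = step if step is not None else self.timesteps
        self.records.append((summary, recorded_step, value))


agent = SummaryStubAgent()
agent.timesteps = 10
agent.summarize(summary='custom-loss', value=0.5)            # uses current timestep
agent.summarize(summary='custom-loss', value=0.4, step=11)   # explicit step
```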
Advanced functions for specialized use cases¶
- TensorforceAgent.experience(states, actions, terminal, reward, internals=None, query=None, **kwargs)[source]¶ Feeds experience traces.
Parameters: - states (dict[array[state]]) – Dictionary containing arrays of states (required).
- actions (dict[array[action]]) – Dictionary containing arrays of actions (required).
- terminal (array[bool]) – Array of terminals (required).
- reward (array[float]) – Array of rewards (required).
- internals (dict[array[internal]]) – Dictionary containing arrays of internal agent states (default: no internal states).
- query (list[str]) – Names of tensors to retrieve (default: none).
- kwargs – Additional input values, for instance, for dynamic hyperparameters.
- TensorforceAgent.update(query=None, **kwargs)[source]¶ Performs an update.
Parameters: - query (list[str]) – Names of tensors to retrieve (default: none).
- kwargs – Additional input values, for instance, for dynamic hyperparameters.
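The argument shapes expected by experience(...) can be sketched as follows; the state/action names and trace values are assumptions for illustration, and the commented-out calls sketch the real usage:

```python
# One experience trace: dictionaries of per-timestep arrays, plus flat
# terminal and reward arrays of the same length (hypothetical values).
states = {'observation': [0.0, 1.0, 2.0]}
actions = {'move': [0, 1, 0]}
terminal = [False, False, True]     # episode ends on the last timestep
reward = [1.0, 1.0, 1.0]

# All arrays must have one entry per timestep, i.e. equal lengths:
lengths = {len(v) for v in [terminal, reward]
           + list(states.values()) + list(actions.values())}
assert len(lengths) == 1

# Sketch of the real calls (requires a tensorforce agent):
# agent.experience(states=states, actions=actions, terminal=terminal, reward=reward)
# agent.update()
```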
- TensorforceAgent.pretrain(directory, num_iterations, num_traces=1, num_updates=1)[source]¶ Naive pretraining approach as a combination of experience() and update(), using experience traces obtained e.g. via the recorder argument.
Parameters: - directory (path) – Directory with experience traces, e.g. obtained via recorder; episode length has to be consistent with agent configuration (required).
- num_iterations (int > 0) – Number of iterations consisting of loading new traces and performing multiple updates (required).
- num_traces (int > 0) – Number of traces to load per iteration; has to at least satisfy the update batch size (default: 1).
- num_updates (int > 0) – Number of updates per iteration (default: 1).
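The described combination of experience() and update() amounts to the following loop; `pretrain_sketch`, `load_traces`, and the counting agent are hypothetical names used only to illustrate how num_iterations, num_traces, and num_updates interact:

```python
def pretrain_sketch(agent, load_traces, num_iterations, num_traces=1, num_updates=1):
    """Sketch of pretrain: per iteration, feed traces then run updates."""
    for _ in range(num_iterations):
        for trace in load_traces(num_traces):   # e.g. traces written by the recorder
            agent.experience(**trace)           # feed one experience trace
        for _ in range(num_updates):
            agent.update()                      # perform one update


class CountingAgent:
    """Stand-in that just counts calls, to show the loop structure."""

    def __init__(self):
        self.experiences = 0
        self.updates = 0

    def experience(self, **trace):
        self.experiences += 1

    def update(self):
        self.updates += 1


agent = CountingAgent()
pretrain_sketch(agent, load_traces=lambda n: [{}] * n,
                num_iterations=2, num_traces=3, num_updates=4)
```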
Others¶
- TensorforceAgent.reset()¶ Resets all agent buffers and discards unfinished episodes.
- TensorforceAgent.get_output_tensors(function)¶ Returns the names of output tensors for the given function.
Parameters: function (str) – Function name (required).
Returns: Names of output tensors.
Return type: list[str]
- TensorforceAgent.get_available_summaries()¶ Returns the summary labels provided by the agent.
Returns: Available summary labels.
Return type: list[str]