General agent interface

Initialization and termination

static TensorforceAgent.create(agent='tensorforce', environment=None, **kwargs)

Create an agent from a specification.

Parameters:
  • agent (specification | Agent class/object | callable[states -> actions]) – JSON file, specification key, configuration dictionary, library module, or Agent class/object. Alternatively, an act-function mapping states to actions which is supposed to be recorded. (default: Tensorforce base agent).
  • environment (Environment object) – Environment which the agent is supposed to be trained on; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional agent arguments.
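
For example, a minimal sketch of creating an agent (assuming a Gym CartPole environment and a 'ppo' agent specification with a batch_size argument, none of which is prescribed here):

    from tensorforce import Agent, Environment

    # Wrap an OpenAI Gym environment (hypothetical choice of CartPole).
    environment = Environment.create(
        environment='gym', level='CartPole-v1', max_episode_timesteps=500
    )

    # Create an agent from a specification key plus additional agent arguments;
    # state/action specs and episode length are extracted from the environment.
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10)
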
TensorforceAgent.reset()

Resets possibly inconsistent internal values, for instance, after saving and restoring an agent. Automatically triggered as part of Agent.create/load/initialize/restore.

TensorforceAgent.close()

Closes the agent.

Reinforcement learning interface

TensorforceAgent.act(states, internals=None, parallel=0, independent=False, deterministic=True, evaluation=None)

Returns action(s) for the given state(s); needs to be followed by observe() unless independent mode is used.

See the act-observe script for an example application as part of the act-observe interface.

Parameters:
  • states (dict[state] | iter[dict[state]]) – Dictionary containing state(s) to be acted on (required).
  • internals (dict[internal] | iter[dict[internal]]) – Dictionary containing current internal agent state(s), either given by initial_internals() at the beginning of an episode or as return value of the preceding act() call (required if independent mode and agent has internal states).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • independent (bool) – Whether this act() call is not part of the training agent-environment interaction and thus not followed by observe(), meaning its inputs/outputs/internals are not stored in memory and not used in updates, e.g. for independent evaluation episodes which should not be learned from (default: false).
  • deterministic (bool) – Whether actions should be chosen deterministically, i.e. without action-distribution sampling and without exploration; only valid in independent mode (default: true).
Returns:

dict[action] | iter[dict[action]] (, dict[internal] | iter[dict[internal]] if internals argument given) – Dictionary containing action(s), plus dictionary containing next internal agent state(s) if independent mode.

TensorforceAgent.observe(reward=0.0, terminal=False, parallel=0)

Observes reward and whether a terminal state is reached, needs to be preceded by act().

See the act-observe script for an example application as part of the act-observe interface.

Parameters:
  • reward (float | iter[float]) – Reward (default: 0.0).
  • terminal (bool | 0 | 1 | 2 | iter[..]) – Whether a terminal state is reached, or 2 if the episode was aborted (default: false).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
Returns:

Number of performed updates.

Return type:

int
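
A minimal sketch of the act-observe training loop, reusing the agent and environment objects from the create example above (the number of episodes is arbitrary):

    # Act-observe training loop: the agent acts on the current states and
    # subsequently observes the resulting reward and terminal information.
    for _ in range(100):
        states = environment.reset()
        terminal = False
        while not terminal:
            actions = agent.act(states=states)
            states, terminal, reward = environment.execute(actions=actions)
            num_updates = agent.observe(terminal=terminal, reward=reward)
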

Get initial internals (for independent-act)

TensorforceAgent.initial_internals()

Returns the initial internal agent state(s), to be used at the beginning of an episode as internals argument for act() in independent mode.

Returns: Dictionary containing initial internal agent state(s).
Return type: dict[internal]
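
A minimal sketch of an independent evaluation episode using initial_internals(), again assuming the agent and environment from the earlier examples:

    # Independent evaluation episode: internals are passed explicitly and no
    # observe() call follows, so nothing is stored in memory or learned from.
    states = environment.reset()
    internals = agent.initial_internals()
    terminal = False
    while not terminal:
        actions, internals = agent.act(
            states=states, internals=internals, independent=True, deterministic=True
        )
        states, terminal, reward = environment.execute(actions=actions)
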

Experience-update interface

TensorforceAgent.experience(states, actions, terminal, reward, internals=None)

Feed experience traces.

See the act-experience-update script for an example application as part of the act-experience-update interface, which is an alternative to the act-observe interaction pattern.

Parameters:
  • states (dict[array[state]]) – Dictionary containing arrays of states (required).
  • actions (dict[array[action]]) – Dictionary containing arrays of actions (required).
  • terminal (array[bool]) – Array of terminals (required).
  • reward (array[float]) – Array of rewards (required).
  • internals (dict[array[internal]]) – Dictionary containing arrays of internal agent states (required if agent has internal states).
TensorforceAgent.update(query=None, **kwargs)

Perform an update.

See the act-experience-update script for an example application as part of the act-experience-update interface, which is an alternative to the act-observe interaction pattern.
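
A minimal sketch of the act-experience-update pattern, assuming the agent and environment from the earlier examples: one episode is collected via independent act() calls, fed to the agent via experience(), and then an update is triggered.

    # Collect one episode via independent act() calls, keeping track of internals.
    episode_states, episode_internals, episode_actions = [], [], []
    episode_terminal, episode_reward = [], []

    states = environment.reset()
    internals = agent.initial_internals()
    terminal = False
    while not terminal:
        episode_states.append(states)
        episode_internals.append(internals)
        actions, internals = agent.act(states=states, internals=internals, independent=True)
        episode_actions.append(actions)
        states, terminal, reward = environment.execute(actions=actions)
        episode_terminal.append(terminal)
        episode_reward.append(reward)

    # Feed the collected experience to the agent's internal memory, then update.
    agent.experience(
        states=episode_states, internals=episode_internals, actions=episode_actions,
        terminal=episode_terminal, reward=episode_reward
    )
    agent.update()
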

Pretraining

TensorforceAgent.pretrain(directory, num_iterations, num_traces=1, num_updates=1, extension='.npz')

Simple pretraining approach as a combination of experience() and update(), akin to behavioral cloning, using experience traces obtained e.g. via recording agent interactions (see documentation).

For the given number of iterations, load the given number of trace files (which each contain recorder[frequency] episodes), feed the experience to the agent’s internal memory, and subsequently trigger the given number of updates (which will use the experience in the internal memory, fed in this or potentially previous iterations).

See the record-and-pretrain script for an example application.

Parameters:
  • directory (path) – Directory with experience traces, e.g. obtained via recorder; episode length has to be consistent with agent configuration (required).
  • num_iterations (int > 0) – Number of iterations consisting of loading new traces and performing multiple updates (required).
  • num_traces (int > 0) – Number of traces to load per iteration; has to at least satisfy the update batch size (default: 1).
  • num_updates (int > 0) – Number of updates per iteration (default: 1).
  • extension (str) – Traces file extension to filter the given directory for (default: “.npz”).
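
A minimal sketch of pretraining from recorded traces, assuming traces were previously written to a hypothetical directory 'traces' via the recorder agent argument (the iteration and trace counts are arbitrary):

    # Pretrain in behavioral-cloning style: each iteration loads 10 trace files
    # into the agent's memory and then performs 20 updates on that experience.
    agent.pretrain(directory='traces', num_iterations=30, num_traces=10, num_updates=20)
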

Loading and saving

static TensorforceAgent.load(directory=None, filename=None, format=None, environment=None, **kwargs)

Restores an agent from a directory/file.

Parameters:
  • directory (str) – Checkpoint directory (required, unless saver is specified).
  • filename (str) – Checkpoint filename, with or without append and extension (default: “agent”).
  • format ("checkpoint" | "numpy" | "hdf5") – File format (default: format matching directory and filename, required to be unambiguous).
  • environment (Environment object) – Environment which the agent is supposed to be trained on; environment-related arguments like state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional agent arguments.
TensorforceAgent.save(directory, filename=None, format='checkpoint', append=None)

Saves the agent to a checkpoint.

Parameters:
  • directory (str) – Checkpoint directory (required).
  • filename (str) – Checkpoint filename, without extension (default: agent name).
  • format ("checkpoint" | "saved-model" | "numpy" | "hdf5") – File format, “checkpoint” uses the TensorFlow Checkpoint to save the model, “saved-model” uses the TensorFlow SavedModel to save an optimized act-only model (use only if you really need TF’s SavedModel format, loading not supported), whereas the others store only variables as NumPy/HDF5 file (default: TensorFlow Checkpoint).
  • append ("timesteps" | "episodes" | "updates") – Append timestep/episode/update to checkpoint filename (default: none).
Returns:

Checkpoint path.

Return type:

str
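
A minimal sketch of saving and restoring an agent, assuming a hypothetical checkpoint directory 'checkpoints' and the environment from the earlier examples:

    # Save the agent as a NumPy-format checkpoint, appending the episode counter
    # to the filename; save() returns the checkpoint path.
    path = agent.save(directory='checkpoints', filename='agent', format='numpy', append='episodes')
    agent.close()

    # Later: restore the agent from the same directory and file format.
    agent = Agent.load(
        directory='checkpoints', filename='agent', format='numpy', environment=environment
    )
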

Tensor value tracking

TensorforceAgent.tracked_tensors()

Returns the current value of all tracked tensors (as specified by “tracking” agent argument). Note that not all tensors change at every timestep.

Returns: Dictionary containing the current value of all tracked tensors.
Return type: dict[values]
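
A minimal sketch of tensor tracking, assuming the agent was created with e.g. a tracking='all' argument and at least one timestep has been performed:

    # Requires tracking to be enabled at agent creation, e.g. tracking='all'.
    agent = Agent.create(agent='ppo', environment=environment, batch_size=10, tracking='all')

    actions = agent.act(states=environment.reset())
    agent.observe(terminal=False, reward=0.0)

    # Inspect the current value of all tracked tensors.
    for name, value in agent.tracked_tensors().items():
        print(name, value)
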

Specification and architecture

TensorforceAgent.get_specification()

Returns the agent specification.

Returns: Agent specification.
Return type: dict
TensorforceAgent.get_architecture()

Returns a string representation of the network layer architecture (policy, baseline, state-preprocessing).

Returns: String representation of network architecture.
Return type: str
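
A minimal sketch of inspecting an agent's specification and network architecture, assuming the agent from the earlier examples:

    # Retrieve the full agent specification as a dictionary and print the
    # layer architecture of policy, baseline and state-preprocessing networks.
    print(agent.get_specification())
    print(agent.get_architecture())
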