General agent interface

Initialization and termination

static TensorforceAgent.create(agent='tensorforce', environment=None, **kwargs)

Creates an agent from a specification.

Parameters:
  • agent (specification | Agent class/object | callable[states -> actions]) – JSON file, specification key, configuration dictionary, library module, or Agent class/object. Alternatively, an act-function mapping states to actions whose interactions are to be recorded (default: Tensorforce base agent).
  • environment (Environment object) – Environment on which the agent is to be trained; environment-related arguments such as state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional agent arguments.
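
As a rough illustration of the parameters above, the following sketch creates an agent from a specification key plus keyword arguments. It assumes the tensorforce package is installed and exposes Agent and Environment at the top level; the 'ppo' key, the batch_size and learning_rate arguments, and the Gym CartPole environment are taken from upstream example usage, not from this section, so treat them as assumptions.

```python
# Illustrative keyword arguments forwarded to Agent.create() via **kwargs.
# The values are assumptions, not recommendations.
AGENT_KWARGS = dict(agent='ppo', batch_size=10, learning_rate=1e-3)


def create_environment_and_agent():
    """Create an environment and an agent configured from it."""
    # Imported lazily so the module can be loaded without Tensorforce installed.
    from tensorforce import Agent, Environment

    environment = Environment.create(
        environment='gym', level='CartPole', max_episode_timesteps=500
    )
    # Passing the environment lets the agent extract state/action space
    # specifications and the maximum episode length.
    agent = Agent.create(environment=environment, **AGENT_KWARGS)
    return environment, agent
```

When the agent is no longer needed, calling agent.close() (and closing the environment) releases its resources.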
TensorforceAgent.reset()

Resets possibly inconsistent internal values, for instance, after saving and restoring an agent. Automatically triggered as part of Agent.create/load/initialize/restore.

TensorforceAgent.close()

Closes the agent.

Reinforcement learning interface

TensorforceAgent.act(states, internals=None, parallel=0, independent=False, deterministic=True, evaluation=None)

Returns action(s) for the given state(s); needs to be followed by observe() unless in independent mode.

See the act-observe script for an example application as part of the act-observe interface.

Parameters:
  • states (dict[state] | iter[dict[state]]) – Dictionary containing state(s) to be acted on (required).
  • internals (dict[internal] | iter[dict[internal]]) – Dictionary containing current internal agent state(s), either given by initial_internals() at the beginning of an episode or as return value of the preceding act() call (required if independent mode and agent has internal states).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
  • independent (bool) – Whether act is not part of the main agent-environment interaction, and this call is thus not followed by observe() (default: false).
  • deterministic (bool) – Whether the action should be chosen deterministically, that is, without sampling or exploration; only valid in independent mode (default: true).
Returns:

Dictionary containing action(s); if the internals argument is given (independent mode), additionally a dictionary containing the next internal agent state(s).

Return type:

dict[action] | iter[dict[action]], plus dict[internal] | iter[dict[internal]] if the internals argument is given

TensorforceAgent.observe(reward=0.0, terminal=False, parallel=0)

Observes the reward and whether a terminal state is reached; needs to be preceded by act().

See the act-observe script for an example application as part of the act-observe interface.

Parameters:
  • reward (float | iter[float]) – Reward (default: 0.0).
  • terminal (bool | 0 | 1 | 2 | iter[..]) – Whether a terminal state is reached, or 2 if the episode was aborted (default: false).
  • parallel (int | iter[int]) – Parallel execution index (default: 0).
Returns:

Number of performed updates.

Return type:

int
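
The act-observe pattern described above can be sketched as a single training episode. The helper name train_episode is hypothetical, and the environment is assumed to follow the Tensorforce Environment interface, with reset() returning states and execute(actions=...) returning states, terminal and reward.

```python
def train_episode(agent, environment):
    """Run one training episode; return (episode_return, num_updates)."""
    states = environment.reset()
    terminal = False
    episode_return = 0.0
    num_updates = 0
    # A terminal value of True/1 (terminal state) or 2 (aborted) ends the loop.
    while not terminal:
        # Every act() in the main interaction is followed by one observe().
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        # observe() is documented to return the number of performed updates.
        num_updates += agent.observe(terminal=terminal, reward=reward)
        episode_return += reward
    return episode_return, num_updates
```

Calling this helper in a loop gives a basic training run; the update count shows when the agent actually learned from the collected timesteps.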

Get initial internals (for independent-act)

TensorforceAgent.initial_internals()

Returns the initial internal agent state(s), to be used at the beginning of an episode as the internals argument for act() in independent mode.

Returns:

Dictionary containing initial internal agent state(s).

Return type:

dict[internal]
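
A minimal sketch of an evaluation episode in independent mode, where the internals are seeded by initial_internals() and then threaded through successive act() calls. The helper name evaluate_episode is hypothetical, and the environment is assumed to follow the Tensorforce Environment interface (reset() and execute(actions=...)).

```python
def evaluate_episode(agent, environment):
    """Run one evaluation episode without observe(); return the episode return."""
    states = environment.reset()
    # Internal states must be managed by the caller in independent mode.
    internals = agent.initial_internals()
    terminal = False
    episode_return = 0.0
    while not terminal:
        # With internals given, act() returns both actions and next internals.
        actions, internals = agent.act(
            states=states, internals=internals,
            independent=True, deterministic=True
        )
        states, terminal, reward = environment.execute(actions=actions)
        episode_return += reward
    return episode_return
```

Because no observe() call is made, such an episode does not contribute to training.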

Experience - update interface

TensorforceAgent.experience(states, actions, terminal, reward, internals=None)

Feeds experience traces.

See the act-experience-update script for an example application as part of the act-experience-update interface, which is an alternative to the act-observe interaction pattern.

Parameters:
  • states (dict[array[state]]) – Dictionary containing arrays of states (required).
  • actions (dict[array[action]]) – Dictionary containing arrays of actions (required).
  • terminal (array[bool]) – Array of terminals (required).
  • reward (array[float]) – Array of rewards (required).
  • internals (dict[state]) – Dictionary containing arrays of internal agent states (required if agent has internal states).
TensorforceAgent.update(query=None, **kwargs)

Performs an update.

See the act-experience-update script for an example application as part of the act-experience-update interface, which is an alternative to the act-observe interaction pattern.
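
The act-experience-update pattern can be sketched as follows: one episode is collected with independent act() calls, fed to the agent via experience(), and followed by a single update(). The helper name collect_and_update is hypothetical; passing plain Python lists as the per-episode arrays and choosing deterministic=False for collection are assumptions of this sketch.

```python
def collect_and_update(agent, environment):
    """Collect one episode independently, feed it, update; return its length."""
    episode_states, episode_internals = [], []
    episode_actions, episode_terminal, episode_reward = [], [], []

    states = environment.reset()
    internals = agent.initial_internals()
    terminal = False
    while not terminal:
        # Record the inputs of this timestep before acting.
        episode_states.append(states)
        episode_internals.append(internals)
        # Sampling (deterministic=False) is an assumption made here so that
        # the collected experience includes exploration.
        actions, internals = agent.act(
            states=states, internals=internals,
            independent=True, deterministic=False
        )
        episode_actions.append(actions)
        states, terminal, reward = environment.execute(actions=actions)
        episode_terminal.append(terminal)
        episode_reward.append(reward)

    # Feed the whole trace to the agent's memory, then trigger one update.
    agent.experience(
        states=episode_states, internals=episode_internals,
        actions=episode_actions, terminal=episode_terminal,
        reward=episode_reward
    )
    agent.update()
    return len(episode_reward)
```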

Pretraining

TensorforceAgent.pretrain(directory, num_iterations, num_traces=1, num_updates=1, extension='.npz')

Simple pretraining approach as a combination of experience() and update(), akin to behavioral cloning, using experience traces obtained, e.g., via recording agent interactions (see documentation).

For the given number of iterations, load the given number of trace files (each of which contains recorder[frequency] episodes), feed the experience to the agent’s internal memory, and subsequently trigger the given number of updates (which will use the experience in the internal memory, fed in this or potentially previous iterations).

See the record-and-pretrain script for an example application.

Parameters:
  • directory (path) – Directory with experience traces, e.g. obtained via recorder; episode length has to be consistent with agent configuration (required).
  • num_iterations (int > 0) – Number of iterations consisting of loading new traces and performing multiple updates (required).
  • num_traces (int > 0) – Number of traces to load per iteration; has to at least satisfy the update batch size (default: 1).
  • num_updates (int > 0) – Number of updates per iteration (default: 1).
  • extension (str) – Traces file extension to filter the given directory for (default: “.npz”).
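
To make the iteration arithmetic concrete, the sketch below wraps pretrain() and reports how many trace-file loads and updates a given configuration implies. The helper name pretrain_from_traces and the default of 30 iterations are hypothetical; the agent is assumed to expose pretrain() with the signature documented above.

```python
def pretrain_from_traces(agent, directory, num_iterations=30,
                         num_traces=1, num_updates=1):
    """Pretrain from recorded traces; return the implied workload totals."""
    agent.pretrain(
        directory=directory, num_iterations=num_iterations,
        num_traces=num_traces, num_updates=num_updates, extension='.npz'
    )
    # Each iteration loads num_traces files and triggers num_updates updates.
    return {
        'trace_loads': num_iterations * num_traces,
        'updates': num_iterations * num_updates,
    }
```

For example, 30 iterations with 2 traces and 5 updates each amount to 60 trace-file loads and 150 updates.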

Loading and saving

static TensorforceAgent.load(directory=None, filename=None, format=None, environment=None, **kwargs)

Restores an agent from a directory/file.

Parameters:
  • directory (str) – Checkpoint directory (required, unless saver is specified).
  • filename (str) – Checkpoint filename, with or without append and extension (default: “agent”).
  • format ("checkpoint" | "saved-model" | "numpy" | "hdf5") – File format, “saved-model” loads an act-only agent based on a Protobuf model (default: format matching directory and filename, required to be unambiguous).
  • environment (Environment object) – Environment on which the agent is to be trained; environment-related arguments such as state/action space specifications and maximum episode length will be extracted if given (recommended).
  • kwargs – Additional agent arguments.
TensorforceAgent.save(directory, filename=None, format='checkpoint', append=None)

Saves the agent to a checkpoint.

Parameters:
  • directory (str) – Checkpoint directory (required).
  • filename (str) – Checkpoint filename, without extension (required, unless “saved-model” format).
  • format ("checkpoint" | "saved-model" | "numpy" | "hdf5") – File format, “checkpoint” uses TensorFlow Checkpoint to save model, “saved-model” uses TensorFlow SavedModel to save an optimized act-only model, whereas the others store only variables as NumPy/HDF5 file (default: TensorFlow Checkpoint).
  • append ("timesteps" | "episodes" | "updates") – Append timestep/episode/update to checkpoint filename (default: none).
Returns:

Checkpoint path.

Return type:

str
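
A minimal sketch of a save and load round trip using the parameters documented above. The helper names and the 'agent' filename are illustrative choices; load_checkpoint assumes the tensorforce package is installed and exposes Agent at the top level.

```python
def save_checkpoint(agent, directory):
    """Save the agent as a TensorFlow checkpoint; return the checkpoint path."""
    # append='episodes' adds the current episode count to the filename.
    return agent.save(
        directory=directory, filename='agent',
        format='checkpoint', append='episodes'
    )


def load_checkpoint(directory, environment):
    """Restore an agent from a checkpoint directory."""
    # Imported lazily so the module can be loaded without Tensorforce installed.
    from tensorforce import Agent

    # The filename may be given with or without the appended counter.
    return Agent.load(
        directory=directory, filename='agent',
        format='checkpoint', environment=environment
    )
```

Stating the format explicitly on load avoids ambiguity when a directory contains files in more than one format.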

Tracked tensors