Runner utility

class tensorforce.execution.Runner(agent, environment=None, max_episode_timesteps=None, evaluation=False, num_parallel=None, environments=None, remote=None, blocking=False, host=None, port=None)

Tensorforce runner utility.

Parameters:
  • agent (specification | Agent object) – Agent specification or object; the latter is not closed automatically as part of runner.close(). parallel_interactions is implicitly set to, or expected to be, at least num_parallel, or num_parallel - 1 if evaluation (required).
  • environment (specification | Environment object) – Environment specification or object, the latter is not closed automatically as part of runner.close() (required, or alternatively environments, invalid for “socket-client” remote mode).
  • max_episode_timesteps (int > 0) – Maximum number of timesteps per episode, overrides the environment default if defined (default: environment default, invalid for “socket-client” remote mode).
  • evaluation (bool) – Whether to run the (last if multiple) environment in evaluation mode (default: no evaluation).
  • num_parallel (int > 0) – Number of environment instances to execute in parallel (default: no parallel execution, implicitly specified by environments).
  • environments (list[specification | Environment object]) – Environment specifications or objects to execute in parallel, the latter are not closed automatically as part of runner.close() (default: no parallel execution, alternatively specified via environment and num_parallel, invalid for “socket-client” remote mode).
  • remote ("multiprocessing" | "socket-client") – Communication mode for remote environment execution of parallelized environment execution, not compatible with environment(s) given as Environment objects, “socket-client” mode requires a corresponding “socket-server” running (default: local execution).
  • blocking (bool) – Whether remote environment calls should be blocking, only valid for “multiprocessing” or “socket-client” remote mode (default: not blocking).
  • host (str, iter[str]) – Socket server hostname(s) or IP address(es) (required only for “socket-client” remote mode).
  • port (int, iter[int]) – Socket server port(s), increasing sequence if single host and port given (required only for “socket-client” remote mode).
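
For orientation, a minimal construction sketch follows; the agent and environment specifications used here (a "ppo" agent on Gym's "CartPole-v1") and the parallelism settings are illustrative assumptions, not prescribed by this reference.

    from tensorforce.execution import Runner

    # Single local environment; agent and environment are given as specifications,
    # so the runner creates them and closes them as part of runner.close().
    runner = Runner(
        agent=dict(agent='ppo', batch_size=10, learning_rate=1e-3),
        environment=dict(environment='gym', level='CartPole-v1'),
        max_episode_timesteps=500,
    )

    # Parallel execution sketch: four environment copies, the last one running in
    # evaluation mode; remote='multiprocessing' executes each copy in its own
    # process (requires specifications rather than Environment objects).
    parallel_runner = Runner(
        agent=dict(agent='ppo', batch_size=10, learning_rate=1e-3),
        environment=dict(environment='gym', level='CartPole-v1'),
        max_episode_timesteps=500,
        num_parallel=4,
        evaluation=True,
        remote='multiprocessing',
    )
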
run(num_episodes=None, num_timesteps=None, num_updates=None, batch_agent_calls=False, sync_timesteps=False, sync_episodes=False, num_sleep_secs=0.001, callback=None, callback_episode_frequency=None, callback_timestep_frequency=None, use_tqdm=True, mean_horizon=1, evaluation=False, save_best_agent=None, evaluation_callback=None)

Run experiment.

Parameters:
  • num_episodes (int > 0) – Number of episodes to run the experiment for (default: no episode limit).
  • num_timesteps (int > 0) – Number of timesteps to run the experiment for (default: no timestep limit).
  • num_updates (int > 0) – Number of agent updates to run the experiment for (default: no update limit).
  • batch_agent_calls (bool) – Whether to batch agent calls for parallel environment execution (default: false, separate call per environment).
  • sync_timesteps (bool) – Whether to synchronize parallel environment execution on timestep-level, implied by batch_agent_calls (default: false, unless batch_agent_calls is true).
  • sync_episodes (bool) – Whether to synchronize parallel environment execution on episode-level (default: false).
  • num_sleep_secs (float) – Sleep duration if no environment is ready (default: one millisecond).
  • callback ((Runner, parallel) -> bool) – Callback function taking the runner instance plus parallel index and returning a boolean value indicating whether execution should continue (default: callback always true).
  • callback_episode_frequency (int) – Episode interval between callbacks (default: every episode).
  • callback_timestep_frequency (int) – Timestep interval between callbacks (default: not specified).
  • use_tqdm (bool) – Whether to display a tqdm progress bar for the experiment run (default: true), with the following additional information (averaged over number of episodes given via mean_horizon):
    • reward – cumulative episode reward
    • ts/ep – timesteps per episode
    • sec/ep – seconds per episode
    • ms/ts – milliseconds per timestep
    • agent – percentage of time spent on agent computation
    • comm – if remote environment execution, percentage of time spent on communication
  • mean_horizon (int) – Number of episodes over which progress bar values and the evaluation score are averaged (default: not averaged).
  • evaluation (bool) – Whether to run in evaluation mode, only valid if a single environment is used (default: no evaluation).
  • save_best_agent (str) – Directory in which to save the best version of the agent according to the evaluation score (default: best agent is not saved).
  • evaluation_callback (int | Runner -> float) – Callback function taking the runner instance and returning an evaluation score (default: cumulative evaluation reward averaged over mean_horizon episodes).
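
A sketch of a run() call, continuing the hypothetical runner constructed above; the callback is a placeholder that simply lets the experiment continue.

    # Callback invoked every 10 episodes; returning False stops the run early.
    def episode_callback(runner, parallel):
        return True

    runner.run(
        num_episodes=300,
        callback=episode_callback,
        callback_episode_frequency=10,
        mean_horizon=10,  # average progress-bar values over the last 10 episodes
    )

    # Separate evaluation run (single environment only), followed by cleanup.
    runner.run(num_episodes=20, evaluation=True)
    runner.close()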