Runner utility

class tensorforce.execution.Runner(agent, environment=None, max_episode_timesteps=None, num_parallel=None, environments=None, evaluation=False, remote=None, blocking=False, host=None, port=None)

Tensorforce runner utility.

Parameters:
  • agent (specification | Agent object | Agent.load kwargs) – Agent specification or object (note: if passed as an object, agent.close() is not (!) automatically triggered as part of runner.close()), or keyword arguments to Agent.load(), in particular containing directory. In all cases, the argument environment is implicitly specified as the following argument, and the argument parallel_interactions is either implicitly set to num_parallel or expected to be at least num_parallel (required).
  • environment (specification | Environment object) – Environment specification or object (note: if passed as an object, environment.close() is not (!) automatically triggered as part of runner.close()), whose argument max_episode_timesteps is implicitly specified as the following argument (required, or alternatively environments; invalid for “socket-client” remote mode).
  • max_episode_timesteps (int > 0) – Maximum number of timesteps per episode, overrides the environment default if defined (default: environment default, invalid for “socket-client” remote mode).
  • num_parallel (int >= 2) – Number of environment instances to execute in parallel, usually requires argument remote to be specified for proper parallel execution unless vectorizable environment (default: no parallel execution, implicitly specified by environments).
  • environments (list[specification | Environment object]) – Environment specifications or objects to execute in parallel, the latter are not closed automatically as part of runner.close() (default: no parallel execution, alternatively specified via environment and num_parallel, invalid for “socket-client” remote mode).
  • evaluation (bool) – Whether to run the last of multiple parallel environments in evaluation mode, only valid with num_parallel or environments (default: no evaluation).
  • remote ("multiprocessing" | "socket-client") – Communication mode for remote execution of parallelized environments, not compatible with environment(s) given as Environment objects; “socket-client” mode requires a corresponding “socket-server” to be running (default: local execution).
  • blocking (bool) – Whether remote environment calls should be blocking; only valid if a remote mode is given (default: not blocking).
  • host (str, iter[str]) – Socket server hostname(s) or IP address(es) (required only for “socket-client” remote mode).
  • port (int, iter[int]) – Socket server port(s), increasing sequence if single host and port given (required only for “socket-client” remote mode).
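For orientation, the control flow that these constructor arguments configure can be sketched in plain Python. This is not Tensorforce's actual implementation; the toy environment, toy agent, and run_episode helper below are invented for illustration, showing only how max_episode_timesteps caps an episode regardless of when the environment itself terminates.

```python
# Hypothetical stand-ins for an Agent/Environment pair; real Tensorforce
# objects would be created from the `agent`/`environment` specifications.
class ToyEnvironment:
    def reset(self):
        self.step_count = 0
        return 0.0  # initial state

    def execute(self, actions):
        self.step_count += 1
        terminal = self.step_count >= 5  # toy episode ends after 5 steps
        return 0.0, terminal, 1.0  # next state, terminal flag, reward


class ToyAgent:
    def act(self, states):
        return 0  # constant action


def run_episode(agent, environment, max_episode_timesteps):
    """Sketch of one episode as a runner would drive it: step until the
    environment signals terminal or max_episode_timesteps is reached."""
    states = environment.reset()
    episode_return, timestep, terminal = 0.0, 0, False
    while not terminal and timestep < max_episode_timesteps:
        actions = agent.act(states)
        states, terminal, reward = environment.execute(actions)
        episode_return += reward
        timestep += 1
    return episode_return, timestep


# The cap cuts the 5-step toy episode off after 3 timesteps.
print(run_episode(ToyAgent(), ToyEnvironment(), max_episode_timesteps=3))
```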
run(num_episodes=None, num_timesteps=None, num_updates=None, batch_agent_calls=False, sync_timesteps=False, sync_episodes=False, num_sleep_secs=0.001, callback=None, callback_episode_frequency=None, callback_timestep_frequency=None, use_tqdm=True, mean_horizon=1, evaluation=False, save_best_agent=None, evaluation_callback=None)

Run experiment.

Parameters:
  • num_episodes (int > 0) – Number of episodes to run the experiment for, summed across all parallel/vectorized environments / actors in a multi-actor environment (default: no episode limit).
  • num_timesteps (int > 0) – Number of timesteps to run the experiment for, summed across all parallel/vectorized environments / actors in a multi-actor environment (default: no timestep limit).
  • num_updates (int > 0) – Number of agent updates to run the experiment for (default: no update limit).
  • batch_agent_calls (bool) – Whether to batch agent calls for parallel environment execution (default: false, separate call per environment).
  • sync_timesteps (bool) – Whether to synchronize parallel environment execution on timestep-level, implied by batch_agent_calls (default: false, unless batch_agent_calls is true).
  • sync_episodes (bool) – Whether to synchronize parallel environment execution on episode-level (default: false).
  • num_sleep_secs (float) – Sleep duration if no environment is ready (default: one millisecond).
  • callback (callable[(Runner, parallel) -> bool]) – Callback function taking the runner instance plus parallel index and returning a boolean value indicating whether execution should continue (default: callback always true).
  • callback_episode_frequency (int) – Episode interval between callbacks (default: every episode).
  • callback_timestep_frequency (int) – Timestep interval between callbacks (default: not specified).
  • use_tqdm (bool) – Whether to display a tqdm progress bar for the experiment run (default: true), with the following additional information (averaged over number of episodes given via mean_horizon):
    • return – cumulative episode return
    • ts/ep – timesteps per episode
    • sec/ep – seconds per episode
    • ms/ts – milliseconds per timestep
    • agent – percentage of time spent on agent computation
    • comm – if remote environment execution, percentage of time spent on communication
  • mean_horizon (int) – Number of episodes over which progress bar values and the evaluation score are averaged (default: not averaged).
  • evaluation (bool) – Whether to run in evaluation mode, only valid if single environment (default: no evaluation).
  • save_best_agent (string) – Directory to save the best version of the agent according to the evaluation score (default: best agent is not saved).
  • evaluation_callback (int | callable[Runner -> float]) – Callback function taking the runner instance and returning an evaluation score (default: cumulative evaluation return averaged over mean_horizon episodes).
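The interaction of callback, callback_episode_frequency, and mean_horizon can be sketched as follows. This is a library-free illustration, not Tensorforce internals: the run_sketch helper is invented, and the callback signature is simplified to (episode, mean_return) rather than the documented (Runner, parallel).

```python
from collections import deque


def run_sketch(num_episodes, callback, callback_episode_frequency=1, mean_horizon=1):
    """Toy episode loop: invokes `callback` every callback_episode_frequency
    episodes and stops early if it returns False, mirroring the run() contract
    that a false-returning callback aborts execution."""
    recent_returns = deque(maxlen=mean_horizon)  # window for averaged display
    for episode in range(1, num_episodes + 1):
        episode_return = float(episode)  # stand-in for an actual rollout
        recent_returns.append(episode_return)
        mean_return = sum(recent_returns) / len(recent_returns)
        if episode % callback_episode_frequency == 0:
            if callback(episode, mean_return) is False:
                break  # callback may abort the experiment
    return episode, mean_return


calls = []
episode, mean_return = run_sketch(
    num_episodes=10,
    # Record each invocation; signal abort once episode 6 is reached.
    callback=lambda ep, mr: calls.append(ep) or ep < 6,
    callback_episode_frequency=2,
    mean_horizon=3,
)
print(calls, episode, mean_return)  # callback fired at episodes 2, 4, 6; stopped at 6
```

With mean_horizon=3, the reported mean at the stopping point averages only the last three episode returns, which is how the progress bar and evaluation score smoothing behave.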