Random Agent

class tensorforce.agents.RandomAgent(states, actions, max_episode_timesteps=None, config=None, recorder=None)

Agent returning random action values (specification key: random).

Parameters:
  • states (specification) – States specification (required, better implicitly specified via environment argument for Agent.create(...)), arbitrarily nested dictionary of state descriptions (usually taken from Environment.states()) with the following attributes:
    • type ("bool" | "int" | "float") – state data type (default: "float").
    • shape (int | iter[int]) – state shape (required).
    • num_values (int > 0) – number of discrete state values (required for type "int").
    • min_value/max_value (float) – minimum/maximum state value (optional for type "float").
  • actions (specification) – Actions specification (required, better implicitly specified via environment argument for Agent.create(...)), arbitrarily nested dictionary of action descriptions (usually taken from Environment.actions()) with the following attributes:
    • type ("bool" | "int" | "float") – action data type (required).
    • shape (int > 0 | iter[int > 0]) – action shape (default: scalar).
    • num_values (int > 0) – number of discrete action values (required for type "int").
    • min_value/max_value (float) – minimum/maximum action value (optional for type "float").
  • max_episode_timesteps (int > 0) – Upper bound for number of timesteps per episode (default: not given, better implicitly specified via environment argument for Agent.create(...)).
  • config (specification) – Additional configuration options:
    • name (string) – Agent name, used e.g. for TensorFlow scopes (default: "agent").
    • device (string) – Device name (default: TensorFlow default).
    • seed (int) – Random seed to set for Python, NumPy (both set globally!) and TensorFlow; the environment seed may have to be set separately for fully deterministic execution (default: none).
    • buffer_observe (false | "episode" | int > 0) – Number of timesteps within an episode to buffer before calling the internal observe function, to reduce calls to TensorFlow for improved performance (default: configuration-specific maximum number which can be buffered without affecting performance).
    • always_apply_exploration (bool) – Whether to always apply exploration, also for independent act() calls (final value in case of schedule) (default: false).
    • always_apply_variable_noise (bool) – Whether to always apply variable noise, also for independent act() calls (final value in case of schedule) (default: false).
    • enable_int_action_masking (bool) – Whether int action options can be masked via an optional "[ACTION-NAME]_mask" state input (default: true).
    • create_tf_assertions (bool) – Whether to create internal TensorFlow assertion operations (default: true).
  • recorder (path | specification) – Traces recordings directory, or recorder configuration with the following attributes (see record-and-pretrain script for example application) (default: no recorder):
    • directory (path) – recorder directory (required).
    • frequency (int > 0) – how frequently in episodes to record traces (default: every episode).
    • start (int >= 0) – how many episodes to skip before starting to record traces (default: 0).
    • max-traces (int > 0) – maximum number of traces to keep (default: all).
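
To make the actions specification concrete, the sketch below draws uniformly random values for a flat specification using only the Python standard library. It illustrates what "returning random action values" could look like and is not Tensorforce's implementation: the action names (move, fire, throttle), the helper functions, the nested-list output format, and the fallback range for unbounded float actions are all assumptions made for this example. Nested specifications are not handled.

```python
import random

# Hypothetical actions specification using the attributes listed above
# (type, shape, num_values, min_value/max_value). The names are made up.
actions = {
    "move": {"type": "int", "num_values": 4},               # scalar (default shape)
    "fire": {"type": "bool", "shape": (2,)},
    "throttle": {"type": "float", "shape": 3,
                 "min_value": -1.0, "max_value": 1.0},
}


def sample_value(spec, rng):
    """Draw one scalar value matching the type and bounds in `spec`."""
    kind = spec["type"]
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "int":
        # num_values is required for type "int": values are 0 .. num_values - 1.
        return rng.randrange(spec["num_values"])
    if kind == "float":
        # Assumption: missing bounds fall back to a unit-width range.
        low = spec.get("min_value", 0.0)
        high = spec.get("max_value", low + 1.0)
        return rng.uniform(low, high)
    raise ValueError(f"unsupported action type: {kind!r}")


def sample_action(spec, rng):
    """Build a (possibly nested) list of random values with the given shape."""
    shape = spec.get("shape", ())
    if isinstance(shape, int):
        shape = (shape,)

    def build(dims):
        if not dims:
            return sample_value(spec, rng)
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(tuple(shape))


def random_actions(actions_spec, seed=None):
    """Return one random value per named action; `seed` makes it repeatable."""
    rng = random.Random(seed)  # local generator, no global seeding
    return {name: sample_action(spec, rng) for name, spec in actions_spec.items()}


if __name__ == "__main__":
    print(random_actions(actions, seed=0))
```

In Tensorforce itself, the agent would typically be obtained through Agent.create(...) with the specification key random, letting the environment argument supply states, actions and max_episode_timesteps as recommended above.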