Constant Agent

class tensorforce.agents.ConstantAgent(states, actions, max_episode_timesteps=None, action_values=None, name='agent', device=None, seed=None, summarizer=None, recorder=None, config=None)

Agent returning constant action values (specification key: constant).
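
As a rough illustration of this behaviour (a plain-Python sketch, not Tensorforce's implementation), a constant agent ignores the observed state and returns the same value for every action at each timestep. The class `ConstantPolicySketch` and its `act` method are hypothetical names used only for this example.

```python
class ConstantPolicySketch:
    """Hypothetical stand-in illustrating constant-action behaviour."""

    def __init__(self, action_values):
        # Mapping from action name to the constant value to return.
        self.action_values = dict(action_values)

    def act(self, states):
        # The observed state is deliberately ignored.
        return dict(self.action_values)


policy = ConstantPolicySketch({"move": 2, "brake": False})
first = policy.act(states=[0.1, 0.2])
second = policy.act(states=[9.0, -3.5])
print(first == second)  # both calls return the same actions
```

An agent like this is mainly useful as a baseline or as a sanity check for an environment.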

Parameters:
  • states (specification) – States specification (required; preferably specified implicitly via the environment argument of Agent.create(...)): an arbitrarily nested dictionary of state descriptions (usually taken from Environment.states()) with the following attributes:
    • type ("bool" | "int" | "float") – state data type (default: "float").
    • shape (int | iter[int]) – state shape (required).
    • num_values (int > 0) – number of discrete state values (required for type "int").
    • min_value/max_value (float) – minimum/maximum state value (optional for type "float").
  • actions (specification) – Actions specification (required; preferably specified implicitly via the environment argument of Agent.create(...)): an arbitrarily nested dictionary of action descriptions (usually taken from Environment.actions()) with the following attributes:
    • type ("bool" | "int" | "float") – action data type (required).
    • shape (int > 0 | iter[int > 0]) – action shape (default: scalar).
    • num_values (int > 0) – number of discrete action values (required for type "int").
    • min_value/max_value (float) – minimum/maximum action value (optional for type "float").
  • max_episode_timesteps (int > 0) – Upper bound for the number of timesteps per episode (default: not given; preferably specified implicitly via the environment argument of Agent.create(...)).
  • action_values (dict[value]) – Constant value per action (default: false for binary boolean actions, 0 for discrete integer actions, 0.0 for continuous actions).
  • seed (int) – Random seed to set for Python, NumPy (both set globally!) and TensorFlow; the environment seed has to be set separately for fully deterministic execution (default: none).
  • name (string) – Agent name, used e.g. for TensorFlow scopes (default: "agent").
  • device (string) – Device name (default: TensorFlow default).
  • summarizer (specification) – TensorBoard summarizer configuration with the following attributes (default: no summarizer):
    • directory (path) – summarizer directory (required).
    • frequency (int > 0) – how frequently in timesteps to record summaries (default: always).
    • flush (int > 0) – how frequently in seconds to flush the summary writer (default: 10).
    • max-summaries (int > 0) – maximum number of summaries to keep (default: 5).
    • labels ("all" | iter[string]) – "all" or a list of summaries to record, from the following labels (default: only "graph"):
      • "graph": graph summary
      • "parameters": parameter scalars
  • recorder (specification) – Experience traces recorder configuration with the following attributes (default: no recorder):
    • directory (path) – recorder directory (required).
    • frequency (int > 0) – how frequently in episodes to record traces (default: every episode).
    • start (int >= 0) – how many episodes to skip before starting to record traces (default: 0).
    • max-traces (int > 0) – maximum number of traces to keep (default: all).
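
The parameter list above can be made concrete with a small sketch. It builds states and actions specifications as plain dictionaries and resolves the constant value per action using the documented defaults. The action names (`move`, `throttle`, `brake`) and the helpers `default_constant` and `resolve_action_values` are hypothetical and not part of Tensorforce; merging a partial `action_values` dictionary with the defaults is likewise an assumption about how missing entries are treated.

```python
# Specifications in the dictionary format described in the parameter list.
states = dict(type="float", shape=(4,))
actions = dict(
    move=dict(type="int", num_values=3),                  # scalar shape by default
    throttle=dict(type="float", shape=(2,), min_value=-1.0, max_value=1.0),
    brake=dict(type="bool"),
)


def default_constant(spec):
    """Documented default: False for bool, 0 for int, 0.0 for float actions."""
    return {"bool": False, "int": 0, "float": 0.0}[spec["type"]]


def resolve_action_values(actions, action_values=None):
    """Hypothetical helper: fall back to the default for unspecified actions."""
    action_values = action_values or {}
    return {
        name: action_values.get(name, default_constant(spec))
        for name, spec in actions.items()
    }


resolved = resolve_action_values(actions, action_values=dict(move=2))
print(resolved)  # {'move': 2, 'throttle': 0.0, 'brake': False}
```

With Tensorforce installed, the same dictionaries would typically be passed along the lines of `Agent.create(agent='constant', states=states, actions=actions, max_episode_timesteps=100, action_values=dict(move=2))`, although passing the environment argument instead of explicit specifications is the preferred route.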