A reinforcement learning environment provides the API to a simulated or real system that is the subject of optimization. It could be anything from video games (e.g. Atari) to robots or trading systems. The agent interacts with this environment and learns to act optimally within its dynamics.

Environment <-> Runner <-> Agent <-> Model
class tensorforce.environments.Environment

Base environment class.


actions

Return the action space. Might include subdicts if multiple actions are available simultaneously.

Returns: dict of action properties (continuous, number of actions)


close()

Close environment. No other method calls are possible afterwards.


execute(actions)

Executes action, observes next state(s) and reward.

Parameters: actions -- Actions to execute.
Returns: (Dict of) next state(s), boolean indicating terminal, and reward signal.
static from_spec(spec, kwargs)

Creates an environment from a specification dict.


reset()

Reset environment and set up for a new episode.

Returns: initial state of reset environment.

seed(seed=None)

Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic environments (e.g. ALE or some gym Envs) don't have to implement this method.

Parameters: seed (int) -- The seed to use for initializing the pseudo-random number generator (default: epoch time in sec).

Returns: The actual seed (int) used, OR None if the Environment did not override this method (no seeding supported).


states

Return the state space. Might include subdicts if multiple states are available simultaneously.

Returns: dict of state properties (shape and type).
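To make the interface concrete, here is a minimal sketch of a custom environment implementing the methods above. The class and its dynamics are invented for illustration; a real implementation would subclass tensorforce.environments.Environment, which is omitted here so the snippet stays self-contained:

```python
import random


class CoinFlipEnvironment:
    """Toy environment following the Environment interface
    (states/actions properties, reset, execute, close)."""

    @property
    def states(self):
        # One scalar state: the currently visible coin face (0.0 or 1.0).
        return dict(shape=(1,), type='float')

    @property
    def actions(self):
        # Two discrete actions: guess heads (0) or tails (1).
        return dict(type='int', num_actions=2)

    def reset(self):
        self.steps = 0
        self.coin = random.randint(0, 1)
        return [float(self.coin)]

    def execute(self, actions):
        # Reward 1.0 for guessing the current face, then flip again.
        reward = 1.0 if actions == self.coin else 0.0
        self.coin = random.randint(0, 1)
        self.steps += 1
        terminal = self.steps >= 10  # fixed-length 10-step episodes
        return [float(self.coin)], terminal, reward

    def close(self):
        pass
```

A runner would then alternate env.execute(...) with the agent's act/observe calls until terminal is True.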

Ready-to-use environments

OpenAI Gym

class tensorforce.contrib.openai_gym.OpenAIGym(gym_id, monitor=None, monitor_safe=False, monitor_video=0, visualize=False)

Bases: tensorforce.environments.environment.Environment

__init__(gym_id, monitor=None, monitor_safe=False, monitor_video=0, visualize=False)

Initialize OpenAI Gym.

  • gym_id -- OpenAI Gym environment ID. See https://gym.openai.com/envs
  • monitor -- Output directory. Setting this to None disables monitoring.
  • monitor_safe -- Setting this to True prevents existing log files from being overwritten. Default: False.
  • monitor_video -- Save a video every monitor_video steps. Setting this to 0 disables recording of videos.
  • visualize -- If set to True, the Gym environment is rendered during training. Note that visualization will probably slow down training.

OpenAI Universe

class tensorforce.contrib.openai_universe.OpenAIUniverse(env_id)

Bases: tensorforce.environments.environment.Environment

OpenAI Universe Integration: https://universe.openai.com/. Contains OpenAI Gym: https://gym.openai.com/.


__init__(env_id)

Initialize OpenAI Universe environment.

Parameters: env_id -- string with id/descriptor of the universe environment, e.g. 'HarvestDay-v0'.

Deepmind Lab

class tensorforce.contrib.deepmind_lab.DeepMindLab(level_id, repeat_action=1, state_attribute='RGB_INTERLACED', settings={'width': '320', 'appendCommand': '', 'fps': '60', 'height': '240'})

Bases: tensorforce.environments.environment.Environment

DeepMind Lab Integration: https://arxiv.org/abs/1612.03801 https://github.com/deepmind/lab

Since DeepMind Lab is only available as source code, a manual install via Bazel is required. Further, due to the way Bazel handles external dependencies, cloning TensorForce into lab is the most convenient way to run it using the Bazel BUILD file we provide. To use lab, first download and install it according to the instructions at https://github.com/deepmind/lab/blob/master/docs/build.md:

git clone https://github.com/deepmind/lab.git

Add to the lab main BUILD file:

Clone TensorForce into the lab directory, then run the TensorForce bazel runner.

Note that using any specific configuration file currently requires changing the Tensorforce BUILD file to adjust environment parameters.

bazel run //tensorforce:lab_runner

Please note that we have not tried to reproduce any lab results yet, and these instructions just explain connectivity in case someone wants to get started there.

__init__(level_id, repeat_action=1, state_attribute='RGB_INTERLACED', settings={'width': '320', 'appendCommand': '', 'fps': '60', 'height': '240'})

Initialize DeepMind Lab environment.

  • level_id -- string with id/descriptor of the level, e.g. 'seekavoid_arena_01'.
  • repeat_action -- number of frames the environment is advanced, executing the given action during every frame.
  • state_attribute -- Attribute which represents the state for this environment; should adhere to the specification given in DeepMindLabEnvironment.state_spec(level_id).
  • settings -- dict specifying additional settings as key-value string pairs. The following options are recognized: 'width' (horizontal resolution of the observation frames), 'height' (vertical resolution of the observation frames), 'fps' (frames per second) and 'appendCommand' (commands for the internal Quake console).
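The repeat_action parameter means a single environment step advances the game several frames with the same action while summing the per-frame rewards. A hypothetical standalone helper illustrating that behaviour (env stands for any object with a DeepMind-Lab-style step(action) method returning (state, reward, terminal); none of these names come from the TensorForce source):

```python
def repeat_frames(env, action, repeat_action):
    """Advance `env` by `repeat_action` frames, applying the same action
    each frame and accumulating the reward; stop early on terminal."""
    total_reward = 0.0
    state, terminal = None, False
    for _ in range(repeat_action):
        state, reward, terminal = env.step(action)
        total_reward += reward
        if terminal:
            break
    return state, terminal, total_reward
```

With repeat_action=4 and fps=60, the agent effectively observes and acts at 15 decisions per second of game time.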

close()

Closes the environment and releases the underlying Quake III Arena instance. No other method calls are possible afterwards.


execute(action)

Pass action to the lab environment; return the next state, reward, terminal flag and additional info.

Parameters: action -- action to execute as numpy array, should have dtype np.intc and should adhere to the specification given in DeepMindLabEnvironment.action_spec(level_id)
Returns: dict containing the next state, the reward, and a boolean indicating whether the next state is terminal

fps

An advisory metric that correlates discrete environment steps ("frames") with real (wallclock) time: the number of frames per (real) second.


num_steps

Number of frames since the last reset() call.


reset()

Resets the environment to its initialization state. This method needs to be called to start a new episode after the last one ended.

Returns: initial state

Unreal Engine 4 Games

class tensorforce.contrib.unreal_engine.UE4Environment(host='localhost', port=6025, connect=True, discretize_actions=False, delta_time=0, num_ticks=4)

Bases: tensorforce.contrib.remote_environment.RemoteEnvironment, tensorforce.contrib.state_settable_environment.StateSettableEnvironment

A special RemoteEnvironment for UE4 game connections. Communicates with the remote game to receive information on the definitions of action and observation spaces. Sends UE4 Action- and Axis-mappings as RL actions and receives back observations defined by MLObserver objects placed in the game (these could be camera pixels or other observations, e.g. an x/y/z position of some game actor).

__init__(host='localhost', port=6025, connect=True, discretize_actions=False, delta_time=0, num_ticks=4)
  • host (str) -- The hostname to connect to.
  • port (int) -- The port to connect to.
  • connect (bool) -- Whether to connect already in this constructor.
  • discretize_actions (bool) -- Whether to treat axis-mappings defined in the UE4 game as discrete actions. This would be necessary e.g. for agents that use Q-networks, where the outputs are Q-values per discrete state-action pair.
  • delta_time (float) -- The fake delta time to use for each single game tick.
  • num_ticks (int) -- The number of ticks to be executed in a single act call (each tick will repeat the same given actions).

Creates a list of discrete action(-combinations) in case we want to learn with a discrete set of actions but only have action-combinations (possibly even continuous ones) available from the environment. E.g. the UE4 game has the following action-/axis-mappings:

    {'type': 'action', 'keys': ('SpaceBar',)},
    {'type': 'axis', 'keys': (('Right', 1.0), ('Left', -1.0), ('A', -1.0), ('D', 1.0))},

-> this method will discretize them into the following 6 discrete actions:

[(Right, 0.0), (SpaceBar, False)],
[(Right, 0.0), (SpaceBar, True)],
[(Right, -1.0), (SpaceBar, False)],
[(Right, -1.0), (SpaceBar, True)],
[(Right, 1.0), (SpaceBar, False)],
[(Right, 1.0), (SpaceBar, True)]
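This discretization amounts to a Cartesian product: each axis-mapping contributes its distinct mapped values plus 0.0 (keyed under its first key name), each action-mapping contributes True/False, and every combination becomes one discrete action. The function below is an illustrative reimplementation, not the actual TensorForce code:

```python
from itertools import product


def discretize(action_axis_mappings):
    """Build the discrete action set from UE4-style action-/axis-mappings."""
    per_input = []
    for mapping in action_axis_mappings:
        if mapping['type'] == 'action':
            # Boolean action-mapping: pressed or not pressed.
            name = mapping['keys'][0]
            choices = [False, True]
        else:
            # Axis-mapping: distinct mapped values plus the neutral 0.0,
            # all reported under the first key's name.
            name = mapping['keys'][0][0]
            choices = sorted({value for _, value in mapping['keys']} | {0.0})
        per_input.append([(name, choice) for choice in choices])
    # One discrete action per combination of per-input choices.
    return [list(combo) for combo in product(*per_input)]
```

For the mappings above this yields 3 axis values x 2 button states = 6 discrete actions.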

Executes a single step in the UE4 game. This step may comprise one or more actual game ticks, for all of which the same given action- and axis-inputs (or action number, in the case of discretized actions) are repeated. UE4 distinguishes between action-mappings, which are boolean actions (e.g. jump or don't jump), and axis-mappings, which are continuous actions such as MoveForward with values between -1.0 (run backwards) and 1.0 (run forwards); 0.0 means stop.


Same as step (no kwargs to pass), but blocks and returns the observation dict.

  • stores the received observation in self.last_observation

Translates a list of tuples ([pretty mapping], [value]) to a list of tuples ([some key], [translated value]). Each single item undergoes the following translation:

Example 1: we want "MoveRight": 5.0; possible keys for the action are ("Right", 1.0) and ("Left", -1.0); result: "Right": 5.0 * 1.0 = 5.0.

Example 2: we want "MoveRight": -0.5; possible keys for the action are ("Left", -1.0) and ("Right", 1.0); result: "Left": -0.5 * -1.0 = 0.5 (same as "Right": -0.5).
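The rule in both examples amounts to choosing the (key, scale) pair whose scale preserves the sign of the desired value. A hypothetical sketch of just that rule (the real method also handles action-mappings and key naming, which are omitted here):

```python
def translate_abstract_action(value, key_options):
    """Pick a (key, scale) pair such that value * scale is non-negative,
    i.e. the key whose direction matches the sign of the desired value."""
    for key, scale in key_options:
        if value * scale >= 0.0:
            return key, value * scale
    # No sign match (should not happen with well-formed mappings):
    key, scale = key_options[0]
    return key, value * scale
```

Applied to the examples above: 5.0 with (("Right", 1.0), ("Left", -1.0)) resolves to ("Right", 5.0), and -0.5 with (("Left", -1.0), ("Right", 1.0)) resolves to ("Left", 0.5).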