tensorforce.contrib package

Submodules

tensorforce.contrib.ale module

Arcade Learning Environment (ALE). https://github.com/mgbellemare/Arcade-Learning-Environment

class tensorforce.contrib.ale.ALE(rom, frame_skip=1, repeat_action_probability=0.0, loss_of_life_termination=False, loss_of_life_reward=0, display_screen=False, seed=<mtrand.RandomState object>)

Bases: tensorforce.environments.environment.Environment

action_names
actions
close()
current_state
execute(actions)
is_terminal
reset()
states
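
A minimal usage sketch (the ROM path is hypothetical, and the (state, terminal, reward) unpacking follows the general tensorforce Environment convention; check the actual execute() return):

from tensorforce.contrib.ale import ALE

# Hypothetical ROM path; point this at a locally available Atari ROM file.
env = ALE('roms/breakout.bin', frame_skip=4, loss_of_life_termination=True)

state = env.reset()
terminal = False
while not terminal:
    action = 0  # placeholder policy: always the first available action
    # Assumes the usual tensorforce (state, terminal, reward) return order.
    state, terminal, reward = env.execute(actions=action)
env.close()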

tensorforce.contrib.deepmind_lab module

class tensorforce.contrib.deepmind_lab.DeepMindLab(level_id, repeat_action=1, state_attribute='RGB_INTERLACED', settings={'width': '320', 'appendCommand': '', 'fps': '60', 'height': '240'})

Bases: tensorforce.environments.environment.Environment

DeepMind Lab Integration: https://arxiv.org/abs/1612.03801 https://github.com/deepmind/lab

Since DeepMind Lab is only available as source code, a manual install via bazel is required. Further, due to the way bazel handles external dependencies, cloning TensorForce into lab is the most convenient way to run it using the bazel BUILD file we provide. To use lab, first download and install it according to the instructions at https://github.com/deepmind/lab/blob/master/docs/build.md:

git clone https://github.com/deepmind/lab.git

Add to the lab main BUILD file:

Clone TensorForce into the lab directory, then run the TensorForce bazel runner.

Note that using any specific configuration file currently requires changing the TensorForce BUILD file to adjust environment parameters.

bazel run //tensorforce:lab_runner

Please note that we have not yet tried to reproduce any lab results; these instructions only explain how to establish connectivity in case someone wants to get started there.

actions
close()

Closes the environment and releases the underlying Quake III Arena instance. No other method calls are possible afterwards.

execute(actions)

Passes the given actions to the DeepMind Lab environment and returns the reward, the next state, the terminal flag, and additional info.

Parameters: actions – actions to execute as a numpy array; should have dtype np.intc and adhere to the specification given in DeepMindLabEnvironment.action_spec(level_id)
Returns: dict containing the next state, the reward, and a boolean indicating whether the next state is a terminal state
fps

An advisory metric that correlates discrete environment steps (“frames”) with real (wallclock) time: the number of frames per (real) second.

num_steps

Number of frames since the last reset() call.

reset()

Resets the environment to its initialization state. This method needs to be called to start a new episode after the last episode ended.

Returns: initial state
states
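
A minimal episode-loop sketch; the level id is hypothetical, the action vector size is an assumption, and the (state, terminal, reward) unpacking follows the general tensorforce convention (the execute() docstring above mentions a dict return, so verify against the actual implementation):

import numpy as np
from tensorforce.contrib.deepmind_lab import DeepMindLab

# 'tests/demo_map' is a hypothetical level id; use any level available in your lab build.
env = DeepMindLab('tests/demo_map', repeat_action=4)

state = env.reset()
terminal = False
while not terminal:
    # Placeholder action: a zero vector with dtype np.intc. The size (7 here) is an
    # assumption; it must match DeepMindLabEnvironment.action_spec(level_id).
    action = np.zeros(7, dtype=np.intc)
    state, terminal, reward = env.execute(actions=action)

print(env.num_steps, 'frames since reset at', env.fps, 'frames per second')
env.close()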

tensorforce.contrib.maze_explorer module

class tensorforce.contrib.maze_explorer.MazeExplorer(mode_id=0, visible=True)

Bases: tensorforce.environments.environment.Environment

MazeExplorer Integration: https://github.com/mryellow/maze_explorer.

actions
close()
execute(actions)
reset()
states

tensorforce.contrib.openai_gym module

OpenAI Gym Integration: https://gym.openai.com/.

class tensorforce.contrib.openai_gym.OpenAIGym(gym_id, monitor=None, monitor_safe=False, monitor_video=0, visualize=False)

Bases: tensorforce.environments.environment.Environment

static action_from_space(space)
actions
close()
execute(actions)
reset()
static state_from_space(space)
states
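
A minimal random-agent sketch; the 'num_actions' key of the derived action spec and the (state, terminal, reward) return order are assumptions based on the general tensorforce conventions, so verify both against the actual class:

import random
from tensorforce.contrib.openai_gym import OpenAIGym

env = OpenAIGym('CartPole-v0', visualize=False)
print(env.states, env.actions)  # specs derived via state_from_space()/action_from_space()

state = env.reset()
terminal = False
total_reward = 0.0
while not terminal:
    # Random discrete action; assumes the action spec exposes a 'num_actions' entry.
    action = random.randrange(env.actions['num_actions'])
    state, terminal, reward = env.execute(actions=action)
    total_reward += reward
env.close()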

tensorforce.contrib.openai_universe module

class tensorforce.contrib.openai_universe.OpenAIUniverse(env_id)

Bases: tensorforce.environments.environment.Environment

OpenAI Universe Integration: https://universe.openai.com/. Contains OpenAI Gym: https://gym.openai.com/.

actions
close()
configure(*args, **kwargs)
execute(actions)
render(*args, **kwargs)
reset()
states

tensorforce.contrib.remote_environment module

class tensorforce.contrib.remote_environment.MsgPackNumpyProtocol(max_msg_len=8192)

Bases: object

A simple protocol to communicate over tcp sockets, which can be used by RemoteEnvironment implementations. The protocol is based on msgpack-numpy encoding and decoding.

Each message has a simple 8-byte header, which encodes the length of the subsequent msgpack-numpy encoded byte-string. All messages received need to have the 'status' field set to 'ok'. If 'status' is set to 'error', the field 'message' should be populated with some error information.

Examples:

client sends: "[8-byte header]msgpack-encoded({'cmd': 'seed', 'value': 200})"
server responds: "[8-byte header]msgpack-encoded({'status': 'ok', 'value': 200})"

client sends: "[8-byte header]msgpack-encoded({'cmd': 'reset'})"
server responds: "[8-byte header]msgpack-encoded({'status': 'ok'})"

client sends: "[8-byte header]msgpack-encoded({'cmd': 'step', 'action': 5})"
server responds: "[8-byte header]msgpack-encoded({'status': 'ok', 'obs_dict': {… some observations}, 'reward': -10.0, 'is_terminal': False})"
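
A minimal client-side sketch of this framing, assuming a zero-padded decimal length in the 8-byte header; the header encoding actually used by MsgPackNumpyProtocol may differ:

import socket  # used in the commented usage example below

import msgpack
import msgpack_numpy
msgpack_numpy.patch()  # make msgpack aware of numpy arrays

def send_msg(sock, message):
    # Encode the dict via msgpack-numpy and prefix an 8-byte length header.
    # A zero-padded decimal header is an assumption, not the documented format.
    payload = msgpack.packb(message)
    sock.sendall('{:08d}'.format(len(payload)).encode('ascii') + payload)

def recv_msg(sock):
    length = int(sock.recv(8).decode('ascii'))  # read the 8-byte header first
    payload = b''
    while len(payload) < length:
        payload += sock.recv(length - len(payload))
    response = msgpack.unpackb(payload, raw=False)
    if response.get('status') != 'ok':
        raise RuntimeError(response.get('message'))
    return response

# Usage against a hypothetical remote environment server:
# sock = socket.create_connection(('localhost', 6025))
# send_msg(sock, {'cmd': 'reset'})
# observation = recv_msg(sock)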

recv(socket_)

Receives a message as a msgpack-numpy encoded byte-string from the given socket object. Blocks until something is received.

Parameters: socket_ – The python socket object to use.

Returns: The received message, decoded as a dict.

send(message, socket_)

Sends a message (dict) to the socket. Message is encoded via msgpack-numpy.

Parameters:
  • message – The message dict (e.g. {'cmd': 'reset'})
  • socket_ – The python socket object to use.
class tensorforce.contrib.remote_environment.RemoteEnvironment(host='localhost', port=6025)

Bases: tensorforce.environments.environment.Environment

close()

Same as the disconnect method.

connect()

Opens the TCP connection to the remote environment server on the given host:port.

current_state
disconnect()

Closes the TCP connection to the remote environment server.
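
A brief sketch of the connection lifecycle, using the UE4Environment subclass documented below as a concrete RemoteEnvironment (the host/port values are the signature defaults):

from tensorforce.contrib.unreal_engine import UE4Environment

# connect=False defers the TCP connection so it can be opened explicitly.
env = UE4Environment(host='localhost', port=6025, connect=False)
env.connect()  # open the TCP connection to the remote environment server
# ... interact with the environment here ...
env.close()    # same as disconnect(): closes the TCP connection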

tensorforce.contrib.state_settable_environment module

class tensorforce.contrib.state_settable_environment.StateSettableEnvironment

Bases: tensorforce.environments.environment.Environment

An Environment that implements the set_state method to set the current state to some new state using setter instructions.

set_state(**kwargs)

Sets the current state of the environment manually to some other state and returns a new observation.

Parameters: **kwargs – The set instruction(s) to be executed by the environment. A single set instruction usually sets a single property of the state/observation vector to some new value.

Returns: The observation dictionary of the Environment after(!) setting it to the new state.
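
A toy stand-in illustrating the intended set_state contract (this class and its property names are hypothetical, not part of the library):

class ToyStateSettableEnv:
    """Toy example of the set_state contract; not the real base class."""

    def __init__(self):
        self._obs = {'pos_x': 0, 'pos_y': 0}

    def set_state(self, **kwargs):
        # Each keyword argument is one set instruction: property name -> new value.
        self._obs.update(kwargs)
        return dict(self._obs)  # the observation AFTER applying the setters

env = ToyStateSettableEnv()
print(env.set_state(pos_x=3, pos_y=5))  # {'pos_x': 3, 'pos_y': 5}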

tensorforce.contrib.unreal_engine module

class tensorforce.contrib.unreal_engine.UE4Environment(host='localhost', port=6025, connect=True, discretize_actions=False, delta_time=0, num_ticks=4)

Bases: tensorforce.contrib.remote_environment.RemoteEnvironment, tensorforce.contrib.state_settable_environment.StateSettableEnvironment

A special RemoteEnvironment for UE4 game connections. Communicates with the remote to receive information on the definitions of action- and observation spaces. Sends UE4 Action- and Axis-mappings as RL-actions and receives back observations defined by ducandu plugin Observer objects placed in the Game (these could be camera pixels or other observations, e.g. an x/y/z position of some game actor).

actions()
connect()
discretize_action_space_desc()

Creates a list of discrete action(-combinations) in case we want to learn with a discrete set of actions, but only have action-combinations (maybe even continuous) available from the env. For example, suppose the UE4 game has the following action/axis-mappings:

{
'Fire':
    {'type': 'action', 'keys': ('SpaceBar',)},
'MoveRight':
    {'type': 'axis', 'keys': (('Right', 1.0), ('Left', -1.0), ('A', -1.0), ('D', 1.0))},
}

This method will discretize them into the following 6 discrete actions:

[
[(Right, 0.0), (SpaceBar, False)],
[(Right, 0.0), (SpaceBar, True)],
[(Right, -1.0), (SpaceBar, False)],
[(Right, -1.0), (SpaceBar, True)],
[(Right, 1.0), (SpaceBar, False)],
[(Right, 1.0), (SpaceBar, True)],
]
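
A minimal sketch of how such a combination list could be built with a plain cartesian product; this only illustrates the idea and is not the method's actual implementation:

from itertools import product

mappings = {
    'Fire': {'type': 'action', 'keys': ('SpaceBar',)},
    'MoveRight': {'type': 'axis',
                  'keys': (('Right', 1.0), ('Left', -1.0), ('A', -1.0), ('D', 1.0))},
}

# One option set per mapping: booleans for action-mappings, the values
# {0.0, -1.0, 1.0} (expressed via the mapping's first key) for axis-mappings.
option_sets = []
for name, spec in mappings.items():
    if spec['type'] == 'action':
        key, values = spec['keys'][0], (False, True)
    else:
        key, values = spec['keys'][0][0], (0.0, -1.0, 1.0)
    option_sets.append([(key, value) for value in values])

# Cartesian product over all mappings: 2 * 3 = 6 discrete combinations.
discrete_actions = [list(combo) for combo in product(*option_sets)]
for combo in discrete_actions:
    print(combo)  # e.g. [('SpaceBar', False), ('Right', 0.0)]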
execute(actions)

Executes a single step in the UE4 game. This step may consist of one or more actual game ticks, for all of which the same given action- and axis-inputs (or action number in the case of discretized actions) are repeated. UE4 distinguishes between action-mappings, which are boolean actions (e.g. jump or don't-jump), and axis-mappings, which are continuous actions like MoveForward with values between -1.0 (run backwards) and 1.0 (run forwards); 0.0 means stop.

static extract_observation(message)
reset()

Same as step (no kwargs to pass), but blocks until the observation dict is returned.

  • Stores the received observation in self.last_observation
seed(seed=None)
set_state(setters, **kwargs)
states()
translate_abstract_actions_to_keys(abstract)

Translates a list of tuples ([pretty mapping], [value]) into a list of tuples ([some key], [translated value]). Each single item in abstract undergoes the following translation:

Example 1: we want "MoveRight": 5.0; possible keys for the action are ("Right", 1.0), ("Left", -1.0); result: "Right": 5.0 * 1.0 = 5.0

Example 2: we want "MoveRight": -0.5; possible keys for the action are ("Left", -1.0), ("Right", 1.0); result: "Left": -0.5 * -1.0 = 0.5 (same as "Right": -0.5)
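
A toy function reproducing the two translations above (an illustration of the arithmetic, not the actual method):

def translate(abstract, axis_keys):
    """Map ('PrettyName', value) pairs to (raw_key, scaled_value) pairs.

    axis_keys maps each pretty name to its available (key, scale) tuples,
    mirroring the 'keys' entries of the axis-mappings shown earlier.
    """
    translated = []
    for name, value in abstract:
        key, scale = axis_keys[name][0]          # pick the first available key
        translated.append((key, value * scale))  # rescale the value for that key
    return translated

print(translate([('MoveRight', 5.0)], {'MoveRight': (('Right', 1.0), ('Left', -1.0))}))
# -> [('Right', 5.0)]
print(translate([('MoveRight', -0.5)], {'MoveRight': (('Left', -1.0), ('Right', 1.0))}))
# -> [('Left', 0.5)]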

Module contents