tensorforce.contrib package¶
Submodules¶
tensorforce.contrib.ale module¶
-
class
tensorforce.contrib.ale.
ALE
(rom, frame_skip=1, repeat_action_probability=0.0, loss_of_life_termination=False, loss_of_life_reward=0, display_screen=False, seed=<mtrand.RandomState object>)¶ Bases:
tensorforce.environments.environment.Environment
Arcade Learning Environment (ALE). https://github.com/mgbellemare/Arcade-Learning-Environment
-
__init__
(rom, frame_skip=1, repeat_action_probability=0.0, loss_of_life_termination=False, loss_of_life_reward=0, display_screen=False, seed=<mtrand.RandomState object>)¶ Initialize ALE.
Parameters: - rom – ROM filename, including its directory path.
- frame_skip – Repeat action for n frames. Default 1.
- repeat_action_probability – Repeats last action with given probability. Default 0.
- loss_of_life_termination – Signals a terminal state on loss of life. Default False.
- loss_of_life_reward – Reward/Penalty on loss of life (negative values are a penalty). Default 0.
- display_screen – Displays the emulator screen. Default False.
- seed – Random seed
-
action_names
¶
-
actions
¶
-
close
()¶
-
current_state
¶
-
execute
(actions)¶
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
is_terminal
¶
-
reset
()¶
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
states
¶
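All environment classes in this package implement the same tensorforce.environments.Environment interface (reset, execute, states, actions, close). The interaction loop can be sketched against a hypothetical stub environment; `ToyEnv` below is a stand-in so the sketch runs without a ROM or the ALE library — a real run would construct e.g. ALE(rom=...) instead:

```python
# Sketch of the Environment interaction loop shared by the classes in this
# package. ToyEnv is a hypothetical stand-in for a real environment.

class ToyEnv:
    """Minimal stand-in implementing the documented interface."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.step_count = 0

    def reset(self):
        # Returns the initial state, as documented for reset().
        self.step_count = 0
        return 0.0

    def execute(self, actions):
        # Returns (next_state, terminal, reward), as documented for execute().
        self.step_count += 1
        terminal = self.step_count >= self.episode_length
        return float(self.step_count), terminal, 1.0

    def close(self):
        pass


def run_episode(env):
    state = env.reset()
    total_reward, terminal = 0.0, False
    while not terminal:
        action = 0  # a real agent would pick an action from env.actions
        state, terminal, reward = env.execute(action)
        total_reward += reward
    return total_reward


env = ToyEnv()
print(run_episode(env))  # 5 steps, reward 1.0 each
env.close()
```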
-
tensorforce.contrib.deepmind_lab module¶
-
class
tensorforce.contrib.deepmind_lab.
DeepMindLab
(level_id, repeat_action=1, state_attribute='RGB_INTERLACED', settings={'width': '320', 'appendCommand': '', 'fps': '60', 'height': '240'})¶ Bases:
tensorforce.environments.environment.Environment
DeepMind Lab Integration: https://arxiv.org/abs/1612.03801 https://github.com/deepmind/lab
Since DeepMind lab is only available as source code, a manual install via bazel is required. Further, due to the way bazel handles external dependencies, cloning TensorForce into lab is the most convenient way to run it using the bazel BUILD file we provide. To use lab, first download and install it according to instructions https://github.com/deepmind/lab/blob/master/docs/build.md:
git clone https://github.com/deepmind/lab.git
Add to the lab main BUILD file:
Clone TensorForce into the lab directory, then run the TensorForce bazel runner.
Note that using any specific configuration file currently requires changing the TensorForce BUILD file to adjust environment parameters.
bazel run //tensorforce:lab_runner
Please note that we have not tried to reproduce any lab results yet, and these instructions just explain connectivity in case someone wants to get started there.
-
__init__
(level_id, repeat_action=1, state_attribute='RGB_INTERLACED', settings={'width': '320', 'appendCommand': '', 'fps': '60', 'height': '240'})¶ Initialize DeepMind Lab environment.
Parameters: - level_id – string with id/descriptor of the level, e.g. ‘seekavoid_arena_01’.
- repeat_action – number of frames the environment is advanced, executing the given action during every frame.
- state_attribute – Attribute which represents the state for this environment; should adhere to the specification given in DeepMindLabEnvironment.state_spec(level_id).
- settings – dict specifying additional settings as key-value string pairs. The following options are recognized: ‘width’ (horizontal resolution of the observation frames), ‘height’ (vertical resolution of the observation frames), ‘fps’ (frames per second) and ‘appendCommand’ (commands for the internal Quake console).
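The default settings shown in the constructor signature correspond to the following dict. Note that all values are strings, including the numeric resolution and fps entries:

```python
# Default settings dict as in the constructor signature; all values are
# strings, as Lab expects them as key-value string pairs.
settings = {
    'width': '320',        # horizontal resolution of observation frames
    'height': '240',       # vertical resolution of observation frames
    'fps': '60',           # frames per second
    'appendCommand': '',   # extra commands for the internal Quake console
}
```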
-
actions
¶
-
close
()¶ Closes the environment and releases the underlying Quake III Arena instance. No other method calls possible afterwards.
-
execute
(actions)¶ Passes the action to the lab environment; returns the reward, next state, terminal flag, and additional info.
Parameters: action – action to execute as numpy array, should have dtype np.intc and should adhere to the specification given in DeepMindLabEnvironment.action_spec(level_id) Returns: dict containing the next state, the reward, and a boolean indicating if the next state is a terminal state
-
fps
¶ An advisory metric that correlates discrete environment steps (“frames”) with real (wallclock) time: the number of frames per (real) second.
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
num_steps
¶ Number of frames since the last reset() call.
-
reset
()¶ Resets the environment to its initialization state. This method needs to be called to start a new episode after the last episode ended.
Returns: initial state
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
states
¶
-
tensorforce.contrib.maze_explorer module¶
-
class
tensorforce.contrib.maze_explorer.
MazeExplorer
(mode_id=0, visible=True)¶ Bases:
tensorforce.environments.environment.Environment
MazeExplorer Integration: https://github.com/mryellow/maze_explorer.
-
__init__
(mode_id=0, visible=True)¶ Initialize MazeExplorer.
Parameters: - mode_id – Game mode ID. See https://github.com/mryellow/maze_explorer
- visible – Show output window
-
actions
¶
-
close
()¶
-
execute
(actions)¶
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
reset
()¶
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
states
¶
-
tensorforce.contrib.openai_gym module¶
OpenAI Gym Integration: https://gym.openai.com/.
-
class
tensorforce.contrib.openai_gym.
OpenAIGym
(gym_id, monitor=None, monitor_safe=False, monitor_video=0, visualize=False)¶ Bases:
tensorforce.environments.environment.Environment
-
__init__
(gym_id, monitor=None, monitor_safe=False, monitor_video=0, visualize=False)¶ Initialize OpenAI Gym.
Parameters: - gym_id – OpenAI Gym environment ID. See https://gym.openai.com/envs
- monitor – Output directory. Setting this to None disables monitoring.
- monitor_safe – Setting this to True prevents existing log files from being overwritten. Default False.
- monitor_video – Save a video every monitor_video steps. Setting this to 0 disables recording of videos.
- visualize – If set to True, the gym environment is rendered during training. Note that visualization will probably slow down training.
-
static
action_from_space
(space)¶
-
actions
¶
-
close
()¶
-
execute
(actions)¶
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
reset
()¶
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
static
state_from_space
(space)¶
-
states
¶
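The static methods state_from_space and action_from_space convert gym space objects into TensorForce state/action specifications. The exact spec format is not documented here; the following is a hypothetical sketch of the kind of mapping involved, using stand-in classes instead of real gym spaces:

```python
from collections import namedtuple

# Hypothetical stand-ins for gym.spaces.Discrete and gym.spaces.Box; the
# real static methods operate on actual gym space objects.
Discrete = namedtuple('Discrete', ['n'])
Box = namedtuple('Box', ['shape'])


def action_from_space_sketch(space):
    # Illustrative only: the actual spec keys used by TensorForce may differ.
    if isinstance(space, Discrete):
        return {'type': 'int', 'num_actions': space.n}
    if isinstance(space, Box):
        return {'type': 'float', 'shape': space.shape}
    raise NotImplementedError(space)


print(action_from_space_sketch(Discrete(n=4)))
print(action_from_space_sketch(Box(shape=(3,))))
```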
-
tensorforce.contrib.openai_universe module¶
-
class
tensorforce.contrib.openai_universe.
OpenAIUniverse
(env_id)¶ Bases:
tensorforce.environments.environment.Environment
OpenAI Universe Integration: https://universe.openai.com/. Contains OpenAI Gym: https://gym.openai.com/.
-
__init__
(env_id)¶ Initialize OpenAI universe environment.
Parameters: env_id – string with id/descriptor of the universe environment, e.g. ‘HarvestDay-v0’.
-
actions
¶
-
close
()¶
-
configure
(*args, **kwargs)¶
-
execute
(actions)¶
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
render
(*args, **kwargs)¶
-
reset
()¶
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
states
¶
-
tensorforce.contrib.remote_environment module¶
-
class
tensorforce.contrib.remote_environment.
MsgPackNumpyProtocol
(max_msg_len=8192)¶ Bases:
object
A simple protocol to communicate over tcp sockets, which can be used by RemoteEnvironment implementations. The protocol is based on msgpack-numpy encoding and decoding.
Each message has a simple 8-byte header, which encodes the length of the subsequent msgpack-numpy encoded byte-string. All messages received need to have the ‘status’ field set to ‘ok’. If ‘status’ is set to ‘error’, the field ‘message’ should be populated with some error information.
Examples: client sends: “[8-byte header]msgpack-encoded({“cmd”: “seed”, “value”: 200})” server responds: “[8-byte header]msgpack-encoded({“status”: “ok”, “value”: 200})”
client sends: “[8-byte header]msgpack-encoded({“cmd”: “reset”})” server responds: “[8-byte header]msgpack-encoded({“status”: “ok”})”
client sends: “[8-byte header]msgpack-encoded({“cmd”: “step”, “action”: 5})” server responds: “[8-byte header]msgpack-encoded({“status”: “ok”, “obs_dict”: {… some observations}, “reward”: -10.0, “is_terminal”: False})”
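The length-prefixed framing described above can be sketched as follows. For illustration, json (stdlib) stands in for msgpack-numpy encoding, and the 8-byte header is rendered as a packed big-endian integer — one plausible interpretation; the real implementation may pack the length differently:

```python
import json
import struct

# Sketch of the 8-byte length-header framing described above. json stands
# in for msgpack-numpy so the sketch needs no third-party dependency.

def encode_message(message):
    payload = json.dumps(message).encode('utf-8')
    header = struct.pack('>Q', len(payload))  # 8-byte big-endian length
    return header + payload


def decode_message(data):
    (length,) = struct.unpack('>Q', data[:8])
    payload = data[8:8 + length]
    return json.loads(payload.decode('utf-8'))


frame = encode_message({'cmd': 'seed', 'value': 200})
assert decode_message(frame) == {'cmd': 'seed', 'value': 200}
```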
-
__init__
(max_msg_len=8192)¶ Parameters: max_msg_len (int) – The maximum number of bytes to read from the socket.
-
recv
(socket_)¶ Receives a message as a msgpack-numpy encoded byte-string from the given socket object. Blocks until something is received.
Parameters: socket_ – The Python socket object to read from. Returns: The received message, decoded into a dict.
-
send
(message, socket_)¶ Sends a message (dict) to the socket. The message consists of an 8-byte length header followed by the msgpack-numpy encoded dict.
Parameters: - message – The message dict (e.g. {“cmd”: “reset”})
- socket_ – The Python socket object to write to.
-
-
class
tensorforce.contrib.remote_environment.
RemoteEnvironment
(host='localhost', port=6025)¶ Bases:
tensorforce.environments.environment.Environment
-
__init__
(host='localhost', port=6025)¶ A remote Environment that one can connect to through TCP. Implements a simple msgpack protocol to send the step/reset/etc. commands to the remote server and simply waits (blocks) for a response.
Parameters: - host (str) – The hostname to connect to.
- port (int) – The port to connect to.
-
actions
¶ Return the action space. Might include subdicts if multiple actions are available simultaneously.
Returns: dict of action properties (continuous, number of actions)
-
close
()¶ Same as disconnect method.
-
connect
(timeout=600)¶ Starts the server tcp connection on the given host:port.
Parameters: timeout (int) – The time (in seconds) during which connection attempts to the remote are made (retrying every 5 seconds). After that (or if timeout is None or 0), an error is raised.
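The retry behaviour described for connect can be sketched as a loop that re-attempts the connection every 5 seconds until the timeout elapses. The attempt and sleep callables are injected here so the sketch stays self-contained; the real method operates on an actual socket:

```python
def connect_with_timeout(attempt, timeout=600, interval=5, sleep=lambda s: None):
    """Sketch of connect(timeout): retry `attempt` every `interval` seconds.

    `attempt` is any callable returning a connection or raising OSError;
    it is injected so the sketch needs no real server.
    """
    if not timeout:
        timeout = interval  # treat None/0 as allowing a single attempt
    waited = 0
    while True:
        try:
            return attempt()
        except OSError as error:
            waited += interval
            if waited >= timeout:
                raise TimeoutError(
                    'could not connect within {}s'.format(timeout)) from error
            sleep(interval)


# Usage: a flaky connector that succeeds on the third attempt.
tries = {'n': 0}

def flaky():
    tries['n'] += 1
    if tries['n'] < 3:
        raise OSError('connection refused')
    return 'connected'

print(connect_with_timeout(flaky, timeout=600))  # 'connected' on third try
```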
-
current_state
¶
-
disconnect
()¶ Ends our server tcp connection.
-
execute
(actions)¶ Executes action, observes next state(s) and reward.
Parameters: actions – Actions to execute. Returns: (Dict of) next state(s), boolean indicating terminal, and reward signal.
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
reset
()¶ Reset environment and setup for new episode.
Returns: initial state of reset environment.
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
states
¶ Return the state space. Might include subdicts if multiple states are available simultaneously.
Returns: dict of state properties (shape and type).
-
tensorforce.contrib.state_settable_environment module¶
-
class
tensorforce.contrib.state_settable_environment.
StateSettableEnvironment
¶ Bases:
tensorforce.environments.environment.Environment
An Environment that implements the set_state method to set the current state to some new state using setter instructions.
-
__init__
¶ x.__init__(...) initializes x; see help(type(x)) for signature
-
actions
¶ Return the action space. Might include subdicts if multiple actions are available simultaneously.
Returns: dict of action properties (continuous, number of actions)
-
close
()¶ Close environment. No other method calls possible afterwards.
-
execute
(actions)¶ Executes action, observes next state(s) and reward.
Parameters: actions – Actions to execute. Returns: (Dict of) next state(s), boolean indicating terminal, and reward signal.
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
reset
()¶ Reset environment and setup for new episode.
Returns: initial state of reset environment.
-
seed
(seed)¶ Sets the random seed of the environment to the given value (current time, if seed=None). Naturally deterministic Environments (e.g. ALE or some gym Envs) don’t have to implement this method.
Parameters: seed (int) – The seed to use for initializing the pseudo-random number generator (default=epoch time in sec). Returns: The actual seed (int) used OR None if Environment did not override this method (no seeding supported).
-
set_state
(**kwargs)¶ Sets the current state of the environment manually to some other state and returns a new observation.
Parameters: **kwargs – The set instruction(s) to be executed by the environment. A single set instruction usually sets a single property of the state/observation vector to some new value.
Returns: The observation dictionary of the Environment after(!) setting it to the new state.
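A hypothetical environment illustrating the set_state contract above — each keyword argument sets one property of the state, and the post-setting observation is returned:

```python
class ToySettableEnv:
    """Hypothetical environment following the set_state contract above."""

    def __init__(self):
        self.state = {'x': 0.0, 'y': 0.0, 'health': 100}

    def set_state(self, **kwargs):
        # Each keyword argument sets one property of the state vector.
        for key, value in kwargs.items():
            if key not in self.state:
                raise KeyError('unknown state property: {}'.format(key))
            self.state[key] = value
        # Return the observation *after* applying the setters.
        return dict(self.state)


env = ToySettableEnv()
obs = env.set_state(x=3.0, health=50)
assert obs == {'x': 3.0, 'y': 0.0, 'health': 50}
```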
-
states
¶ Return the state space. Might include subdicts if multiple states are available simultaneously.
Returns: dict of state properties (shape and type).
-
tensorforce.contrib.unreal_engine module¶
-
class
tensorforce.contrib.unreal_engine.
UE4Environment
(host='localhost', port=6025, connect=True, discretize_actions=False, delta_time=0, num_ticks=4)¶ Bases:
tensorforce.contrib.remote_environment.RemoteEnvironment
,tensorforce.contrib.state_settable_environment.StateSettableEnvironment
A special RemoteEnvironment for UE4 game connections. Communicates with the remote to receive information on the definitions of action- and observation spaces. Sends UE4 Action- and Axis-mappings as RL-actions and receives observations back defined by MLObserver objects placed in the Game (these could be camera pixels or other observations, e.g. an x/y/z position of some game actor).
-
__init__
(host='localhost', port=6025, connect=True, discretize_actions=False, delta_time=0, num_ticks=4)¶ Parameters: - host (str) – The hostname to connect to.
- port (int) – The port to connect to.
- connect (bool) – Whether to connect immediately in the constructor.
- discretize_actions (bool) – Whether to treat axis-mappings defined in the UE4 game as discrete actions. This would be necessary e.g. for agents that use q-networks, whose outputs are q-values per discrete state-action pair.
- delta_time (float) – The fake delta time to use for each single game tick.
- num_ticks (int) – The number of ticks to be executed in a single act call (each tick will repeat the same given actions).
-
actions
()¶
-
close
()¶ Same as disconnect method.
-
connect
(timeout=600)¶
-
current_state
¶
-
disconnect
()¶ Ends our server tcp connection.
-
discretize_action_space_desc
()¶ Creates a list of discrete action(-combinations) in case we want to learn with a discrete set of actions, but only have action-combinations (maybe even continuous) available from the env. E.g. the UE4 game has the following action/axis-mappings:
{
    'Fire': {'type': 'action', 'keys': ('SpaceBar',)},
    'MoveRight': {'type': 'axis', 'keys': (('Right', 1.0), ('Left', -1.0), ('A', -1.0), ('D', 1.0))},
}
-> this method will discretize them into the following 6 discrete actions:
[
    [('Right', 0.0), ('SpaceBar', False)],
    [('Right', 0.0), ('SpaceBar', True)],
    [('Right', -1.0), ('SpaceBar', False)],
    [('Right', -1.0), ('SpaceBar', True)],
    [('Right', 1.0), ('SpaceBar', False)],
    [('Right', 1.0), ('SpaceBar', True)],
]
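The combination step can be reproduced with itertools.product: each axis contributes its distinct mapped values plus 0.0 (idle), each boolean action contributes False/True, and the Cartesian product forms the discrete set. A sketch under the assumption that duplicate keys (A/D mapping to the same values as Left/Right) collapse to one representative key per axis:

```python
from itertools import product

# Sketch of the discretization described above.
mappings = {
    'Fire': {'type': 'action', 'keys': ('SpaceBar',)},
    'MoveRight': {'type': 'axis',
                  'keys': (('Right', 1.0), ('Left', -1.0), ('A', -1.0), ('D', 1.0))},
}

per_input = []
for desc in mappings.values():
    if desc['type'] == 'axis':
        key = desc['keys'][0][0]  # one representative key per axis
        values = sorted({v for _, v in desc['keys']} | {0.0})
        per_input.append([(key, v) for v in values])
    else:  # boolean action-mapping
        key = desc['keys'][0]
        per_input.append([(key, False), (key, True)])

discrete_actions = [list(combo) for combo in product(*per_input)]
print(len(discrete_actions))  # 3 axis values * 2 action states = 6
```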
-
execute
(actions)¶ Executes a single step in the UE4 game. This step may comprise one or more actual game ticks, all of which repeat the same given action- and axis-inputs (or action number, in the case of discretized actions). UE4 distinguishes between action-mappings, which are boolean actions (e.g. jump or don't-jump), and axis-mappings, which are continuous actions like MoveForward with values between -1.0 (run backwards) and 1.0 (run forwards); 0.0 means stop.
-
static
extract_observation
(message)¶
-
from_spec
(spec, kwargs)¶ Creates an environment from a specification dict.
-
reset
()¶ Same as a step with no actions to pass, but blocks until the observation dict is returned. Stores the received observation in self.last_observation.
-
seed
(seed=None)¶
-
set_state
(setters, **kwargs)¶
-
states
()¶
-
translate_abstract_actions_to_keys
(abstract)¶ Translates a list of tuples ([pretty mapping], [value]) into a list of tuples ([some key], [translated value]). Each single item in abstract undergoes the following translation:
Example 1: we want “MoveRight”: 5.0; possible keys for the action are (“Right”, 1.0), (“Left”, -1.0); result: “Right”: 5.0 * 1.0 = 5.0.
Example 2: we want “MoveRight”: -0.5; possible keys for the action are (“Left”, -1.0), (“Right”, 1.0); result: “Left”: -0.5 * -1.0 = 0.5 (same as “Right”: -0.5).
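The translation rule in the examples above can be sketched as follows. The `action_specs` argument is a hypothetical helper structure (the real method presumably looks these mappings up on the instance); it maps each pretty name to its possible (key, scale) pairs:

```python
def translate_abstract_to_keys(abstract, action_specs):
    """Sketch of the translation above.

    `abstract` is a list of (pretty_name, value) tuples; `action_specs`
    (hypothetical) maps each pretty name to its (key, scale) pairs.
    """
    translated = []
    for name, value in abstract:
        # Pick the key whose scale sign matches the requested value, so the
        # translated magnitude comes out non-negative (as in Example 2).
        key, scale = next((k, s) for k, s in action_specs[name] if s * value >= 0)
        translated.append((key, value * scale))
    return translated


specs = {'MoveRight': (('Right', 1.0), ('Left', -1.0))}
print(translate_abstract_to_keys([('MoveRight', -0.5)], specs))  # [('Left', 0.5)]
```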
-