General environment interface
Initialization and termination
static Environment.create(environment=None, max_episode_timesteps=None, reward_shaping=None, remote=None, blocking=False, host=None, port=None, **kwargs)
Creates an environment from a specification. In “socket-server” remote mode, runs the environment in a server communication loop until closed.
Parameters:
- environment (specification | Environment class/object) – JSON file, specification key, configuration dictionary, library module, Environment class/object, or gym.Env (required, invalid for “socket-client” remote mode).
- max_episode_timesteps (int > 0) – Maximum number of timesteps per episode, overrides the environment default if defined (default: environment default, invalid for “socket-client” remote mode).
- reward_shaping (callable[(s,a,t,r,s') -> r|(r,t)] | str) – Reward shaping function mapping state, action, terminal, reward and next state to shaped reward and terminal, or a string expression with arguments “states”, “actions”, “terminal”, “reward” and “next_states”, e.g. “-1.0 if terminal else max(reward, 0.0)” (default: no reward shaping).
- remote ("multiprocessing" | "socket-client" | "socket-server") – Communication mode for remote or parallelized environment execution; “socket-client” mode requires a corresponding “socket-server” running, and “socket-server” mode runs the environment in a server communication loop until closed (default: local execution).
- blocking (bool) – Whether remote environment calls should be blocking (default: not blocking, invalid unless “multiprocessing” or “socket-client” remote mode).
- host (str) – Socket server hostname or IP address (required only for “socket-client” remote mode).
- port (int) – Socket server port (required only for “socket-client” or “socket-server” remote mode).
- kwargs – Additional arguments.
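As a minimal sketch of the reward_shaping parameter, the function below reproduces the string expression “-1.0 if terminal else max(reward, 0.0)” from the description as a plain Python callable. The names shape_reward and make_environment are illustrative only. The "gym" specification key and the level keyword are assumptions about an installed Tensorforce and Gym setup, so the import is kept inside the helper and is not executed here.

```python
def shape_reward(state, action, terminal, reward, next_state):
    """Hypothetical shaping function: penalize termination, clip negative rewards."""
    return -1.0 if terminal else max(reward, 0.0)


def make_environment():
    """Sketch only: requires Tensorforce and Gym to be installed."""
    from tensorforce import Environment  # imported lazily on purpose

    return Environment.create(
        environment="gym",          # assumed specification key
        level="CartPole-v1",        # assumed extra argument passed via **kwargs
        max_episode_timesteps=500,  # overrides the environment default
        reward_shaping=shape_reward,
    )


# The shaping function can be checked without creating an environment.
print(shape_reward(None, None, False, -0.3, None))  # 0.0
print(shape_reward(None, None, True, 5.0, None))    # -1.0
```

This callable returns only the shaped reward; per the signature above, a shaping function may also return a (reward, terminal) pair.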
Environment.close()
Closes the environment.
Properties
Environment.states()
Returns the state space specification.
Returns: Arbitrarily nested dictionary of state descriptions with the following attributes:
- type ("bool" | "int" | "float") – state data type (default: "float").
- shape (int | iter[int]) – state shape (required).
- num_states (int > 0) – number of discrete state values (required for type "int").
- min_value/max_value (float) – minimum/maximum state value (optional for type "float").
Return type: specification
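To make the attribute rules above concrete, here is a small sketch of a nested state specification together with a hypothetical checker. The helper validate_states and the state names are not part of the library. The rule used to tell a nesting level from a single state description (all values are dictionaries) is an assumption made for this example.

```python
def validate_states(spec, path="states"):
    """Collect problems in a (possibly nested) state specification."""
    if not isinstance(spec, dict) or not spec:
        return [f"{path}: expected a non-empty dictionary"]
    # Assumption: a dictionary whose values are all dictionaries is a nesting level.
    if all(isinstance(value, dict) for value in spec.values()):
        problems = []
        for name, sub_spec in spec.items():
            problems.extend(validate_states(sub_spec, f"{path}/{name}"))
        return problems
    # Otherwise treat it as a single state description.
    problems = []
    dtype = spec.get("type", "float")  # documented default
    if dtype not in ("bool", "int", "float"):
        problems.append(f"{path}: unknown type {dtype!r}")
    if "shape" not in spec:
        problems.append(f"{path}: shape is required")
    if dtype == "int" and "num_states" not in spec:
        problems.append(f"{path}: num_states is required for type 'int'")
    return problems


states_spec = {
    "position": {"type": "float", "shape": (2,), "min_value": -1.0, "max_value": 1.0},
    "sensors": {
        "contact": {"type": "bool", "shape": (4,)},
        "zone": {"type": "int", "shape": (), "num_states": 5},
    },
}

print(validate_states(states_spec))  # []
print(validate_states({"zone": {"type": "int", "shape": ()}}))
```

The second call reports the missing num_states attribute, which is required for type "int".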
Environment.actions()
Returns the action space specification.
Returns: Arbitrarily nested dictionary of action descriptions with the following attributes:
- type ("bool" | "int" | "float") – action data type (required).
- shape (int > 0 | iter[int > 0]) – action shape (default: scalar).
- num_actions (int > 0) – number of discrete action values (required for type "int").
- min_value/max_value (float) – minimum/maximum action value (optional for type "float").
Return type: specification
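The action attributes can be illustrated by drawing a random scalar action for each entry of a flat specification. This is a sketch under stated assumptions: sample_action and the action names are hypothetical, discrete actions are assumed to be indexed from 0 to num_actions - 1, and the fallback bounds of [-1.0, 1.0] for unbounded floats are chosen only for this example.

```python
import random


def sample_action(spec, rng):
    """Draw one scalar action matching a single action description."""
    dtype = spec["type"]  # required for actions
    if dtype == "bool":
        return rng.random() < 0.5
    if dtype == "int":
        # Assumption: discrete actions are indexed 0 .. num_actions - 1.
        return rng.randrange(spec["num_actions"])
    if dtype == "float":
        # Assumption: fall back to [-1.0, 1.0] when no bounds are given.
        low = spec.get("min_value", -1.0)
        high = spec.get("max_value", 1.0)
        return rng.uniform(low, high)
    raise ValueError(f"unknown action type: {dtype!r}")


actions_spec = {
    "steer": {"type": "float", "min_value": -0.5, "max_value": 0.5},
    "gear": {"type": "int", "num_actions": 4},
    "brake": {"type": "bool"},
}

rng = random.Random(0)
actions = {name: sample_action(spec, rng) for name, spec in actions_spec.items()}
print(actions)
```

Each entry omits shape, so it falls under the documented default of a scalar action.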
Environment.max_episode_timesteps()
Returns the maximum number of timesteps per episode.
Returns: Maximum number of timesteps per episode.
Return type: int
Interaction functions
Environment.reset(num_parallel=None)
Resets the environment to start a new episode.
Parameters: num_parallel (int >= 1) – Number of environment instances executed in parallel, only valid if the environment is vectorizable (default: no vectorization).
Returns: Dictionary containing initial state(s) and auxiliary information, and parallel index vector in case of vectorized execution.
Return type: (parallel,) dict[state]
Environment.execute(actions)
Executes the given action(s) and advances the environment by one step.
Parameters: actions (dict[action]) – Dictionary containing action(s) to be executed (required).
Returns: Dictionary containing next state(s) and auxiliary information, terminal indicator (whether a terminal state is reached, or 2 if the episode was aborted), observed reward, and parallel index vector in case of vectorized execution.
Return type: (parallel,) dict[state], bool | 0 | 1 | 2, float
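The reset/execute cycle can be sketched with a stand-in object that mimics the documented return values for the non-vectorized case. CorridorEnvironment and run_episode are hypothetical names. The class does not derive from the library's Environment class and only imitates its interface, so the example runs without Tensorforce installed.

```python
class CorridorEnvironment:
    """Hypothetical stand-in: walk right along a corridor to reach the goal."""

    def __init__(self, length=3, max_timesteps=10):
        self.length = length
        self._max_timesteps = max_timesteps
        self.position = 0
        self.timestep = 0

    def states(self):
        return dict(type="int", shape=(), num_states=self.length + 1)

    def actions(self):
        return dict(type="int", num_actions=2)  # 0: stay, 1: step right

    def max_episode_timesteps(self):
        return self._max_timesteps

    def reset(self):
        self.position = 0
        self.timestep = 0
        return self.position

    def execute(self, actions):
        self.timestep += 1
        if actions == 1:
            self.position += 1
        if self.position >= self.length:
            return self.position, 1, 1.0  # terminal state reached
        if self.timestep >= self._max_timesteps:
            return self.position, 2, 0.0  # episode aborted at the timestep limit
        return self.position, 0, 0.0      # episode continues


def run_episode(environment, policy):
    """Run one episode and return (sum of rewards, final terminal value)."""
    states = environment.reset()
    terminal = 0
    total_reward = 0.0
    while terminal == 0:
        actions = policy(states)
        states, terminal, reward = environment.execute(actions=actions)
        total_reward += reward
    return total_reward, terminal


print(run_episode(CorridorEnvironment(), lambda states: 1))  # (1.0, 1)
print(run_episode(CorridorEnvironment(), lambda states: 0))  # (0.0, 2)
```

Comparing the terminal value against 0 rather than testing its truthiness keeps the distinction between a regular terminal state (1) and an aborted episode (2) available to the caller.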