A reinforcement learning environment exposes a simulated or real system through a common API; that system is what the agent's behavior is optimized against. It can be anything from a video game (e.g. Atari) to a robot or a trading system. The agent interacts with the environment and learns to act optimally under its dynamics.
Environment <-> Runner <-> Agent <-> Model
Base environment class.
Return the action space. Might include subdicts if multiple actions are available simultaneously.
Returns: dict of action properties (continuous, number of actions)
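As an illustration of the two shapes described above, the following sketch shows a flat specification for a single action and a nested one for several simultaneous actions. The key names (continuous, num_actions), the action names (steer, gear), and the helper is_single_action are assumptions chosen for the example, not names confirmed by this reference.

```python
# Hypothetical specification of one discrete action with four choices.
# The keys "continuous" and "num_actions" are assumed names.
single_action = {"continuous": False, "num_actions": 4}

# Hypothetical specification of two simultaneous actions, one subdict each.
multi_action = {
    "steer": {"continuous": True},
    "gear": {"continuous": False, "num_actions": 5},
}


def is_single_action(spec):
    """Return True if spec describes one action rather than a dict of named actions."""
    # A nested specification maps every action name to its own dict of properties.
    return not all(isinstance(value, dict) for value in spec.values())
```

A caller could use such a check to decide whether to pass a single value or a dict of values when executing actions.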
Close environment. No other method calls are possible afterwards.
Executes action, observes next state(s) and reward.
Parameters: actions – Actions to execute.
Returns: (Dict of) next state(s), boolean indicating terminal, and reward signal.
Reset environment and set up for a new episode.
Returns: initial state of the reset environment.
Return the state space. Might include subdicts if multiple states are available simultaneously.
Returns: dict of state properties (shape and type).
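The interface described above can be sketched as an abstract base class with one toy implementation. This is a minimal sketch under stated assumptions, not the library's actual code: the member names (states, actions, reset, execute, close) are inferred from the descriptions, and CorridorEnvironment and run_episode are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    """Assumed shape of the base environment class; member names are guesses."""

    @property
    @abstractmethod
    def states(self):
        """Return a dict of state properties (shape and type)."""

    @property
    @abstractmethod
    def actions(self):
        """Return a dict of action properties."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial state."""

    @abstractmethod
    def execute(self, actions):
        """Apply actions and return (next_state, terminal, reward)."""

    def close(self):
        """Release resources; nothing to do by default."""


class CorridorEnvironment(Environment):
    """Hypothetical toy task: walk from cell 0 to the last cell of a corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0
        self.closed = False

    @property
    def states(self):
        return {"shape": (1,), "type": "int"}

    @property
    def actions(self):
        return {"continuous": False, "num_actions": 2}

    def reset(self):
        self.position = 0
        return [self.position]

    def execute(self, actions):
        if self.closed:
            raise RuntimeError("environment is closed")
        # Action 1 moves right, any other action moves left; walls clamp the move.
        step = 1 if actions == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + step))
        terminal = self.position == self.length - 1
        reward = 1.0 if terminal else 0.0
        return [self.position], terminal, reward

    def close(self):
        self.closed = True


def run_episode(environment, policy, max_steps=100):
    """Run one episode, standing in for the runner; return the summed reward."""
    state = environment.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        state, terminal, reward = environment.execute(policy(state))
        total_reward += reward
        if terminal:
            break
    return total_reward
```

For example, run_episode(CorridorEnvironment(), lambda state: 1) walks straight to the goal cell. The loop stands in for the runner that sits between environment and agent, and the policy function stands in for the agent.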