Agent and model overview¶
A reinforcement learning agent provides methods to process states and return actions, to store past observations, and to load and save models. Most agents employ a Model, which implements the algorithms to calculate the next action given the current state and to update model parameters from past experiences.

Environment <-> Runner <-> Agent <-> Model

Parameters are passed to the agent as a Configuration object, which the agent in turn passes on to the Model.
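The control flow above can be sketched with minimal stand-in classes. All names and method signatures here (`Model.get_action`, `Agent.observe`, `ToyEnv`, and so on) are invented for this illustration of the division of labor; they are not TensorForce's actual API:

```python
class Model:
    """Stand-in for the model: computes actions and performs updates."""
    def get_action(self, state):
        # Trivial hard-coded policy, purely for illustration.
        return 0 if state < 0.5 else 1

    def update(self, experiences):
        pass  # a real model would run an optimization step here


class Agent:
    """Stand-in for the agent: mediates between environment and model."""
    def __init__(self, config):
        self.model = Model()  # a real agent would forward config to the model
        self.experiences = []

    def act(self, state):
        return self.model.get_action(state)

    def observe(self, state, action, reward):
        self.experiences.append((state, action, reward))
        self.model.update(self.experiences)


class ToyEnv:
    """Five-step dummy environment standing in for a real one."""
    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return self.t / 5.0, 1.0, self.t >= 5  # state, reward, done


class Runner:
    """Drives the Environment <-> Agent loop."""
    def __init__(self, agent, env):
        self.agent, self.env = agent, env

    def run(self, episodes):
        for _ in range(episodes):
            state = self.env.reset()
            done = False
            while not done:
                action = self.agent.act(state)
                state, reward, done = self.env.step(action)
                self.agent.observe(state, action, reward)
```

Running `Runner(Agent(config=None), ToyEnv()).run(episodes=1)` fills the agent's experience list with five (state, action, reward) tuples; the runner never touches the model directly, which mirrors the chain in the diagram above.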
Ready-to-use algorithms¶
We have implemented some of the most common RL algorithms and aim to keep them up to date. Below is an overview of all implemented agents and models.
Agent / General parameters¶
Agent is the base class for all reinforcement learning agents; every agent inherits from this class.
MemoryAgent¶
BatchAgent¶
Deep-Q-Networks (DQN)¶
Normalized Advantage Functions¶
Deep-Q-learning from demonstration (DQFD)¶
Vanilla Policy Gradient¶
Trust Region Policy Optimization (TRPO)¶
State preprocessing¶
The agent handles state preprocessing. A preprocessor takes the raw state input from the environment and modifies it, for instance by resizing images or concatenating successive states. You can find information about our ready-to-use preprocessors here.
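As a concrete illustration of state concatenation, here is a minimal, self-contained frame-stacking preprocessor. The class name and `process` interface are invented for this sketch and are not the library's preprocessor API:

```python
from collections import deque


class ConcatPreprocessor:
    """Hypothetical preprocessor: stacks the last `n` raw states into one
    observation, in the spirit of Atari-style frame stacking."""

    def __init__(self, n):
        self.n = n
        self.buffer = deque(maxlen=n)

    def process(self, state):
        self.buffer.append(state)
        # Early in an episode the buffer is not yet full; pad it with
        # copies of the oldest state so the output shape is constant.
        while len(self.buffer) < self.n:
            self.buffer.appendleft(self.buffer[0])
        return list(self.buffer)
```

Because the environment delivers one raw state per step while the model expects a fixed-size stacked observation, keeping this logic in the agent's preprocessing stage keeps the model itself stateless with respect to history.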
Building your own agent¶
If you want to build your own agent, it should always inherit from Agent. If your agent uses a replay memory, it should probably inherit from MemoryAgent; if it uses a batch replay that is emptied after each update, it should probably inherit from BatchAgent.
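The difference between the two memory styles can be sketched as follows. These are simplified stand-ins with invented names, showing only the memory-handling behavior that distinguishes the two base classes:

```python
import random


class MemoryAgentSketch:
    """Replay-memory style: keeps a bounded memory of past experiences
    and samples random minibatches from it for updates."""

    def __init__(self, capacity=1000, batch_size=4):
        self.memory = []
        self.capacity = capacity
        self.batch_size = batch_size

    def observe(self, experience):
        if len(self.memory) >= self.capacity:
            self.memory.pop(0)  # evict the oldest experience
        self.memory.append(experience)

    def sample_batch(self):
        k = min(self.batch_size, len(self.memory))
        return random.sample(self.memory, k)


class BatchAgentSketch:
    """Batch style: collects a batch of experiences, updates once the
    batch is full, then discards it entirely."""

    def __init__(self, batch_size=4):
        self.batch = []
        self.batch_size = batch_size

    def observe(self, experience):
        self.batch.append(experience)
        if len(self.batch) >= self.batch_size:
            self.update(self.batch)
            self.batch = []  # batch is emptied after each update

    def update(self, batch):
        pass  # an on-policy update (e.g. policy gradient) would go here
```

The replay-memory style suits off-policy algorithms such as DQN, where old experiences remain valid training data; the emptied-batch style suits on-policy algorithms such as vanilla policy gradient, where experiences from an outdated policy must be discarded.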
We distinguish between agents and models. The Agent class handles the interaction with the environment, such as state preprocessing, exploration, and observation of rewards. The Model class handles the mathematical operations, such as building the TensorFlow graph, calculating the desired action, and updating (i.e. optimizing) the model weights.
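For example, exploration naturally lives on the agent side, wrapped around the model's deterministic action computation. A minimal sketch with invented class names (not the library's API), where the state is treated as a vector of Q-values purely for illustration:

```python
import random


class GreedyModel:
    """Model side: purely deterministic math, no exploration."""

    def get_action(self, state):
        # Argmax over the state, standing in for argmax over Q-values.
        return max(range(len(state)), key=lambda a: state[a])


class ExploringAgent:
    """Agent side: adds epsilon-greedy exploration around the model."""

    def __init__(self, model, num_actions, epsilon=0.1):
        self.model = model
        self.num_actions = num_actions
        self.epsilon = epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)  # explore
        return self.model.get_action(state)            # exploit
```

Keeping exploration out of the model means the same model can be reused with different exploration schedules, and evaluated greedily simply by setting epsilon to zero.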
To start building your own agent, please refer to this blog post to gain a deeper understanding of the internals of the TensorForce library. Afterwards, have a look at a sample implementation, e.g. the DQN Agent and DQN Model.