Agent specification

Agents are instantiated via Agent.create(agent=...), where the agent argument selects the agent type and accepts any of the specification alternatives presented below. It is recommended to pass the application's Environment implementation as the second argument, environment, from which the corresponding states, actions and max_episode_timesteps arguments of the agent are extracted automatically.

States and actions specification

A state/action value is specified as a dictionary with the mandatory attributes type (one of 'bool': binary, 'int': discrete, or 'float': continuous) and shape (a positive number or a tuple thereof). In addition, 'int' values should specify num_values (the fixed number of discrete options), whereas 'float' values can specify bounds via min_value and max_value. If the state or action consists of multiple components, these are specified via an additional dictionary layer that maps each component name to its own value specification. The following example illustrates both possibilities:

states = dict(
    observation=dict(type='float', shape=(16, 16, 3)),
    attributes=dict(type='int', shape=(4, 2), num_values=5)
)
actions = dict(type='float', shape=10)

Note: Ideally, the agent arguments states and actions are specified implicitly by passing the environment argument.
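
The rules for state/action dictionaries can be illustrated with a small stand-alone checker. This is only a sketch and not part of the library: the helper name check_value_spec is hypothetical, and it assumes that a leaf specification is recognised by a string-valued type entry, with anything else treated as an additional dictionary layer. It also treats num_values as required for 'int' values, which is stricter than the "should" in the text above.

```python
def check_value_spec(spec, name="value"):
    """Validate a state/action spec and return its number of components.

    Hypothetical helper for illustration; not a Tensorforce function.
    """
    if not isinstance(spec, dict):
        raise TypeError(f"{name}: specification must be a dictionary")

    # Assumption: a leaf has a string-valued 'type'; otherwise the
    # dictionary is a layer mapping component names to specifications.
    if not isinstance(spec.get("type"), str):
        if not spec:
            raise ValueError(f"{name}: empty specification")
        return sum(
            check_value_spec(sub, f"{name}.{key}") for key, sub in spec.items()
        )

    if spec["type"] not in ("bool", "int", "float"):
        raise ValueError(f"{name}: type must be 'bool', 'int' or 'float'")

    if "shape" not in spec:
        raise ValueError(f"{name}: shape is mandatory")
    shape = spec["shape"]
    dims = shape if isinstance(shape, tuple) else (shape,)
    for dim in dims:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(dim, bool) or not isinstance(dim, int) or dim <= 0:
            raise ValueError(f"{name}: shape entries must be positive integers")

    # Treated as required in this sketch for discrete values.
    if spec["type"] == "int" and "num_values" not in spec:
        raise ValueError(f"{name}: 'int' values need num_values")

    # Bounds are optional; if both are given they must be ordered.
    if "min_value" in spec and "max_value" in spec:
        if spec["min_value"] > spec["max_value"]:
            raise ValueError(f"{name}: min_value exceeds max_value")

    return 1


states = dict(
    observation=dict(type='float', shape=(16, 16, 3)),
    attributes=dict(type='int', shape=(4, 2), num_values=5)
)
actions = dict(type='float', shape=10)

n_states = check_value_spec(states, "states")     # two named components
n_actions = check_value_spec(actions, "actions")  # one unnamed component
```

The recursion mirrors the "additional dictionary layer" described above: the states dictionary is walked component by component, whereas actions is validated directly as a single value.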

How to specify modules

Dictionary with module type and arguments

Agent.create(...
    policy=dict(network=dict(type='layered', layers=[dict(type='dense', size=32)])),
    memory=dict(type='replay', capacity=10000), ...
)

JSON specification file (plus additional arguments)

Agent.create(...
    policy=dict(network='network.json'),
    memory=dict(type='memory.json', capacity=10000), ...
)

Module path (plus additional arguments)

Agent.create(...
    policy=dict(network='my_module.TestNetwork'),
    memory=dict(type='tensorforce.core.memories.Replay', capacity=10000), ...
)

Callable or Type (plus additional arguments)

Agent.create(...
    policy=dict(network=TestNetwork),
    memory=dict(type=Replay, capacity=10000), ...
)

Default module: only arguments or first argument

Agent.create(...
    policy=dict(network=[dict(type='dense', size=32)]),
    memory=dict(capacity=10000), ...
)
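
To make the relationship between the five alternatives above more concrete, the following sketch reduces each form to a pair of module type and arguments. It is an illustration under stated assumptions, not the library's actual resolution logic: resolve_module, the stand-in Replay class and the first_argument key are hypothetical names, short names such as 'replay' are returned unchanged instead of being looked up in a registry, and collections.OrderedDict merely stands in for an importable module path.

```python
import importlib
import json
import os
import tempfile


def resolve_module(spec, default=None, arguments=None):
    """Reduce a module specification to (module_type, arguments).

    Hypothetical helper for illustration; not a Tensorforce function.
    """
    arguments = dict(arguments or {})
    if isinstance(spec, dict):
        # Dictionary: 'type' selects the module, the rest are arguments.
        merged = {**spec, **arguments}
        if "type" in merged:
            return resolve_module(merged.pop("type"), default, merged)
        return default, merged  # only arguments: use the default module
    if isinstance(spec, str) and spec.endswith(".json"):
        # JSON file: its content is itself a specification.
        with open(spec) as handle:
            return resolve_module(json.load(handle), default, arguments)
    if isinstance(spec, str) and "." in spec:
        # Module path: import the module and fetch the named attribute.
        module_name, attribute = spec.rsplit(".", 1)
        module = importlib.import_module(module_name)
        return getattr(module, attribute), arguments
    if callable(spec) or isinstance(spec, str):
        # Callable/type, or a short name left unresolved in this sketch.
        return spec, arguments
    # Anything else is taken as the first argument of the default module.
    return default, {"first_argument": spec, **arguments}


class Replay:
    """Stand-in for a memory class, defined only for this example."""

    def __init__(self, capacity):
        self.capacity = capacity


from_dict = resolve_module(dict(type='replay', capacity=10000))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "memory.json")
    with open(path, "w") as handle:
        json.dump({"type": "replay"}, handle)
    from_json = resolve_module(dict(type=path, capacity=10000))

from_path = resolve_module(dict(type='collections.OrderedDict'))
from_type = resolve_module(dict(type=Replay, capacity=10000))
from_default = resolve_module(dict(capacity=10000), default=Replay)
from_first = resolve_module([dict(type='dense', size=32)], default='layered')

# A resolved type can then be instantiated with its arguments.
memory = from_type[0](**from_type[1])
```

In every case the additional arguments (here capacity=10000) are merged with whatever the type entry resolves to, which is why the headings above read "plus additional arguments".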