Module specification

Agents are instantiated via Agent.create(agent=...), with any of the specification alternatives presented below (agent acts as type argument). It is recommended to pass the application's Environment implementation as the second argument, environment, from which the corresponding states, actions and max_episode_timesteps arguments of the agent are automatically extracted.
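
For example, a minimal sketch of this pattern (assuming the built-in OpenAI Gym environment wrapper and the 'ppo' agent type; the level and argument values are illustrative):

from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)

# states, actions and max_episode_timesteps are extracted from the environment
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)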

How to specify modules

Dictionary with module type and arguments

Agent.create(...
    policy=dict(network=dict(type='layered', layers=[dict(type='dense', size=32)])),
    memory=dict(type='replay', capacity=10000), ...
)

JSON specification file (plus additional arguments)

Agent.create(...
    policy=dict(network='network.json'),
    memory=dict(type='memory.json', capacity=10000), ...
)

Module path (plus additional arguments)

Agent.create(...
    policy=dict(network='my_module.TestNetwork'),
    memory=dict(type='tensorforce.core.memories.Replay', capacity=10000), ...
)

Callable or Type (plus additional arguments)

from my_module import TestNetwork
from tensorforce.core.memories import Replay

Agent.create(...
    policy=dict(network=TestNetwork),
    memory=dict(type=Replay, capacity=10000), ...
)

Default module: only arguments or first argument

If the type is omitted, the default module type for that argument is used with the given arguments; a plain value or list is instead interpreted as the default module's first argument.

Agent.create(...
    policy=dict(network=[dict(type='dense', size=32)]),
    memory=dict(capacity=10000), ...
)
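
Here the layer list is interpreted as the first argument of the default network type, so it is equivalent to the explicit dict(type='layered', layers=[...]) specification of the first alternative above.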

Static vs dynamic hyperparameters

Tensorforce distinguishes between two kinds of agent/module arguments (primitive types: bool/int/long/float): those that specify part of the TensorFlow model architecture, like the layer size, and those that specify a value within the architecture, like the learning rate. Whereas the former are statically defined as part of the agent initialization, the latter can be dynamically adjusted afterwards. These dynamic hyperparameters are indicated by parameter as part of their type specification in the documentation, and can alternatively be assigned a parameter module instead of a constant value, for instance, to specify a decaying learning rate.
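
For instance, a decaying learning rate can be specified by assigning a parameter module to the optimizer's learning_rate argument, as in the following sketch (assuming the built-in 'adam' optimizer; the argument values are illustrative):

Agent.create(...
    optimizer=dict(type='adam', learning_rate=dict(
        type='decaying', unit='timesteps', decay='exponential',
        initial_value=1e-3, decay_steps=10000, decay_rate=0.5
    )), ...
)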

Example: exponentially decaying exploration

Agent.create(...
    exploration=dict(
        type='decaying', unit='timesteps', decay='exponential',
        initial_value=0.1, decay_steps=1000, decay_rate=0.5
    ), ...
)
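
With these values, the exploration starts at 0.1 and halves every 1000 timesteps, following 0.1 * 0.5^(t / 1000).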

Example: linearly increasing horizon

Agent.create(...
    reward_estimation=dict(horizon=dict(
        type='decaying', dtype='long', unit='episodes', decay='polynomial',
        initial_value=10.0, decay_steps=1000, final_value=50.0, power=1.0
    )), ...
)
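
Despite the 'decaying' type name, this schedule increases: the polynomial decay with power=1.0 changes the horizon linearly from 10 to 50 over the first 1000 episodes.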