
Multi-input and non-sequential network architectures

Abort-terminal due to timestep limit

Besides terminal=False or =0 for non-terminal and terminal=True or =1 for true terminal, Tensorforce recognizes terminal=2 as abort-terminal and handles it accordingly for reward estimation. Environments created via Environment.create(..., max_episode_timesteps=?, ...) will automatically return the appropriate terminal depending on whether an episode truly terminates or is aborted because it reached the time limit.
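To illustrate why the distinction matters for reward estimation, here is a minimal plain-Python sketch. It is not Tensorforce's internal implementation; the helper name `discounted_returns` and the argument `bootstrap_value` are hypothetical. It assumes the common convention that nothing follows a true terminal, so future return is zero, whereas after an abort-terminal the episode could have continued, so an estimate of the remaining value is bootstrapped.

```python
def discounted_returns(rewards, terminal, discount=0.99, bootstrap_value=0.0):
    """Compute discounted returns for one finished episode.

    terminal: 1 (true terminal) or 2 (abort-terminal) for the last timestep.
    bootstrap_value: hypothetical value estimate for the state following the
        last timestep; only used when the episode was aborted (terminal=2).
    """
    if terminal not in (1, 2):
        raise ValueError("terminal must be 1 (true terminal) or 2 (abort-terminal)")

    # True terminal: no future reward. Abort: bootstrap from the value estimate.
    future = bootstrap_value if terminal == 2 else 0.0

    returns = []
    for reward in reversed(rewards):
        future = reward + discount * future
        returns.append(future)
    returns.reverse()
    return returns


rewards = [1.0, 1.0, 1.0]
true_terminal = discounted_returns(rewards, terminal=1, discount=0.5, bootstrap_value=4.0)
abort_terminal = discounted_returns(rewards, terminal=2, discount=0.5, bootstrap_value=4.0)
print(true_terminal)   # [1.75, 1.5, 1.0]
print(abort_terminal)  # [2.25, 2.5, 3.0]
```

With identical rewards, the aborted episode yields higher returns because the value of the unvisited future is not discarded.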

Action masking

import numpy as np
from tensorforce import Agent

agent = Agent.create(
    states=dict(type='float', shape=(10,)),
    actions=dict(type='int', shape=(), num_values=3),
    # ... remaining agent arguments
)

states = dict(
    state=np.random.random_sample(size=(10,)),  # state (default name: "state")
    action_mask=[True, False, True]  # mask as '[ACTION-NAME]_mask' (default name: "action")
)
action = agent.act(states=states)
assert action != 1  # action 1 is masked out
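Conceptually, a mask removes disallowed actions before one is chosen. The following plain-Python sketch shows one common way to do this for a greedy choice. It is an illustration only, not how Tensorforce applies masks internally; `masked_greedy_action` and the example scores are hypothetical.

```python
import math


def masked_greedy_action(scores, action_mask):
    """Return the index of the highest-scoring action that the mask allows."""
    if len(scores) != len(action_mask):
        raise ValueError("scores and action_mask must have the same length")
    if not any(action_mask):
        raise ValueError("at least one action must be allowed")

    # Disallowed actions get a score of -inf, so they can never be the maximum.
    masked = [
        score if allowed else -math.inf
        for score, allowed in zip(scores, action_mask)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Action 1 has the highest raw score but is masked out.
action = masked_greedy_action(scores=[0.1, 2.0, 0.3], action_mask=[True, False, True])
print(action)  # 2
```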

Parallel environment execution

See also the parallelization example for details on how to use this feature.

Execute multiple environments running locally in one call / batched:

runner = Runner(
    agent='benchmarks/configs/ppo1.json', environment='CartPole-v1',
    num_parallel=4)
runner.run(num_episodes=100, batch_agent_calls=True)

Execute environments running in different processes whenever ready / unbatched:

runner = Runner(
    agent='benchmarks/configs/ppo1.json', environment='CartPole-v1',
    num_parallel=4, remote='multiprocessing')
runner.run(num_episodes=100)

Execute environments running on different machines, here using run.py instead of Runner:

# Environment machine 1
python run.py --environment gym --level CartPole-v1 --remote socket-server \
    --port 65432

# Environment machine 2
python run.py --environment gym --level CartPole-v1 --remote socket-server \
    --port 65433

# Agent machine
python run.py --agent benchmarks/configs/ppo1.json --episodes 100 \
    --num-parallel 2 --remote socket-client --host, \
    --port 65432,65433 --batch-agent-calls

Save & restore

TensorFlow saver (full model)

agent = Agent.create(...
    saver=dict(
        directory='data/checkpoints',
        frequency=100  # save checkpoint every 100 updates
    ), ...
)

# Restore latest agent checkpoint
agent = Agent.load(directory='data/checkpoints')

NumPy / HDF5 (only weights)

agent = Agent.create(...)
agent.save(directory='data/checkpoints', format='numpy', append='episodes')

# Restore latest agent checkpoint
agent = Agent.load(directory='data/checkpoints', format='numpy')

SavedModel export

See the SavedModel example for details on how to use this feature.


TensorBoard

Agent.create(...
    summarizer=dict(
        directory='data/summaries',
        # list of labels, or 'all'
        labels=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm']
    ), ...
)

Act-experience-update interaction

Instead of the default act-observe interaction pattern or the Runner utility, one can use the act-experience-update interface, which allows for more control over the experience the agent stores. See the act-experience-update example for details on how to use this feature. Note that a few stateful network layers will not be updated correctly in independent-mode (currently only exponential_normalization).
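The pattern can be sketched roughly as follows. The helper names `collect_episode` and `train` are hypothetical. The agent and environment calls (`initial_internals`, `act(..., independent=True)`, `experience`, `update`, `reset`, `execute`) follow the interface as used in the act-experience-update example, but exact signatures may differ between Tensorforce versions, so treat this as an outline rather than a definitive implementation.

```python
def collect_episode(agent, environment):
    """Run one episode with independent act calls and return the recorded experience."""
    episode = dict(states=[], internals=[], actions=[], terminal=[], reward=[])

    states = environment.reset()
    internals = agent.initial_internals()
    terminal = False
    while not terminal:
        episode['states'].append(states)
        episode['internals'].append(internals)
        # independent=True: the agent does not store this timestep itself,
        # so internal states have to be passed in and out explicitly.
        actions, internals = agent.act(
            states=states, internals=internals, independent=True
        )
        states, terminal, reward = environment.execute(actions=actions)
        episode['actions'].append(actions)
        episode['terminal'].append(terminal)
        episode['reward'].append(reward)
    return episode


def train(agent, environment, num_episodes):
    """Alternate between collecting an episode and updating on it."""
    for _ in range(num_episodes):
        episode = collect_episode(agent, environment)
        # Hand the recorded experience to the agent, then trigger an update.
        agent.experience(**episode)
        agent.update()
```

Because the experience is collected outside the agent, it can be filtered or modified before being passed to `agent.experience(...)`.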

Record & pretrain

See the record-and-pretrain example for details on how to use this feature.