Getting started

Initializing an environment

It is recommended to initialize an environment via the Environment.create(...) interface.

from tensorforce.environments import Environment

For instance, the OpenAI Gym CartPole environment can be initialized as follows:

environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

Gym’s pre-defined versions are also accessible:

environment = Environment.create(environment='gym', level='CartPole-v1')

Alternatively, an environment can be specified as a config file:

{
    "environment": "gym",
    "level": "CartPole"
}

Environment config files can be loaded by passing their file path:

environment = Environment.create(
    environment='environment.json', max_episode_timesteps=500
)

Custom Gym environments can be used in the same way, but require the corresponding class(es) to be imported and registered with Gym beforehand, as sketched below.
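
For illustration, a hypothetical custom Gym environment (the module path my_envs.envs and the id MyEnv-v0 below are placeholder names, not part of Tensorforce) might be registered and then created like this:

from gym.envs.registration import register
from tensorforce.environments import Environment

# Register the hypothetical custom Gym environment under an id
# (placeholder module path and id; adjust to the actual environment)
register(id='MyEnv-v0', entry_point='my_envs.envs:MyEnv')

environment = Environment.create(
    environment='gym', level='MyEnv-v0', max_episode_timesteps=100
)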

Finally, it is possible to implement a custom environment using Tensorforce’s Environment interface:

import numpy as np

class CustomEnvironment(Environment):

    def __init__(self):
        super().__init__()

    def states(self):
        return dict(type='float', shape=(8,))

    def actions(self):
        return dict(type='int', num_values=4)

    # Optional: should only be defined if the environment has a natural
    # maximum episode length
    def max_episode_timesteps(self):
        return super().max_episode_timesteps()

    # Optional
    def close(self):
        super().close()

    def reset(self):
        state = np.random.random(size=(8,))
        return state

    def execute(self, actions):
        assert 0 <= actions.item() <= 3
        next_state = np.random.random(size=(8,))
        terminal = np.random.random() < 0.5
        reward = np.random.random()
        return next_state, terminal, reward

Custom environment implementations can be loaded by passing their module path:

environment = Environment.create(
    environment='custom_env.CustomEnvironment', max_episode_timesteps=10
)
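
Alternatively, the environment class itself (or an instance) can typically be passed directly, which avoids the module-path lookup; a minimal sketch, assuming CustomEnvironment is defined as above:

environment = Environment.create(
    environment=CustomEnvironment, max_episode_timesteps=10
)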

It is strongly recommended to specify the max_episode_timesteps argument of Environment.create(...) unless it is already defined by the environment itself (or the environment is only used for evaluation), since otherwise additional agent parameters may have to be specified manually.

Initializing an agent

Similarly to environments, it is recommended to initialize an agent via the Agent.create(...) interface.

from tensorforce.agents import Agent

For instance, the generic Tensorforce agent can be initialized as follows:

agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    objective='policy_gradient', reward_estimation=dict(horizon=20)
)

Other pre-defined agent classes can alternatively be used, for instance, Proximal Policy Optimization:

agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3
)

Alternatively, an agent can be specified as a config file:

{
    "agent": "tensorforce",
    "update": 64,
    "objective": "policy_gradient",
    "reward_estimation": {
        "horizon": 20
    }
}

Agent config files can be loaded by passing their file path:

agent = Agent.create(agent='agent.json', environment=environment)

It is recommended to pass the environment object returned by Environment.create(...) as the environment argument of Agent.create(...), so that the states, actions and max_episode_timesteps arguments are specified automatically.
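
If no environment object is available, for instance because the agent is created separately from the environment, these arguments can instead be specified explicitly. A minimal sketch, reusing the state and action specification of the CustomEnvironment example above purely for illustration:

from tensorforce.agents import Agent

# Explicit specification instead of passing an environment object;
# the specs below mirror the CustomEnvironment example for illustration
agent = Agent.create(
    agent='ppo', batch_size=10, learning_rate=1e-3,
    states=dict(type='float', shape=(8,)),
    actions=dict(type='int', num_values=4),
    max_episode_timesteps=10
)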

Training and evaluation

For training and evaluation, it is recommended to use the execution utilities, such as the Runner utility, which offer a range of configuration options:

from tensorforce.execution import Runner

A basic experiment consisting of training and subsequent evaluation can be written in a few lines of code:

runner = Runner(
    agent='agent.json',
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500
)

runner.run(num_episodes=200)

runner.run(num_episodes=100, evaluation=True)

runner.close()

The execution utility classes take care of handling the agent-environment interaction correctly and should therefore be used where possible. Alternatively, if more detailed control over the interaction is required, a simple training and evaluation loop can be written as follows:

# Create agent and environment
environment = Environment.create(
    environment='environment.json', max_episode_timesteps=500
)
agent = Agent.create(agent='agent.json', environment=environment)

# Train for 200 episodes
for _ in range(200):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

# Evaluate for 100 episodes
sum_rewards = 0.0
for _ in range(100):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states, evaluation=True)
        states, terminal, reward = environment.execute(actions=actions)
        sum_rewards += reward

print('Mean episode reward:', sum_rewards / 100)

# Close agent and environment
agent.close()
environment.close()