Getting started
Initializing an environment
It is recommended to initialize an environment via the Environment.create(...) interface.
from tensorforce.environments import Environment
For instance, the OpenAI CartPole environment can be initialized as follows:
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)
Gym’s pre-defined versions are also accessible:
environment = Environment.create(environment='gym', level='CartPole-v1')
Alternatively, an environment can be specified as a config file:
{
    "environment": "gym",
    "level": "CartPole"
}
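Because the config file is plain JSON, Python's standard json module can produce it. Here is a minimal sketch. It assumes the file sits in the current working directory under the name environment.json, which the next example uses.

```python
import json

# The environment config from above, as a Python dict
config = {"environment": "gym", "level": "CartPole"}

# Write it to environment.json so its path can be passed to Environment.create(...)
with open("environment.json", "w") as f:
    json.dump(config, f, indent=2)

# Reading the file back yields the same settings
with open("environment.json") as f:
    loaded = json.load(f)
print(loaded)
```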
Environment config files can be loaded by passing their file path:
environment = Environment.create(
    environment='environment.json', max_episode_timesteps=500
)
Custom Gym environments can be used in the same way, but require the corresponding class(es) to be imported and registered accordingly.
Finally, it is possible to implement a custom environment using Tensorforce’s Environment interface:
import numpy as np

from tensorforce.environments import Environment


class CustomEnvironment(Environment):

    def __init__(self):
        super().__init__()

    def states(self):
        # Each state is a vector of 8 floats
        return dict(type='float', shape=(8,))

    def actions(self):
        # Each action is one of 4 discrete values: 0, 1, 2 or 3
        return dict(type='int', num_values=4)

    # Optional, should only be defined if environment has a natural maximum
    # episode length
    def max_episode_timesteps(self):
        return super().max_episode_timesteps()

    # Optional
    def close(self):
        super().close()

    def reset(self):
        state = np.random.random(size=(8,))
        return state

    def execute(self, actions):
        assert 0 <= actions.item() <= 3
        next_state = np.random.random(size=(8,))
        terminal = np.random.random() < 0.5
        reward = np.random.random()
        # Returned in the order (next_state, terminal, reward)
        return next_state, terminal, reward
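You can test the reset/execute contract by hand before bringing in an agent. The sketch below mirrors CustomEnvironment. It uses Python's random module instead of NumPy and does not subclass Tensorforce's Environment. The class name RandomEnvironmentSketch is hypothetical. As above, each execute call returns (next_state, terminal, reward).

```python
import random


class RandomEnvironmentSketch:
    """Hypothetical stand-in mirroring CustomEnvironment, without Tensorforce."""

    def states(self):
        return dict(type='float', shape=(8,))

    def actions(self):
        return dict(type='int', num_values=4)

    def reset(self):
        # Initial state: 8 floats in [0, 1)
        return [random.random() for _ in range(8)]

    def execute(self, actions):
        assert 0 <= actions <= 3
        next_state = [random.random() for _ in range(8)]
        terminal = random.random() < 0.5  # episode ends with probability 0.5 per step
        reward = random.random()
        return next_state, terminal, reward


# Run one episode with uniformly random actions
random.seed(0)
env = RandomEnvironmentSketch()
state = env.reset()
terminal = False
num_steps = 0
episode_return = 0.0
while not terminal:
    action = random.randrange(env.actions()['num_values'])
    state, terminal, reward = env.execute(actions=action)
    num_steps += 1
    episode_return += reward
print(num_steps, episode_return)
```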
Custom environment implementations can be loaded by passing their module path:
environment = Environment.create(
    environment='custom_env.CustomEnvironment', max_episode_timesteps=10
)
It is strongly recommended to specify the max_episode_timesteps argument of Environment.create(...) unless the environment already defines it (or for evaluation), since otherwise additional agent parameters may have to be specified explicitly.
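In concept, an episode limit means no episode runs longer than max_episode_timesteps steps, even if the environment itself would continue. The sketch below shows that idea. A hypothetical TimeLimitSketch wrapper sits around a hypothetical environment that rarely ends on its own. It does not reflect how Tensorforce handles the limit internally.

```python
import random


class RarelyTerminatingEnvironment:
    """Hypothetical environment that terminates with probability 0.01 per step."""

    def reset(self):
        return 0.0

    def execute(self, actions):
        terminal = random.random() < 0.01
        return 0.0, terminal, 1.0


class TimeLimitSketch:
    """Hypothetical wrapper: ends every episode after at most max_episode_timesteps steps."""

    def __init__(self, environment, max_episode_timesteps):
        self.environment = environment
        self.max_episode_timesteps = max_episode_timesteps
        self.timestep = 0

    def reset(self):
        self.timestep = 0
        return self.environment.reset()

    def execute(self, actions):
        next_state, terminal, reward = self.environment.execute(actions)
        self.timestep += 1
        if self.timestep >= self.max_episode_timesteps:
            terminal = True  # episode cut off by the time limit
        return next_state, terminal, reward


random.seed(1)
env = TimeLimitSketch(RarelyTerminatingEnvironment(), max_episode_timesteps=10)
lengths = []
for _ in range(20):
    env.reset()
    terminal = False
    steps = 0
    while not terminal:
        _, terminal, _ = env.execute(actions=0)
        steps += 1
    lengths.append(steps)
print(max(lengths))
```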
Initializing an agent
Similarly to environments, it is recommended to initialize an agent via the Agent.create(...) interface.
from tensorforce.agents import Agent
For instance, the generic Tensorforce agent can be initialized as follows:
agent = Agent.create(
    agent='tensorforce', environment=environment, update=64,
    objective='policy_gradient', reward_estimation=dict(horizon=20)
)
Alternatively, other pre-defined agent classes can be used, for instance Proximal Policy Optimization:
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10, learning_rate=1e-3
)
Alternatively, an agent can be specified as a config file:
{
    "agent": "tensorforce",
    "update": 64,
    "objective": "policy_gradient",
    "reward_estimation": {
        "horizon": 20
    }
}
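The config file holds the same key-value pairs as the keyword arguments of the earlier Agent.create(...) call. Nested dicts become nested JSON objects, and only environment is passed separately. A quick check with Python's json module shows the match:

```python
import json

# Agent config from above, as it would appear in agent.json
config_text = """
{
    "agent": "tensorforce",
    "update": 64,
    "objective": "policy_gradient",
    "reward_estimation": {
        "horizon": 20
    }
}
"""
config = json.loads(config_text)

# Keyword arguments of the earlier Agent.create(...) call, minus environment
kwargs = dict(
    agent='tensorforce', update=64,
    objective='policy_gradient', reward_estimation=dict(horizon=20)
)

print(config == kwargs)
```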
Agent config files can be loaded by passing their file path:
agent = Agent.create(agent='agent.json', environment=environment)
It is recommended to pass the environment object returned by Environment.create(...) as the environment argument of Agent.create(...), so that the states, actions and max_episode_timesteps arguments are automatically specified accordingly.
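Passing environment=environment therefore saves you from writing these three values out by hand. The sketch below collects them from an object that exposes the same methods as the CustomEnvironment class above. CartPoleLikeSpec and agent_arguments_from are both hypothetical. The specs are illustrative, matching CartPole's four continuous state components and two discrete actions, and are not copied from Tensorforce's Gym wrapper.

```python
class CartPoleLikeSpec:
    """Hypothetical object exposing the same three methods as a Tensorforce environment."""

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        return dict(type='int', num_values=2)

    def max_episode_timesteps(self):
        return 500


def agent_arguments_from(environment):
    # Hypothetical helper: gather the values that environment=... supplies automatically
    return dict(
        states=environment.states(),
        actions=environment.actions(),
        max_episode_timesteps=environment.max_episode_timesteps(),
    )


arguments = agent_arguments_from(CartPoleLikeSpec())
print(arguments)
```

These are the values that would otherwise have to be passed explicitly.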
Training and evaluation
It is recommended to use the execution utilities for training and evaluation, such as the Runner utility, which offer a range of configuration options:
from tensorforce.execution import Runner
A basic experiment consisting of training and subsequent evaluation can be written in a few lines of code:
runner = Runner(
    agent='agent.json',
    environment=dict(environment='gym', level='CartPole'),
    max_episode_timesteps=500
)
runner.run(num_episodes=200)
runner.run(num_episodes=100, evaluation=True)
runner.close()
The execution utility classes take care of handling the agent-environment interaction correctly, and thus should be used where possible. Alternatively, if more detailed control over the agent-environment interaction is required, a simple training and evaluation loop can be written as follows:
from tensorforce.agents import Agent
from tensorforce.environments import Environment

# Create agent and environment
environment = Environment.create(
    environment='environment.json', max_episode_timesteps=500
)
agent = Agent.create(agent='agent.json', environment=environment)

# Train for 200 episodes
for _ in range(200):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

# Evaluate for 100 episodes (no agent.observe(...) call, so no training here)
sum_rewards = 0.0
for _ in range(100):
    states = environment.reset()
    terminal = False
    while not terminal:
        actions = agent.act(states=states, evaluation=True)
        states, terminal, reward = environment.execute(actions=actions)
        sum_rewards += reward
print('Mean episode reward:', sum_rewards / 100)

# Close agent and environment
agent.close()
environment.close()
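The evaluation loop adds all rewards into one sum and divides by the number of episodes, which gives the mean episode return. If you also want the spread across episodes, keep each episode's return in a list. This is a minimal sketch that runs without Tensorforce. It assumes a hypothetical ShortEpisodeEnvironment and a constant stand-in policy in place of agent.act(...).

```python
import random
import statistics


class ShortEpisodeEnvironment:
    """Hypothetical environment: reward 1.0 per step, 10% chance to terminate each step."""

    def reset(self):
        return 0

    def execute(self, actions):
        terminal = random.random() < 0.1
        return 0, terminal, 1.0


def evaluate(environment, policy, num_episodes):
    # Record the return of every episode instead of a single running sum
    episode_returns = []
    for _ in range(num_episodes):
        states = environment.reset()
        terminal = False
        episode_return = 0.0
        while not terminal:
            actions = policy(states)
            states, terminal, reward = environment.execute(actions=actions)
            episode_return += reward
        episode_returns.append(episode_return)
    return episode_returns


random.seed(2)
returns = evaluate(ShortEpisodeEnvironment(), policy=lambda states: 0, num_episodes=100)
mean_return = statistics.mean(returns)
print('Mean episode reward:', mean_return)
print('Std episode reward:', statistics.stdev(returns))
```

With a real agent, policy could wrap agent.act(states=states, evaluation=True).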