tensorforce.core.memories package

Submodules

tensorforce.core.memories.memory module

class tensorforce.core.memories.memory.Memory(states_spec, actions_spec)

Bases: object

Abstract memory class.

add_observation(states, internals, actions, terminal, reward)

Inserts a single experience into the memory.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

static from_spec(spec, kwargs=None)

Creates a memory from a specification dict.
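
For example, a concrete memory can be instantiated from a specification dict. A minimal sketch, assuming the ‘replay’ type name and the spec/kwargs layout shown here (the states_spec/actions_spec formats are placeholders, not taken from this page):

    from tensorforce.core.memories import Memory

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = Memory.from_spec(
        spec=dict(type='replay', capacity=1000, random_sampling=True),
        kwargs=dict(states_spec=states_spec, actions_spec=actions_spec)
    )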

get_batch(batch_size, next_states=False)

Samples a batch from the memory.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, internal states, actions, terminals, rewards (and next states)
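
As a rough illustration of the returned structure (the exact key names are an assumption based on the description above; array shapes depend on states_spec/actions_spec):

    # Hypothetical shape of the result of get_batch(batch_size=32, next_states=True)
    batch = {
        'states': {'state': ...},       # one entry per named state, each holding 32 sampled values
        'internals': [...],             # internal agent states (e.g. RNN states), if any
        'actions': {'action': ...},     # one entry per named action
        'terminals': ...,               # 32 boolean terminal flags
        'rewards': ...,                 # 32 rewards
        'next_states': {'state': ...},  # only included when next_states=True
    }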

set_memory(states, internals, actions, terminals, rewards)

Deletes the current memory content and replaces it with the provided observations.

Parameters:
  • states
  • internals
  • actions
  • terminals
  • rewards

update_batch(loss_per_instance)

Updates the per-instance loss values used by loss-based sampling strategies.

Parameters: loss_per_instance

tensorforce.core.memories.naive_prioritized_replay module

class tensorforce.core.memories.naive_prioritized_replay.NaivePrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

tensorforce.core.memories.prioritized_replay module

class tensorforce.core.memories.prioritized_replay.PrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0, prioritization_constant=0.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)
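
Together with update_batch (documented below), this supports the usual prioritized-replay loop. A minimal sketch, assuming dict-shaped states/actions keyed by name and a placeholder loss computation (the spec layout is an assumption, not taken from this page):

    import numpy as np
    from tensorforce.core.memories import PrioritizedReplay

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = PrioritizedReplay(states_spec, actions_spec, capacity=10000,
                               prioritization_weight=1.0, prioritization_constant=0.0)

    # Fill the memory with (placeholder) experiences.
    for step in range(100):
        memory.add_observation(
            states={'state': np.random.rand(4)},
            internals=[],
            actions={'action': np.random.randint(2)},
            terminal=(step % 50 == 49),
            reward=0.0
        )

    batch = memory.get_batch(batch_size=32, next_states=True)
    # In practice, loss_per_instance comes from the model's loss for each sampled
    # experience; here a random placeholder is used to re-prioritize the batch.
    memory.update_batch(loss_per_instance=np.random.rand(32))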

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

class tensorforce.core.memories.prioritized_replay.SumTree(capacity)

Bases: object

Sum tree data structure where data is stored in leaves and each node on the tree contains a sum of the children.

Items and priorities are stored in leaf nodes, while internal nodes store the sum of priorities from all its descendants. Internally a single list stores the internal nodes followed by leaf nodes.

Usage:
    tree = SumTree(100)
    tree.put('item1', priority=0.5)
    tree.put('item2', priority=0.6)
    item, priority = tree[0]
    batch = tree.sample_minibatch(2)
move(external_index, new_priority)

Changes the priority of a leaf node.

put(item, priority=None)

Stores a transition in replay memory.

If the memory is full, the oldest entry is replaced.

sample_minibatch(batch_size)

Samples a minibatch of size batch_size.
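
The layout described above (internal nodes first, then leaves, in one flat list) can be illustrated with a minimal standalone sketch. This is an illustration of the idea, not the library's implementation:

    import random

    class MiniSumTree(object):
        # One flat list: capacity - 1 internal nodes followed by capacity leaves.
        def __init__(self, capacity):
            self.capacity = capacity
            self.nodes = [0.0] * (2 * capacity - 1)
            self.items = [None] * capacity

        def put(self, index, item, priority):
            # Store the item and propagate the priority change up to the root.
            self.items[index] = item
            tree_index = index + self.capacity - 1
            delta = priority - self.nodes[tree_index]
            while tree_index >= 0:
                self.nodes[tree_index] += delta
                tree_index = (tree_index - 1) // 2 if tree_index > 0 else -1

        def sample(self):
            # Sample a leaf with probability proportional to its priority;
            # nodes[0] holds the total priority of all leaves.
            target = random.uniform(0.0, self.nodes[0])
            i = 0
            while i < self.capacity - 1:
                left = 2 * i + 1
                if target <= self.nodes[left]:
                    i = left
                else:
                    target -= self.nodes[left]
                    i = left + 1
            return self.items[i - (self.capacity - 1)]

    tree = MiniSumTree(4)
    tree.put(0, 'a', 1.0)
    tree.put(1, 'b', 3.0)
    print(tree.sample())  # 'b' is sampled roughly three times as often as 'a'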

tensorforce.core.memories.replay module

class tensorforce.core.memories.replay.Replay(states_spec, actions_spec, capacity, random_sampling=True)

Bases: tensorforce.core.memories.memory.Memory

Replay memory that stores observations and from which mini-batches are sampled for training.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False, keep_terminal_states=True)

Samples a batch of the specified size: depending on the field ‘random_sampling’, either experiences at random indices are returned, or a random start/end point is selected and the contained sequence is returned.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included
  • keep_terminal_states – A boolean flag indicating whether to keep terminal states when next_states are requested. For a terminal state, the next state does not belong to the same episode and should probably not be used to learn a model of the environment. However, if the environment produces sparse rewards (i.e. only one reward at the end of the episode), terminal states cannot be excluded, as otherwise there would never be a reward to learn from.

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)
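
A minimal usage sketch of Replay (the spec layout and the dict-shaped observations are assumptions, not taken from this page):

    import numpy as np
    from tensorforce.core.memories import Replay

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = Replay(states_spec, actions_spec, capacity=1000, random_sampling=True)

    for step in range(200):
        memory.add_observation(
            states={'state': np.random.rand(4)},
            internals=[],
            actions={'action': np.random.randint(2)},
            terminal=(step % 50 == 49),   # hypothetical episode boundary every 50 steps
            reward=1.0
        )

    batch = memory.get_batch(batch_size=64, next_states=True, keep_terminal_states=True)
    print(sorted(batch.keys()))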

set_memory(states, internals, actions, terminal, reward)

Convenience function to set whole batches as memory content, bypassing the need to call the insert function for every single experience.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

update_batch(loss_per_instance)

Module contents

class tensorforce.core.memories.Memory(states_spec, actions_spec)

Bases: object

Abstract memory class.

add_observation(states, internals, actions, terminal, reward)

Inserts a single experience into the memory.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

static from_spec(spec, kwargs=None)

Creates a memory from a specification dict.

get_batch(batch_size, next_states=False)

Samples a batch from the memory.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, internal states, actions, terminals, rewards (and next states)

set_memory(states, internals, actions, terminals, rewards)

Deletes the current memory content and replaces it with the provided observations.

Parameters:
  • states
  • internals
  • actions
  • terminals
  • rewards

update_batch(loss_per_instance)

Updates the per-instance loss values used by loss-based sampling strategies.

Parameters: loss_per_instance

class tensorforce.core.memories.Replay(states_spec, actions_spec, capacity, random_sampling=True)

Bases: tensorforce.core.memories.memory.Memory

Replay memory that stores observations and from which mini-batches are sampled for training.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False, keep_terminal_states=True)

Samples a batch of the specified size: depending on the field ‘random_sampling’, either experiences at random indices are returned, or a random start/end point is selected and the contained sequence is returned.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included
  • keep_terminal_states – A boolean flag indicating whether to keep terminal states when next_states are requested. For a terminal state, the next state does not belong to the same episode and should probably not be used to learn a model of the environment. However, if the environment produces sparse rewards (i.e. only one reward at the end of the episode), terminal states cannot be excluded, as otherwise there would never be a reward to learn from.

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

set_memory(states, internals, actions, terminal, reward)

Convenience function to set whole batches as memory content, bypassing the need to call the insert function for every single experience.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

update_batch(loss_per_instance)

class tensorforce.core.memories.PrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0, prioritization_constant=0.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

class tensorforce.core.memories.NaivePrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance