tensorforce.core.memories package

Submodules

tensorforce.core.memories.memory module

class tensorforce.core.memories.memory.Memory(states_spec, actions_spec)

Bases: object

Abstract memory class.

add_observation(states, internals, actions, terminal, reward)

Inserts a single experience into the memory.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

static from_spec(spec, kwargs=None)

Creates a memory from a specification dict.
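
For example, a concrete memory can be instantiated from a specification dict. A minimal sketch, assuming the ‘replay’ type name and the spec/kwargs layout shown here (the states_spec/actions_spec formats are placeholders, not taken from this page):

    from tensorforce.core.memories import Memory

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = Memory.from_spec(
        spec=dict(type='replay', capacity=1000, random_sampling=True),
        kwargs=dict(states_spec=states_spec, actions_spec=actions_spec)
    )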

get_batch(batch_size, next_states=False)

Samples a batch from the memory.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, internal states, actions, terminals, rewards (and next states)
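
As a rough illustration of the returned structure (the exact key names are an assumption based on the description above; array shapes depend on states_spec/actions_spec):

    # Hypothetical shape of the result of get_batch(batch_size=32, next_states=True)
    batch = {
        'states': {'state': ...},       # one entry per named state, each holding 32 sampled values
        'internals': [...],             # internal agent states (e.g. RNN states), if any
        'actions': {'action': ...},     # one entry per named action
        'terminals': ...,               # 32 boolean terminal flags
        'rewards': ...,                 # 32 rewards
        'next_states': {'state': ...},  # only included when next_states=True
    }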

set_memory(states, internals, actions, terminals, rewards)

Deletes the current memory content and replaces it with the provided observations.

Parameters:
  • states
  • internals
  • actions
  • terminals
  • rewards

update_batch(loss_per_instance)

Updates the per-instance loss values used by loss-based sampling strategies.

Parameters: loss_per_instance

tensorforce.core.memories.naive_prioritized_replay module

class tensorforce.core.memories.naive_prioritized_replay.NaivePrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

tensorforce.core.memories.prioritized_replay module

class tensorforce.core.memories.prioritized_replay.PrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0, prioritization_constant=0.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)
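
Together with update_batch (documented below), this supports the usual prioritized-replay loop. A minimal sketch, assuming dict-shaped states/actions keyed by name and a placeholder loss computation (the spec layout is an assumption, not taken from this page):

    import numpy as np
    from tensorforce.core.memories import PrioritizedReplay

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = PrioritizedReplay(states_spec, actions_spec, capacity=10000,
                               prioritization_weight=1.0, prioritization_constant=0.0)

    # Fill the memory with (placeholder) experiences.
    for step in range(100):
        memory.add_observation(
            states={'state': np.random.rand(4)},
            internals=[],
            actions={'action': np.random.randint(2)},
            terminal=(step % 50 == 49),
            reward=0.0
        )

    batch = memory.get_batch(batch_size=32, next_states=True)
    # In practice, loss_per_instance comes from the model's loss for each sampled
    # experience; here a random placeholder is used to re-prioritize the batch.
    memory.update_batch(loss_per_instance=np.random.rand(32))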

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

class tensorforce.core.memories.prioritized_replay.SumTree(capacity)

Bases: object

Sum tree data structure where data is stored in leaves and each node on the tree contains a sum of the children.

Items and priorities are stored in leaf nodes, while internal nodes store the sum of priorities from all its descendants. Internally a single list stores the internal nodes followed by leaf nodes.

Usage:
    tree = SumTree(100)
    tree.put('item1', priority=0.5)
    tree.put('item2', priority=0.6)
    item, priority = tree[0]
    batch = tree.sample_minibatch(2)
move(external_index, new_priority)

Changes the priority of a leaf node.

put(item, priority=None)

Stores a transition in replay memory.

If the memory is full, the oldest entry is replaced.

sample_minibatch(batch_size)

Samples a minibatch of size batch_size.
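
The layout described above (internal nodes first, then leaves, in one flat list) can be illustrated with a minimal standalone sketch. This is an illustration of the idea, not the library's implementation:

    import random

    class MiniSumTree(object):
        # One flat list: capacity - 1 internal nodes followed by capacity leaves.
        def __init__(self, capacity):
            self.capacity = capacity
            self.nodes = [0.0] * (2 * capacity - 1)
            self.items = [None] * capacity

        def put(self, index, item, priority):
            # Store the item and propagate the priority change up to the root.
            self.items[index] = item
            tree_index = index + self.capacity - 1
            delta = priority - self.nodes[tree_index]
            while tree_index >= 0:
                self.nodes[tree_index] += delta
                tree_index = (tree_index - 1) // 2 if tree_index > 0 else -1

        def sample(self):
            # Sample a leaf with probability proportional to its priority;
            # nodes[0] holds the total priority of all leaves.
            target = random.uniform(0.0, self.nodes[0])
            i = 0
            while i < self.capacity - 1:
                left = 2 * i + 1
                if target <= self.nodes[left]:
                    i = left
                else:
                    target -= self.nodes[left]
                    i = left + 1
            return self.items[i - (self.capacity - 1)]

    tree = MiniSumTree(4)
    tree.put(0, 'a', 1.0)
    tree.put(1, 'b', 3.0)
    print(tree.sample())  # 'b' is sampled roughly three times as often as 'a'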

tensorforce.core.memories.replay module

class tensorforce.core.memories.replay.Replay(states_spec, actions_spec, capacity, random_sampling=True)

Bases: tensorforce.core.memories.memory.Memory

Replay memory that stores observations and from which mini-batches are sampled for training.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False, keep_terminal_states=True)

Samples a batch of the specified size: depending on the field ‘random_sampling’, either experiences at random indices are returned, or a random start/end point is selected and the contained sequence is returned.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included
  • keep_terminal_states – A boolean flag indicating whether to keep terminal states when next_states are requested. For a terminal state, the next state does not belong to the same episode and should probably not be used to learn a model of the environment. However, if the environment produces sparse rewards (i.e. only one reward at the end of the episode), terminal states cannot be excluded, as otherwise there would never be a reward to learn from.

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)
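
A minimal usage sketch of Replay (the spec layout and the dict-shaped observations are assumptions, not taken from this page):

    import numpy as np
    from tensorforce.core.memories import Replay

    # Hypothetical state/action specs; shapes and types are placeholders.
    states_spec = {'state': {'type': 'float', 'shape': (4,)}}
    actions_spec = {'action': {'type': 'int', 'num_actions': 2}}

    memory = Replay(states_spec, actions_spec, capacity=1000, random_sampling=True)

    for step in range(200):
        memory.add_observation(
            states={'state': np.random.rand(4)},
            internals=[],
            actions={'action': np.random.randint(2)},
            terminal=(step % 50 == 49),   # hypothetical episode boundary every 50 steps
            reward=1.0
        )

    batch = memory.get_batch(batch_size=64, next_states=True, keep_terminal_states=True)
    print(sorted(batch.keys()))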

set_memory(states, internals, actions, terminal, reward)

Convenience function to set whole batches as memory content, bypassing the need to call the insert function for every single experience.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

update_batch(loss_per_instance)

Module contents

class tensorforce.core.memories.Memory(states_spec, actions_spec)

Bases: object

Abstract memory class.

add_observation(states, internals, actions, terminal, reward)

Inserts a single experience into the memory.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

static from_spec(spec, kwargs=None)

Creates a memory from a specification dict.

get_batch(batch_size, next_states=False)

Samples a batch from the memory.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, internal states, actions, terminals, rewards (and next states)

set_memory(states, internals, actions, terminals, rewards)

Deletes the current memory content and replaces it with the provided observations.

Parameters:
  • states
  • internals
  • actions
  • terminals
  • rewards

update_batch(loss_per_instance)

Updates the per-instance loss values used by loss-based sampling strategies.

Parameters: loss_per_instance

class tensorforce.core.memories.Replay(states_spec, actions_spec, capacity, random_sampling=True)

Bases: tensorforce.core.memories.memory.Memory

Replay memory that stores observations and from which mini-batches are sampled for training.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False, keep_terminal_states=True)

Samples a batch of the specified size: depending on the field ‘random_sampling’, either experiences at random indices are returned, or a random start/end point is selected and the contained sequence is returned.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included
  • keep_terminal_states – A boolean flag indicating whether to keep terminal states when next_states are requested. For a terminal state, the next state does not belong to the same episode and should probably not be used to learn a model of the environment. However, if the environment produces sparse rewards (i.e. only one reward at the end of the episode), terminal states cannot be excluded, as otherwise there would never be a reward to learn from.

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

set_memory(states, internals, actions, terminal, reward)

Convenience function to set whole batches as memory content, bypassing the need to call the insert function for every single experience.

Parameters:
  • states
  • internals
  • actions
  • terminal
  • reward

Returns:

update_batch(loss_per_instance)

class tensorforce.core.memories.PrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0, prioritization_constant=0.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance

class tensorforce.core.memories.NaivePrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0)

Bases: tensorforce.core.memories.memory.Memory

Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)
get_batch(batch_size, next_states=False)

Samples a batch of the specified size according to priority.

Parameters:
  • batch_size – The batch size
  • next_states – A boolean flag indicating whether ‘next_states’ values should be included

Returns: A dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)

Computes priorities according to loss.

Parameters: loss_per_instance