tensorforce.core.memories package¶
Submodules¶
tensorforce.core.memories.memory module¶
class tensorforce.core.memories.memory.Memory(states_spec, actions_spec)¶
Bases: object
Abstract memory class.

add_observation(states, internals, actions, terminal, reward)¶
Inserts a single experience into the memory.
Parameters:
- states – dict of state values
- internals – list of internal agent state values
- actions – dict of action values
- terminal – boolean terminal flag
- reward – scalar reward value

static from_spec(spec, kwargs=None)¶
Creates a memory from a specification dict.

get_batch(batch_size, next_states=False)¶
Samples a batch from the memory.
Parameters:
- batch_size – the batch size
- next_states – boolean flag indicating whether ‘next_states’ values should be included
Returns: a dict containing states, internal states, actions, terminals, rewards (and next states)

set_memory(states, internals, actions, terminals, rewards)¶
Deletes the memory content and replaces it with the provided observations.
Parameters:
- states – dict of state value batches
- internals – list of internal agent state value batches
- actions – dict of action value batches
- terminals – list of terminal flags
- rewards – list of reward values

update_batch(loss_per_instance)¶
Updates loss values for sampling strategies based on loss functions.
Parameters:
- loss_per_instance – loss value per instance of the last sampled batch
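For illustration, the interface contract above can be sketched as a minimal stand-alone implementation with uniform sampling. This is a hypothetical sketch, independent of tensorforce's actual code; the class name and internal field layout are inventions of this example:

```python
import random

class MinimalMemory:
    """Toy illustration of the Memory interface: a bounded list of experiences."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.experiences = []

    def add_observation(self, states, internals, actions, terminal, reward):
        # Insert a single experience; evict the oldest entry when full.
        if len(self.experiences) >= self.capacity:
            self.experiences.pop(0)
        self.experiences.append(dict(states=states, internals=internals,
                                     actions=actions, terminal=terminal,
                                     reward=reward))

    def get_batch(self, batch_size, next_states=False):
        # Sample uniformly with replacement and regroup field-wise,
        # mirroring the dict-of-batches return format described above.
        sampled = [random.choice(self.experiences) for _ in range(batch_size)]
        return dict(
            states=[e['states'] for e in sampled],
            internals=[e['internals'] for e in sampled],
            actions=[e['actions'] for e in sampled],
            terminal=[e['terminal'] for e in sampled],
            reward=[e['reward'] for e in sampled],
        )
```

Concrete subclasses differ mainly in how get_batch selects indices (uniformly, sequentially, or by priority).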
tensorforce.core.memories.naive_prioritized_replay module¶
class tensorforce.core.memories.naive_prioritized_replay.NaivePrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0)¶
Bases: tensorforce.core.memories.memory.Memory
Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)¶

get_batch(batch_size, next_states=False)¶
Samples a batch of the specified size according to priority.
Parameters:
- batch_size – the batch size
- next_states – boolean flag indicating whether ‘next_states’ values should be included
Returns: a dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)¶
Computes priorities according to loss.
Parameters:
- loss_per_instance – loss value per instance of the last sampled batch
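The naive scheme (sample proportionally to priority via a linear scan, then refresh priorities from the observed losses) can be sketched as follows. This is a simplified stand-in, not the library's implementation; in particular, using prioritization_weight as an exponent on the loss is an assumption of this sketch:

```python
import random

class NaivePrioritizedBuffer:
    """Toy loss-proportional sampler: each experience carries a priority,
    and its sampling probability is priority / total priority."""

    def __init__(self, capacity, prioritization_weight=1.0):
        self.capacity = capacity
        self.prioritization_weight = prioritization_weight
        self.items = []        # experiences
        self.priorities = []   # one priority per experience
        self.last_indices = []

    def add_observation(self, experience):
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(experience)
        # New experiences get the current maximum priority so they are
        # sampled at least once before their loss is known.
        self.priorities.append(max(self.priorities, default=1.0))

    def get_batch(self, batch_size):
        # Linear scan over cumulative priorities (hence "naive": O(n) per draw).
        total = sum(self.priorities)
        batch, self.last_indices = [], []
        for _ in range(batch_size):
            r = random.random() * total
            cumulative = 0.0
            for index, priority in enumerate(self.priorities):
                cumulative += priority
                if r <= cumulative:
                    batch.append(self.items[index])
                    self.last_indices.append(index)
                    break
        return batch

    def update_batch(self, loss_per_instance):
        # Priorities of the last sampled batch follow their losses.
        for index, loss in zip(self.last_indices, loss_per_instance):
            self.priorities[index] = loss ** self.prioritization_weight
```

High-loss experiences are thus revisited more often, at the cost of a scan that is linear in the memory size for every sampled instance.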
tensorforce.core.memories.prioritized_replay module¶
class tensorforce.core.memories.prioritized_replay.PrioritizedReplay(states_spec, actions_spec, capacity, prioritization_weight=1.0, prioritization_constant=0.0)¶
Bases: tensorforce.core.memories.memory.Memory
Prioritized replay sampling based on loss per experience.

add_observation(states, internals, actions, terminal, reward)¶

get_batch(batch_size, next_states=False)¶
Samples a batch of the specified size according to priority.
Parameters:
- batch_size – the batch size
- next_states – boolean flag indicating whether ‘next_states’ values should be included
Returns: a dict containing states, actions, rewards, terminals, internal states (and next states)

update_batch(loss_per_instance)¶
Computes priorities according to loss.
Parameters:
- loss_per_instance – loss value per instance of the last sampled batch
class tensorforce.core.memories.prioritized_replay.SumTree(capacity)¶
Bases: object
Sum tree data structure: items and priorities are stored in leaf nodes, while each internal node stores the sum of the priorities of all its descendants. Internally, a single list stores the internal nodes followed by the leaf nodes.
Usage:
    tree = SumTree(100)
    tree.put('item1', priority=0.5)
    tree.put('item2', priority=0.6)
    item, priority = tree[0]
    batch = tree.sample_minibatch(2)

move(external_index, new_priority)¶
Changes the priority of a leaf node.

put(item, priority=None)¶
Stores a transition in replay memory. If the memory is full, the oldest entry is replaced.

sample_minibatch(batch_size)¶
Samples a minibatch of size batch_size.
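The layout described above (one flat list holding the internal nodes followed by the leaves) can be sketched as a minimal sum tree. This is an illustrative reimplementation, not the library's class; only put and a single-value sample are shown:

```python
class MiniSumTree:
    """Simplified sum tree: a flat list of capacity - 1 internal nodes
    followed by capacity leaves; each internal node holds the sum of
    its two children, so the root holds the total priority."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = [0.0] * (2 * capacity - 1)  # internal nodes, then leaves
        self.items = [None] * capacity
        self.cursor = 0  # next leaf to write; wraps so the oldest entry is replaced

    def put(self, item, priority):
        leaf = self.capacity - 1 + self.cursor
        self.items[self.cursor] = item
        self._update(leaf, priority)
        self.cursor = (self.cursor + 1) % self.capacity

    def _update(self, node, priority):
        # Set the leaf's priority and propagate the change up to the root.
        delta = priority - self.nodes[node]
        while True:
            self.nodes[node] += delta
            if node == 0:
                break
            node = (node - 1) // 2  # parent in heap indexing

    def sample(self, value):
        """Find the leaf for a value in [0, total): descend from the root,
        going left if the value fits in the left subtree's sum, otherwise
        subtracting that sum and going right."""
        node = 0
        while node < self.capacity - 1:  # stop once we reach a leaf
            left = 2 * node + 1
            if value <= self.nodes[left]:
                node = left
            else:
                value -= self.nodes[left]
                node = left + 1
        index = node - (self.capacity - 1)
        return self.items[index], self.nodes[node]
```

Both updating a priority and sampling one item take O(log capacity) time, which is what makes prioritized sampling practical for large memories, in contrast to the linear scan of the naive variant.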
tensorforce.core.memories.replay module¶
class tensorforce.core.memories.replay.Replay(states_spec, actions_spec, capacity, random_sampling=True)¶
Bases: tensorforce.core.memories.memory.Memory
Replay memory that stores observations and samples mini-batches for training.

add_observation(states, internals, actions, terminal, reward)¶

get_batch(batch_size, next_states=False, keep_terminal_states=True)¶
Samples a batch of the specified size: either random indices, or the contiguous sequence between a randomly selected start and end point, depending on the field ‘random_sampling’.
Parameters:
- batch_size – the batch size
- next_states – boolean flag indicating whether ‘next_states’ values should be included
- keep_terminal_states – boolean flag indicating whether to keep terminal states when ‘next_states’ are requested. In this case, the next state is not from the same episode and should probably not be used to learn a model of the environment. However, if the environment produces sparse rewards (i.e. only one reward at the end of the episode), we cannot exclude terminal states, as otherwise there would never be a reward to learn from.
Returns: a dict containing states, actions, rewards, terminals, internal states (and next states)

set_memory(states, internals, actions, terminal, reward)¶
Convenience function that sets whole batches as memory content, bypassing the need to call the insert function for every single experience.
Parameters:
- states – dict of state value batches
- internals – list of internal agent state value batches
- actions – dict of action value batches
- terminal – list of terminal flags
- reward – list of reward values

update_batch(loss_per_instance)¶
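The two index-selection modes of get_batch can be sketched with a hypothetical helper (the function name is an invention of this example, not a library API):

```python
import random

def sample_indices(size, batch_size, random_sampling=True):
    """Pick batch indices from a buffer of `size` experiences: either
    uniformly at random with replacement, or as one contiguous slice
    starting at a random offset (consecutive experiences)."""
    if random_sampling:
        return [random.randrange(size) for _ in range(batch_size)]
    start = random.randrange(size - batch_size + 1)
    return list(range(start, start + batch_size))
```

Random indices decorrelate the batch, while a contiguous slice preserves the temporal ordering of experiences, which matters for algorithms that consume trajectories rather than independent transitions.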
Module contents¶
The package namespace re-exports Memory, Replay, PrioritizedReplay, and NaivePrioritizedReplay; their documentation is identical to the submodule entries above.