tensorforce.core.baselines package

Submodules

tensorforce.core.baselines.aggregated_baseline module
class tensorforce.core.baselines.aggregated_baseline.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())
Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())
Aggregated baseline.
Parameters: baselines – Dict of per-state baseline specification dicts
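For illustration, a per-state dict of the kind described above might look as follows. The state names (`image`, `position`) and the exact spec keys are assumptions for the sketch, following the `type`-keyed convention consumed by `from_spec`:

```python
# Hypothetical per-state baseline specification for an AggregatedBaseline.
# Each key names a state; each value is a baseline spec dict (assumed
# 'type'-keyed, as consumed by Baseline.from_spec).
aggregated_baselines = {
    "image": {"type": "cnn", "conv_sizes": [32, 32], "dense_sizes": [32]},
    "position": {"type": "mlp", "sizes": [32, 32]},
}
```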
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor
tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
tensorforce.core.baselines.baseline module

class tensorforce.core.baselines.baseline.Baseline(scope='baseline', summary_labels=None)
Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)
Baseline.
static from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.
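A minimal sketch of the dispatch pattern behind `from_spec`: look up a `type` key in a registry and forward the remaining entries as constructor arguments. The registry and the toy `MLPBaseline` class below are illustrative, not TensorForce's actual implementation:

```python
# Illustrative from_spec-style dispatch (assumed 'type'-keyed spec dicts).
class MLPBaseline:
    def __init__(self, sizes, scope='mlp-baseline'):
        self.sizes = sizes
        self.scope = scope

REGISTRY = {'mlp': MLPBaseline}

def from_spec(spec, kwargs=None):
    spec = dict(spec)                 # copy so the caller's dict is untouched
    cls = REGISTRY[spec.pop('type')]  # 'type' selects the baseline class
    spec.update(kwargs or {})         # extra kwargs override/extend the spec
    return cls(**spec)

baseline = from_spec({'type': 'mlp', 'sizes': [64, 64]})
```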
get_summaries()
Returns the TensorFlow summaries reported by the baseline.
Returns: List of summaries

get_variables(include_nontrainable=False)
Returns the TensorFlow variables used by the baseline.
Returns: List of variables
tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor
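The L2 loss described here reduces, in NumPy terms, to the mean squared difference between predicted state values and rewards. This is a simplified sketch; the actual implementation operates on TensorFlow tensors inside the graph:

```python
import numpy as np

# Simplified NumPy sketch of the L2 baseline loss: mean squared
# difference between predicted state values and observed rewards.
def l2_loss(predicted_values, rewards):
    diff = np.asarray(predicted_values, dtype=float) - np.asarray(rewards, dtype=float)
    return float(np.mean(diff ** 2))
```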
tf_predict(states, internals, update)
Creates the TensorFlow operations for predicting the value function of given states.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: State value tensor
tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
Creates the TensorFlow operations for the baseline regularization loss.
Returns: Regularization loss tensor
tensorforce.core.baselines.cnn_baseline module

class tensorforce.core.baselines.cnn_baseline.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())
Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())
CNN baseline.
Parameters:
- conv_sizes – List of convolutional layer sizes
- dense_sizes – List of dense layer sizes
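As a sketch, arguments for the constructor above could look like the following (the layer sizes are arbitrary example values, not recommendations):

```python
# Example constructor arguments for a CNNBaseline: two convolutional
# layers of 32 filters each, followed by one dense layer of 64 units.
# Would be passed as CNNBaseline(**cnn_baseline_args).
cnn_baseline_args = {
    "conv_sizes": [32, 32],
    "dense_sizes": [64],
}
```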
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
tensorforce.core.baselines.mlp_baseline module

class tensorforce.core.baselines.mlp_baseline.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())
Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())
Multi-layer perceptron baseline.
Parameters: sizes – List of dense layer sizes
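In spec-dict form (assuming the `type`-keyed convention accepted by `from_spec`), an MLP baseline with two hidden layers could be written as:

```python
# Spec dict for an MLP baseline with two dense layers of 64 units each
# (assumed 'type'-keyed form consumed by Baseline.from_spec; sizes are
# arbitrary example values).
mlp_spec = {"type": "mlp", "sizes": [64, 64]}
```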
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
tensorforce.core.baselines.network_baseline module

class tensorforce.core.baselines.network_baseline.NetworkBaseline(network, scope='network-baseline', summary_labels=())
Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())
Network baseline.
Parameters: network – Network specification dict
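A network specification in TensorForce 0.x is typically a list of layer dicts; a sketch of what might be passed as the `network` argument (layer types and sizes here are example values):

```python
# Example network specification (list of layer dicts) for a
# NetworkBaseline; layer types and sizes are illustrative.
network_spec = [
    {"type": "dense", "size": 64},
    {"type": "dense", "size": 64},
]
```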
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
Module contents

class tensorforce.core.baselines.Baseline(scope='baseline', summary_labels=None)
Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)
Baseline.
static from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.
get_summaries()
Returns the TensorFlow summaries reported by the baseline.
Returns: List of summaries

get_variables(include_nontrainable=False)
Returns the TensorFlow variables used by the baseline.
Returns: List of variables
tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor
tf_predict(states, internals, update)
Creates the TensorFlow operations for predicting the value function of given states.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: State value tensor
tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
Creates the TensorFlow operations for the baseline regularization loss.
Returns: Regularization loss tensor
class tensorforce.core.baselines.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())
Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())
Aggregated baseline.
Parameters: baselines – Dict of per-state baseline specification dicts
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.NetworkBaseline(network, scope='network-baseline', summary_labels=())
Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())
Network baseline.
Parameters: network – Network specification dict
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())
Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())
Multi-layer perceptron baseline.
Parameters: sizes – List of dense layer sizes
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())
Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())
CNN baseline.
Parameters:
- conv_sizes – List of convolutional layer sizes
- dense_sizes – List of dense layer sizes
from_spec(spec, kwargs=None)
Creates a baseline from a specification dict.

get_summaries()

get_variables(include_nontrainable=False)

tf_loss(states, internals, reward, update, reference=None)
Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
- reference – Optional reference tensor(s), in case of a comparative loss.
Returns: Loss tensor

tf_predict(states, internals, update)

tf_reference(states, internals, reward, update)
Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.
Parameters:
- states – Dict of state tensors.
- internals – List of prior internal state tensors.
- reward – Reward tensor.
- update – Boolean tensor indicating whether this call happens during an update.
Returns: Reference tensor(s).

tf_regularization_loss()