tensorforce.core.baselines package

Submodules

tensorforce.core.baselines.aggregated_baseline module

class tensorforce.core.baselines.aggregated_baseline.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())

Aggregated baseline.

Parameters:baselines – Dict of per-state baseline specification dicts
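As an illustration of the `baselines` argument, the dict below maps each named state to its own per-state baseline specification. The state names and layer sizes are hypothetical, and the `type` keys follow the usual TensorForce spec convention; this is a sketch, not a recommended configuration:

```python
# Hypothetical aggregated-baseline specification: one baseline spec per
# named state. State names ('image', 'position') and all layer sizes are
# illustrative only.
baselines = dict(
    image=dict(type='cnn', conv_sizes=[32, 32], dense_sizes=[64]),
    position=dict(type='mlp', sizes=[32, 32]),
)

# Assuming tensorforce is installed, construction would then look like:
# baseline = AggregatedBaseline(baselines=baselines)
```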
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.baseline module

class tensorforce.core.baselines.baseline.Baseline(scope='baseline', summary_labels=None)

Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)

Baseline.

static from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.
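A minimal sketch of the specification-dict convention assumed here: the `type` key selects the baseline class and the remaining keys become constructor keyword arguments. The key names below are illustrative:

```python
# Sketch of the spec-dict convention assumed by from_spec: 'type' names
# the baseline class, remaining keys are constructor kwargs.
spec = dict(type='mlp', sizes=[32, 32])

baseline_type = spec['type']
constructor_kwargs = {k: v for k, v in spec.items() if k != 'type'}

# Equivalent (hypothetical) call, assuming tensorforce is installed:
# baseline = Baseline.from_spec(spec)  # roughly MLPBaseline(sizes=[32, 32])
```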

get_summaries()

Returns the TensorFlow summaries reported by the baseline.

Returns:List of summaries
get_variables(include_nontrainable=False)

Returns the TensorFlow variables used by the baseline.

Returns:List of variables
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)

Creates the TensorFlow operations for predicting the value function of given states.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • update – Boolean tensor indicating whether this call happens during an update.

Returns:State value tensor
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Creates the TensorFlow operations for the baseline regularization loss.

Returns:Regularization loss tensor

tensorforce.core.baselines.cnn_baseline module

class tensorforce.core.baselines.cnn_baseline.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

CNN baseline.

Parameters:
  • conv_sizes – List of convolutional layer sizes
  • dense_sizes – List of dense layer sizes
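For illustration, the configuration below builds a CNN baseline with two convolutional layers followed by one dense layer. The sizes are hypothetical, not recommendations:

```python
# Illustrative CNN baseline configuration; sizes are hypothetical.
conv_sizes = [32, 64]    # filters per convolutional layer
dense_sizes = [128]      # units per dense layer

# Assuming tensorforce is installed:
# baseline = CNNBaseline(conv_sizes=conv_sizes, dense_sizes=dense_sizes)
```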
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.mlp_baseline module

class tensorforce.core.baselines.mlp_baseline.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())

Multi-layer perceptron baseline.

Parameters:sizes – List of dense layer sizes
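A minimal illustration of the `sizes` argument, here two dense layers of 64 units each (hypothetical values):

```python
# Illustrative MLP baseline configuration; layer sizes are hypothetical.
sizes = [64, 64]   # two dense layers of 64 units each

# Assuming tensorforce is installed:
# baseline = MLPBaseline(sizes=sizes)
```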
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.network_baseline module

class tensorforce.core.baselines.network_baseline.NetworkBaseline(network, scope='network-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())

Network baseline.

Parameters:network – Network specification dict
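As an illustration, a network specification in the usual TensorForce layer-list style might look as follows; the layer types and sizes are hypothetical:

```python
# Hypothetical network specification as a list of layer dicts; layer
# types and sizes are illustrative only.
network = [
    dict(type='dense', size=64),
    dict(type='dense', size=64),
]

# Assuming tensorforce is installed:
# baseline = NetworkBaseline(network=network)
```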
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Module contents

class tensorforce.core.baselines.Baseline(scope='baseline', summary_labels=None)

Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)

Baseline.

static from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()

Returns the TensorFlow summaries reported by the baseline.

Returns:List of summaries
get_variables(include_nontrainable=False)

Returns the TensorFlow variables used by the baseline.

Returns:List of variables
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)

Creates the TensorFlow operations for predicting the value function of given states.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • update – Boolean tensor indicating whether this call happens during an update.

Returns:State value tensor
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Creates the TensorFlow operations for the baseline regularization loss.

Returns:Regularization loss tensor
class tensorforce.core.baselines.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())

Aggregated baseline.

Parameters:baselines – Dict of per-state baseline specification dicts
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.NetworkBaseline(network, scope='network-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())

Network baseline.

Parameters:network – Network specification dict
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())

Multi-layer perceptron baseline.

Parameters:sizes – List of dense layer sizes
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

CNN baseline.

Parameters:
  • conv_sizes – List of convolutional layer sizes
  • dense_sizes – List of dense layer sizes
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()