tensorforce.core.baselines package

Submodules

tensorforce.core.baselines.aggregated_baseline module

class tensorforce.core.baselines.aggregated_baseline.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())

Aggregated baseline.

Parameters:baselines – Dict of per-state baseline specification dicts
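As an illustration of the `baselines` argument, the dict below maps each named state to its own per-state baseline specification. The state names and layer sizes are hypothetical, and the `type` keys follow the usual TensorForce spec convention; this is a sketch, not a recommended configuration:

```python
# Hypothetical aggregated-baseline specification: one baseline spec per
# named state. State names ('image', 'position') and all layer sizes are
# illustrative only.
baselines = dict(
    image=dict(type='cnn', conv_sizes=[32, 32], dense_sizes=[64]),
    position=dict(type='mlp', sizes=[32, 32]),
)

# Assuming tensorforce is installed, construction would then look like:
# baseline = AggregatedBaseline(baselines=baselines)
```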
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.baseline module

class tensorforce.core.baselines.baseline.Baseline(scope='baseline', summary_labels=None)

Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)

Baseline.

static from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.
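A minimal sketch of the specification-dict convention assumed here: the `type` key selects the baseline class and the remaining keys become constructor keyword arguments. The key names below are illustrative:

```python
# Sketch of the spec-dict convention assumed by from_spec: 'type' names
# the baseline class, remaining keys are constructor kwargs.
spec = dict(type='mlp', sizes=[32, 32])

baseline_type = spec['type']
constructor_kwargs = {k: v for k, v in spec.items() if k != 'type'}

# Equivalent (hypothetical) call, assuming tensorforce is installed:
# baseline = Baseline.from_spec(spec)  # roughly MLPBaseline(sizes=[32, 32])
```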

get_summaries()

Returns the TensorFlow summaries reported by the baseline.

Returns:List of summaries
get_variables(include_nontrainable=False)

Returns the TensorFlow variables used by the baseline.

Returns:List of variables
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)

Creates the TensorFlow operations for predicting the value function of given states.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • update – Boolean tensor indicating whether this call happens during an update.

Returns:State value tensor
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Creates the TensorFlow operations for the baseline regularization loss.

Returns:Regularization loss tensor

tensorforce.core.baselines.cnn_baseline module

class tensorforce.core.baselines.cnn_baseline.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

CNN baseline.

Parameters:
  • conv_sizes – List of convolutional layer sizes
  • dense_sizes – List of dense layer sizes
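For illustration, the configuration below builds a CNN baseline with two convolutional layers followed by one dense layer. The sizes are hypothetical, not recommendations:

```python
# Illustrative CNN baseline configuration; sizes are hypothetical.
conv_sizes = [32, 64]    # filters per convolutional layer
dense_sizes = [128]      # units per dense layer

# Assuming tensorforce is installed:
# baseline = CNNBaseline(conv_sizes=conv_sizes, dense_sizes=dense_sizes)
```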
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.mlp_baseline module

class tensorforce.core.baselines.mlp_baseline.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())

Multi-layer perceptron baseline.

Parameters:sizes – List of dense layer sizes
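A minimal illustration of the `sizes` argument, here two dense layers of 64 units each (hypothetical values):

```python
# Illustrative MLP baseline configuration; layer sizes are hypothetical.
sizes = [64, 64]   # two dense layers of 64 units each

# Assuming tensorforce is installed:
# baseline = MLPBaseline(sizes=sizes)
```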
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

tensorforce.core.baselines.network_baseline module

class tensorforce.core.baselines.network_baseline.NetworkBaseline(network, scope='network-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())

Network baseline.

Parameters:network – Network specification dict
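As an illustration, a network specification in the usual TensorForce layer-list style might look as follows; the layer types and sizes are hypothetical:

```python
# Hypothetical network specification as a list of layer dicts; layer
# types and sizes are illustrative only.
network = [
    dict(type='dense', size=64),
    dict(type='dense', size=64),
]

# Assuming tensorforce is installed:
# baseline = NetworkBaseline(network=network)
```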
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Module contents

class tensorforce.core.baselines.Baseline(scope='baseline', summary_labels=None)

Bases: object

Base class for baseline value functions.

__init__(scope='baseline', summary_labels=None)

Baseline.

static from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()

Returns the TensorFlow summaries reported by the baseline.

Returns:List of summaries
get_variables(include_nontrainable=False)

Returns the TensorFlow variables used by the baseline.

Returns:List of variables
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)

Creates the TensorFlow operations for predicting the value function of given states.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • update – Boolean tensor indicating whether this call happens during an update.

Returns:State value tensor
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()

Creates the TensorFlow operations for the baseline regularization loss.

Returns:Regularization loss tensor
class tensorforce.core.baselines.AggregatedBaseline(baselines, scope='aggregated-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline which aggregates per-state baselines.

__init__(baselines, scope='aggregated-baseline', summary_labels=())

Aggregated baseline.

Parameters:baselines – Dict of per-state baseline specification dicts
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.NetworkBaseline(network, scope='network-baseline', summary_labels=())

Bases: tensorforce.core.baselines.baseline.Baseline

Baseline based on a TensorForce network, used when parameters are shared between the value function and the baseline.

__init__(network, scope='network-baseline', summary_labels=())

Network baseline.

Parameters:network – Network specification dict
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.MLPBaseline(sizes, scope='mlp-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

Multi-layer perceptron baseline (single-state) consisting of dense layers.

__init__(sizes, scope='mlp-baseline', summary_labels=())

Multi-layer perceptron baseline.

Parameters:sizes – List of dense layer sizes
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()
class tensorforce.core.baselines.CNNBaseline(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

Bases: tensorforce.core.baselines.network_baseline.NetworkBaseline

CNN baseline (single-state) consisting of convolutional layers followed by dense layers.

__init__(conv_sizes, dense_sizes, scope='cnn-baseline', summary_labels=())

CNN baseline.

Parameters:
  • conv_sizes – List of convolutional layer sizes
  • dense_sizes – List of dense layer sizes
from_spec(spec, kwargs=None)

Creates a baseline from a specification dict.

get_summaries()
get_variables(include_nontrainable=False)
tf_loss(states, internals, reward, update, reference=None)

Creates the TensorFlow operations for calculating the L2 loss between predicted state values and actual rewards.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
  • reference – Optional reference tensor(s), in case of a comparative loss.
Returns:

Loss tensor

tf_predict(states, internals, update)
tf_reference(states, internals, reward, update)

Creates the TensorFlow operations for obtaining the reference tensor(s), in case of a comparative loss.

Parameters:
  • states – Dict of state tensors.
  • internals – List of prior internal state tensors.
  • reward – Reward tensor.
  • update – Boolean tensor indicating whether this call happens during an update.
Returns:

Reference tensor(s).

tf_regularization_loss()