Policies

Default policy: ParametrizedDistributions

class tensorforce.core.policies.ParametrizedDistributions(name, states_spec, actions_spec, network='auto', distributions=None, temperature=0.0, device=None, summary_labels=None, l2_regularization=None)

Policy that parametrizes an independent distribution per action, conditioned on the output of a central state-processing neural network. Supports both the stochastic and the action-value-based policy interface (specification key: parametrized_distributions).

Parameters:

name (string) – Module name (internal use).
states_spec (specification) – States specification (internal use).
actions_spec (specification) – Actions specification (internal use).
network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
distributions (dict[specification]) – Distributions configuration, see distributions, specified per action type or per action name (default, per action type: Bernoulli distribution for boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
temperature (parameter | dict[parameter], float >= 0.0) – Sampling temperature, global or per action (default: 0.0).
device (string) – Device name (default: inherit value of parent module).
summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
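
The parameters above can be illustrated with a small sketch in plain Python; it does not import Tensorforce. The helper default_distribution is hypothetical and only mirrors the per-action-type defaults described for the distributions argument. The distribution keys ('bernoulli', 'categorical', 'gaussian', 'beta'), the action spec fields (type, num_values, min_value, max_value), and the rule that an action counts as bounded only when both bounds are given are assumptions made for illustration, as are the action names.

```python
# Hypothetical helper: mirrors the documented per-action-type defaults.
# Distribution keys and spec field names are assumptions, not library calls.
def default_distribution(action_spec):
    """Return the assumed default distribution key for one action spec."""
    action_type = action_spec["type"]
    if action_type == "bool":
        return "bernoulli"
    if action_type == "int":
        return "categorical"
    if action_type == "float":
        # Assumption: "bounded" means both min_value and max_value are set.
        bounded = "min_value" in action_spec and "max_value" in action_spec
        return "beta" if bounded else "gaussian"
    raise ValueError(f"unsupported action type: {action_type!r}")


# Illustrative actions specification (action names are made up).
actions = {
    "jump": {"type": "bool", "shape": ()},
    "direction": {"type": "int", "shape": (), "num_values": 4},
    "throttle": {"type": "float", "shape": (), "min_value": 0.0, "max_value": 1.0},
    "steering": {"type": "float", "shape": ()},
}

defaults = {name: default_distribution(spec) for name, spec in actions.items()}

# Policy specification built from the documented key and arguments.
# The per-name entry for "steering" is only an example of an override.
policy_spec = {
    "type": "parametrized_distributions",
    "network": "auto",
    "distributions": {"steering": {"type": "gaussian"}},
    "temperature": 0.0,
}
```

A dictionary like policy_spec would typically be passed wherever the agent accepts a policy specification. The name, states_spec and actions_spec arguments are marked as internal use, so they are left out here.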