Policies
Default policy: ParametrizedDistributions
class tensorforce.core.policies.ParametrizedDistributions(name, states_spec, actions_spec, network='auto', distributions=None, temperature=0.0, device=None, summary_labels=None, l2_regularization=None)

Policy which parametrizes independent distributions per action, conditioned on the output of a central states-processing neural network. Supports both the stochastic and the action-value-based policy interface (specification key: parametrized_distributions).

Parameters:
- name (string) – Module name (internal use).
- states_spec (specification) – States specification (internal use).
- actions_spec (specification) – Actions specification (internal use).
- network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
- distributions (dict[specification]) – Distributions configuration, see distributions, specified per action-type or -name (default: per action-type, Bernoulli distribution for binary boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
- temperature (parameter | dict[parameter], float >= 0.0) – Sampling temperature, global or per action (default: 0.0).
- device (string) – Device name (default: inherit value of parent module).
- summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
- l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
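
The parameters above can be illustrated with a minimal sketch of a policy specification written as a plain Python dict. The specification key parametrized_distributions and the argument names come from the signature above. The distribution key 'gaussian', the action-spec fields (min_value, max_value, num_values), and the helper default_distribution are assumptions made for illustration; the helper is hypothetical and not part of Tensorforce. It only mirrors the documented per-action-type defaults.

```python
# Hypothetical helper (not part of Tensorforce): mirrors the documented
# per-action-type defaults of the `distributions` argument.
def default_distribution(action_spec):
    action_type = action_spec['type']
    if action_type == 'bool':
        return 'bernoulli'
    if action_type == 'int':
        return 'categorical'
    if action_type == 'float':
        # Bounded continuous actions default to Beta, unbounded to Gaussian.
        bounded = 'min_value' in action_spec and 'max_value' in action_spec
        return 'beta' if bounded else 'gaussian'
    raise ValueError('unknown action type: {}'.format(action_type))


# Policy specification as a plain dict; 'type' holds the specification key.
policy_spec = dict(
    type='parametrized_distributions',
    network='auto',  # automatically configured network (documented default)
    # Assumed form: override the distribution for all float-typed actions.
    distributions=dict(float=dict(type='gaussian')),
    temperature=0.0,  # global sampling temperature (documented default)
)

# Example action specifications (names and fields are illustrative).
actions = dict(
    steer=dict(type='float', shape=(), min_value=-1.0, max_value=1.0),
    gear=dict(type='int', shape=(), num_values=5),
    brake=dict(type='bool', shape=()),
)

# Which distribution each action would get if `distributions` were left unset.
defaults = {name: default_distribution(spec) for name, spec in actions.items()}
print(defaults)
```

A dict like policy_spec would be passed wherever the library expects a policy specification. The internal-use arguments (name, states_spec, actions_spec) are left out because the documentation marks them as set by the library.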