Default policy: ParametrizedDistributions

class tensorforce.core.policies.ParametrizedDistributions(name, states_spec, actions_spec, network='auto', distributions=None, device=None, summary_labels=None, l2_regularization=None)

Policy that parametrizes an independent distribution for each action, conditioned on the output of a central state-processing neural network. It supports both the stochastic and the action-value-based policy interfaces (specification key: parametrized_distributions).

  • name (string) – Module name (internal use).
  • states_spec (specification) – States specification (internal use).
  • actions_spec (specification) – Actions specification (internal use).
  • network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
  • distributions (dict[specification]) – Distributions configuration, see distributions, specified per action type or per action name (default, chosen per action type: Bernoulli distribution for boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
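
The default choice of distribution described for the distributions argument can be illustrated with a short sketch. This is not Tensorforce's own code: the helper default_distribution is hypothetical, and the action-spec keys (type, min_value, max_value) and the distribution keys (bernoulli, categorical, gaussian, beta) are assumptions based on the defaults listed above. The policy_spec dictionary at the end is likewise an assumed shape for a policy specification using the parametrized_distributions key, with one per-action-name override.

```python
# Hypothetical helper mirroring the documented defaults; not part of Tensorforce.
def default_distribution(action_spec):
    """Return the default distribution key documented for one action spec."""
    action_type = action_spec['type']
    if action_type == 'bool':
        return 'bernoulli'
    if action_type == 'int':
        return 'categorical'
    if action_type == 'float':
        # Assumption: a continuous action counts as bounded when both limits are given.
        bounded = 'min_value' in action_spec and 'max_value' in action_spec
        return 'beta' if bounded else 'gaussian'
    raise ValueError(f'unsupported action type: {action_type!r}')


# Example action specifications (names and values are made up for illustration).
actions = {
    'jump': dict(type='bool', shape=()),
    'gear': dict(type='int', shape=(), num_values=5),
    'steer': dict(type='float', shape=(), min_value=-1.0, max_value=1.0),
    'throttle': dict(type='float', shape=()),
}

# One independent distribution per action, chosen from the action type.
defaults = {name: default_distribution(spec) for name, spec in actions.items()}

# Assumed shape of a policy specification; the override below restates the
# default for 'throttle' only to show the per-action-name form.
policy_spec = dict(
    type='parametrized_distributions',
    network='auto',
    distributions=dict(throttle=dict(type='gaussian')),
)

for name, distribution in defaults.items():
    print(f'{name}: {distribution}')
```

A dictionary like policy_spec would typically be passed as the policy argument of an agent configuration; the exact keys accepted may differ between Tensorforce versions, so check them against the installed release.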