Policies

Default policy: ParametrizedDistributions

class tensorforce.core.policies.ParametrizedDistributions(name, network='auto', distributions=None, temperature=0.0, use_beta_distribution=True, infer_state_value=False, device=None, summary_labels=None, l2_regularization=None, states_spec=None, actions_spec=None)

Policy which parametrizes an independent distribution per action, conditioned on the output of a central states-processing neural network. Supports both the stochastic and the action-value-based policy interfaces (specification key: parametrized_distributions).

Parameters:
  • name (string) – Module name (internal use).
  • network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
  • distributions (dict[specification]) – Distributions configuration, see distributions, specified per action-type or -name (default: per action-type, Bernoulli distribution for binary boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
  • temperature (parameter | dict[parameter], float >= 0.0) – Sampling temperature, global or per action (default: 0.0).
  • use_beta_distribution (bool) – Whether to use the Beta distribution for bounded continuous actions by default (default: true).
  • infer_state_value (False | "action-values" | "distribution") – Whether to infer the state value from either the action values or (experimental) the distribution parameters (default: false).
  • device (string) – Device name (default: inherit value of parent module).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
  • l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
  • states_spec (specification) – States specification (internal use).
  • actions_spec (specification) – Actions specification (internal use).
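
The per-action-type defaults described for distributions and use_beta_distribution can be sketched in plain Python. This is only an illustration under stated assumptions, not Tensorforce code: the helper default_distribution, the action-type names 'bool', 'int' and 'float', and the returned distribution names are hypothetical labels chosen for the example. The fallback to a Gaussian for bounded continuous actions when use_beta_distribution is false is also an assumption, since the text above does not state it. The policy_spec dictionary assumes that a policy specification is a dictionary whose type entry holds the specification key.

```python
# Hypothetical helper, NOT part of Tensorforce: mirrors the documented
# per-action-type defaults of the `distributions` argument.
def default_distribution(action_type, bounded=False, use_beta_distribution=True):
    """Return an illustrative distribution name for one action."""
    if action_type == 'bool':
        # Binary boolean actions: Bernoulli distribution.
        return 'bernoulli'
    if action_type == 'int':
        # Discrete integer actions: categorical distribution.
        return 'categorical'
    if action_type == 'float':
        # Bounded continuous actions: Beta distribution by default.
        if bounded and use_beta_distribution:
            return 'beta'
        # Unbounded continuous actions: Gaussian distribution.
        # Assumption: bounded actions also fall back to Gaussian
        # when use_beta_distribution is False.
        return 'gaussian'
    raise ValueError('unknown action type: {!r}'.format(action_type))


# Assumed shape of a policy specification: a dict whose `type` entry is
# the specification key, with the remaining entries taken from the
# parameter list above at their documented default values.
policy_spec = dict(
    type='parametrized_distributions',
    network='auto',
    temperature=0.0,
    use_beta_distribution=True,
    infer_state_value=False,
)
```

The internal-use arguments (name, states_spec, actions_spec) are left out of policy_spec on purpose, since the parameter list marks them as set by the library rather than by the user.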