Policies
The default policy depends on the agent configuration, but it always accepts the default argument network (which in turn has the default argument layers), so a list is a short-form specification of a sequential layer-stack network architecture:
Agent.create(
    ...
    policy=[
        dict(type='dense', size=64, activation='tanh'),
        dict(type='dense', size=64, activation='tanh')
    ],
    ...
)
Or simply:
Agent.create(
    ...
    policy=dict(network='auto'),
    ...
)
See the networks documentation for more information about how to specify a network.
Example of a full parametrized-distributions policy specification with customized distribution and decaying temperature:
Agent.create(
    ...
    policy=dict(
        type='parametrized_distributions',
        network=[
            dict(type='dense', size=64, activation='tanh'),
            dict(type='dense', size=64, activation='tanh')
        ],
        distributions=dict(
            float=dict(type='gaussian', global_stddev=True),
            bounded_action=dict(type='beta')
        ),
        temperature=dict(
            type='decaying', decay='exponential', unit='episodes',
            num_steps=100, initial_value=0.01, decay_rate=0.5
        )
    ),
    ...
)
class tensorforce.core.policies.ParametrizedActionValue(network='auto', *, device=None, l2_regularization=None, name=None, states_spec=None, auxiliaries_spec=None, internals_spec=None, actions_spec=None)

Policy which parametrizes an action-value function, conditioned on the output of a neural network processing the input state (specification key: parametrized_action_value).

Parameters:
- network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
- device (string) – Device name (default: inherit value of parent module).
- l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
- name (string) – internal use.
- states_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- actions_spec (specification) – internal use.
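As a concrete sketch, such a policy could be selected via its specification key and passed to Agent.create(policy=...); the layer sizes below are illustrative assumptions, not defaults:

```python
# Hypothetical specification dict for a parametrized_action_value
# policy, as it could be passed to Agent.create(policy=...).
q_policy_spec = dict(
    type='parametrized_action_value',
    network=[
        dict(type='dense', size=32, activation='relu'),  # illustrative layer sizes
        dict(type='dense', size=32, activation='relu')
    ]
)
```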
class tensorforce.core.policies.ParametrizedDistributions(network='auto', *, distributions=None, temperature=1.0, use_beta_distribution=False, device=None, l2_regularization=None, name=None, states_spec=None, auxiliaries_spec=None, internals_spec=None, actions_spec=None)

Policy which parametrizes independent distributions per action, conditioned on the output of a central neural network processing the input state, supporting both a stochastic and a value-based policy interface (specification key: parametrized_distributions).

Parameters:
- network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
- distributions (dict[specification]) – Distributions configuration, see distributions, specified per action-type or -name (default: per action-type, Bernoulli distribution for binary boolean actions, categorical distribution for discrete integer actions, Gaussian distribution for unbounded continuous actions, Beta distribution for bounded continuous actions).
- temperature (parameter | dict[parameter], float >= 0.0) – Sampling temperature, global or per action (default: 1.0).
- use_beta_distribution (bool) – Whether to use the Beta distribution for bounded continuous actions by default (default: false).
- device (string) – Device name (default: inherit value of parent module).
- l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
- name (string) – internal use.
- states_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- actions_spec (specification) – internal use.
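For example, a sketch of a specification overriding two of these defaults; the Beta override and the temperature value are illustrative choices, not defaults:

```python
# Hypothetical parametrized_distributions specification: use a Beta
# distribution for continuous float actions and a reduced constant
# sampling temperature (lower temperature samples closer to the
# distribution mode).
dist_policy_spec = dict(
    type='parametrized_distributions',
    network='auto',
    distributions=dict(float=dict(type='beta')),
    temperature=0.5
)
```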
class tensorforce.core.policies.ParametrizedStateValue(network='auto', *, device=None, l2_regularization=None, name=None, states_spec=None, auxiliaries_spec=None, internals_spec=None, actions_spec=None)

Policy which parametrizes a state-value function, conditioned on the output of a neural network processing the input state (specification key: parametrized_state_value).

Parameters:
- network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
- device (string) – Device name (default: inherit value of parent module).
- l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
- name (string) – internal use.
- states_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- actions_spec (specification) – internal use.
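Since this policy outputs only a state value, it is typically useful as a baseline/critic specification rather than as the main acting policy; a minimal illustrative sketch (the single dense layer is an assumption):

```python
# Hypothetical baseline specification using parametrized_state_value.
baseline_spec = dict(
    type='parametrized_state_value',
    network=[dict(type='dense', size=32, activation='tanh')]
)
```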
class tensorforce.core.policies.ParametrizedValuePolicy(network='auto', *, state_value_mode='separate', device=None, l2_regularization=None, name=None, states_spec=None, auxiliaries_spec=None, internals_spec=None, actions_spec=None)

Policy which parametrizes independent action-/advantage-/state-value functions per action and optionally a state-value function, conditioned on the output of a central neural network processing the input state (specification key: parametrized_value_policy).

Parameters:
- network ('auto' | specification) – Policy network configuration, see networks (default: 'auto', automatically configured network).
- state_value_mode ('implicit' | 'separate' | 'separate-per-action') – Whether to compute the state value implicitly as the maximum action value (as in DQN), as a single separate state-value function, or as a separate function per action (as in Dueling DQN) (default: single separate state-value function).
- device (string) – Device name (default: inherit value of parent module).
- l2_regularization (float >= 0.0) – Scalar controlling L2 regularization (default: inherit value of parent module).
- name (string) – internal use.
- states_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- actions_spec (specification) – internal use.
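The three state_value_mode variants can be sketched as specification dicts; the network choice is an illustrative assumption:

```python
# Hypothetical parametrized_value_policy specifications showing the
# three state-value modes described above.
dqn_style = dict(                 # state value = max action value (DQN)
    type='parametrized_value_policy',
    network='auto', state_value_mode='implicit')
separate = dict(                  # one single separate state-value function
    type='parametrized_value_policy',
    network='auto', state_value_mode='separate')
dueling_style = dict(             # one state-value function per action
    type='parametrized_value_policy',
    network='auto', state_value_mode='separate-per-action')
```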