Objectives
class tensorforce.core.objectives.PolicyGradient(*, importance_sampling=False, clipping_value=None, early_reduce=True, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Policy gradient objective, which maximizes the log-likelihood or likelihood-ratio scaled by the target reward value (specification key: policy_gradient).

Parameters:
- importance_sampling (bool) – Whether to use the importance-sampling (likelihood-ratio) version of the policy gradient objective (default: false).
- clipping_value (parameter, float > 0.0) – Clipping threshold for the maximized value (default: no clipping).
- early_reduce (bool) – Whether to compute the objective for the aggregated likelihood instead of the likelihood per action (default: true).
- name (string) – internal use.
- states_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- actions_spec (specification) – internal use.
- reward_spec (specification) – internal use.
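As a sketch of how this objective might be configured — assuming the usual Tensorforce convention of passing a specification dict whose type key is the specification key above and whose remaining keys mirror the constructor parameters — the parameter values here are illustrative, not recommendations:

```python
# Specification dict for the PolicyGradient objective. The keys mirror the
# constructor parameters documented above; internal-use parameters are omitted.
pg_objective = dict(
    type='policy_gradient',    # specification key
    importance_sampling=True,  # use the likelihood-ratio variant
    clipping_value=0.2,        # clip the maximized value (illustrative threshold)
    early_reduce=True,         # aggregate the likelihood before reward scaling
)
```

Such a dict would typically be passed as an agent's objective argument.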
class tensorforce.core.objectives.Value(*, value, huber_loss=None, early_reduce=True, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Value approximation objective, which minimizes the L2-distance between the state-(action-)value estimate and the target reward value (specification keys: value, state_value, action_value).

Parameters:
- value ("state" | "action") – Whether to approximate the state-value or the state-action-value (required).
- huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
- early_reduce (bool) – Whether to compute the objective for the aggregated value instead of the value per action (default: true).
- name (string) – internal use.
- states_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- actions_spec (specification) – internal use.
- reward_spec (specification) – internal use.
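A corresponding sketch for the value objective, again as plain specification dicts; the Huber threshold is an arbitrary example value, and the shorthand-key form assumes the alternative specification keys listed above select the value parameter implicitly:

```python
# Value objective approximating the state-value, with an (illustrative)
# Huber loss threshold in place of the default plain L2 distance.
state_value_objective = dict(
    type='value',
    value='state',   # approximate the state-value rather than the action-value
    huber_loss=1.0,  # Huber loss threshold (example value)
)

# Assumed shorthand: the action_value specification key in place of
# type='value', value='action'.
action_value_objective = dict(type='action_value', early_reduce=False)
```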
class tensorforce.core.objectives.DeterministicPolicyGradient(*, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Deterministic policy gradient objective (specification key: det_policy_gradient).

Parameters:
- name (string) – internal use.
- states_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- actions_spec (specification) – internal use.
- reward_spec (specification) – internal use.
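Since this objective exposes no tunable parameters beyond the internal-use specifications, its specification dict reduces to the key alone:

```python
# Minimal specification for the deterministic policy gradient objective;
# no constructor parameters are user-facing.
dpg_objective = dict(type='det_policy_gradient')
```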
class tensorforce.core.objectives.Plus(*, objective1, objective2, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Additive combination of two objectives (specification key: plus).

Parameters:
- objective1 (specification) – First objective configuration (required).
- objective2 (specification) – Second objective configuration (required).
- name (string) – internal use.
- states_spec (specification) – internal use.
- internals_spec (specification) – internal use.
- auxiliaries_spec (specification) – internal use.
- actions_spec (specification) – internal use.
- reward_spec (specification) – internal use.
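The nested specifications for objective1 and objective2 can themselves be any of the objectives above, so an actor-critic-style combined loss might be sketched as follows (pairing and parameter values are illustrative assumptions, not a recommended configuration):

```python
# Additive combination: a clipped policy-gradient term plus a state-value
# term, each expressed as a nested objective specification dict.
combined_objective = dict(
    type='plus',
    objective1=dict(type='policy_gradient', clipping_value=0.2),
    objective2=dict(type='value', value='state'),
)
```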