Objectives

class tensorforce.core.objectives.PolicyGradient(*, importance_sampling=False, clipping_value=None, early_reduce=True, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Policy gradient objective, which maximizes the log-likelihood or likelihood-ratio scaled by the target reward value (specification key: policy_gradient).

Parameters:
  • importance_sampling (bool) – Whether to use the importance-sampling version of the policy gradient objective, i.e. the likelihood-ratio instead of the log-likelihood (default: false).
  • clipping_value (parameter, float > 0.0) – Clipping threshold for the maximized value (default: no clipping).
  • early_reduce (bool) – Whether to compute the objective for the likelihood aggregated over all actions, instead of per-action likelihoods (default: true).
  • name (string) – internal use.
  • states_spec (specification) – internal use.
  • internals_spec (specification) – internal use.
  • auxiliaries_spec (specification) – internal use.
  • actions_spec (specification) – internal use.
  • reward_spec (specification) – internal use.
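
For example, this objective can be selected via its specification key when creating a TensorforceAgent. The following is a minimal sketch only: the state/action specifications, network, memory, and hyperparameter values are illustrative and not part of this objective's documentation.

    from tensorforce import Agent

    agent = Agent.create(
        agent='tensorforce',
        states=dict(type='float', shape=(4,), min_value=-10.0, max_value=10.0),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        policy=dict(network='auto'),
        memory=10000,  # replay memory capacity (illustrative)
        update=dict(unit='timesteps', batch_size=64),
        optimizer=dict(type='adam', learning_rate=1e-3),
        # likelihood-ratio version with clipping of the maximized value
        objective=dict(
            type='policy_gradient', importance_sampling=True, clipping_value=0.2
        ),
        reward_estimation=dict(horizon=20)
    )
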
class tensorforce.core.objectives.Value(*, value, huber_loss=None, early_reduce=True, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Value approximation objective, which minimizes the L2-distance between the state-(action-)value estimate and the target reward value (specification key: value, state_value, action_value).

Parameters:
  • value ("state" | "action") – Whether to approximate the state- or state-action-value (required).
  • huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
  • early_reduce (bool) – Whether to compute the objective for the value aggregated over all actions, instead of per-action values (default: true).
  • name (string) – internal use.
  • states_spec (specification) – internal use.
  • internals_spec (specification) – internal use.
  • auxiliaries_spec (specification) – internal use.
  • actions_spec (specification) – internal use.
  • reward_spec (specification) – internal use.
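
For example, a state-value objective with a Huber loss threshold is commonly passed as the baseline objective of a TensorforceAgent, alongside a policy-gradient main objective; the specific values below are illustrative only, and the specification dicts are passed to Agent.create(...) in the same way as in the policy_gradient example above.

    # state-value approximation with Huber loss, e.g. as baseline_objective
    baseline_objective = dict(type='value', value='state', huber_loss=1.0)

    # the state_value / action_value keys are shortcuts for the respective variants
    objective = dict(type='action_value')
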
class tensorforce.core.objectives.DeterministicPolicyGradient(*, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Deterministic policy gradient objective (specification key: det_policy_gradient).

Parameters:
  • name (string) – internal use.
  • states_spec (specification) – internal use.
  • internals_spec (specification) – internal use.
  • auxiliaries_spec (specification) – internal use.
  • actions_spec (specification) – internal use.
  • reward_spec (specification) – internal use.
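
Since this objective has no configurable parameters, its specification is simply the key; it is typically combined with a deterministic continuous-action policy and an action-value critic (a DDPG-style setup, not shown here):

    objective = 'det_policy_gradient'
    # equivalent explicit form:
    objective = dict(type='det_policy_gradient')
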
class tensorforce.core.objectives.Plus(*, objective1, objective2, name=None, states_spec=None, internals_spec=None, auxiliaries_spec=None, actions_spec=None, reward_spec=None)

Additive combination of two objectives (specification key: plus).

Parameters:
  • objective1 (specification) – First objective configuration (required).
  • objective2 (specification) – Second objective configuration (required).
  • name (string) – internal use.
  • states_spec (specification) – internal use.
  • internals_spec (specification) – internal use.
  • auxiliaries_spec (specification) – internal use.
  • actions_spec (specification) – internal use.
  • reward_spec (specification) – internal use.
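
For example, a policy-gradient objective and a state-value objective can be combined additively; the particular sub-objectives and their parameters below are illustrative choices, not a recommended configuration.

    objective = dict(
        type='plus',
        objective1=dict(type='policy_gradient', clipping_value=0.2),
        objective2=dict(type='value', value='state')
    )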