Objectives

class tensorforce.core.objectives.DeterministicPolicyGradient(name, summary_labels=None)

Deterministic policy gradient objective (specification key: det_policy_gradient).

Parameters:
  • name (string) – Module name (internal use).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).

class tensorforce.core.objectives.Plus(name, objective1, objective2, summary_labels=None)

Additive combination of two objectives (specification key: plus).

Parameters:
  • name (string) – Module name (internal use).
  • objective1 (specification) – First objective configuration (required).
  • objective2 (specification) – Second objective configuration (required).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
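As an illustration, a plus specification can be pictured as two nested objective specifications. The sketch below uses plain Python dicts. The `type` key used to select each objective by its specification key is an assumption about the specification layout and should be checked against the installed Tensorforce version. The helper `combined_loss` is hypothetical and only shows what "additive combination" means numerically.

```python
# Hypothetical specification: sum of a policy-gradient and a value objective.
# Assumption: the 'type' entry selects the objective by its specification key.
plus_spec = dict(
    type='plus',
    objective1=dict(type='policy_gradient'),
    objective2=dict(type='value', value='state'),
)


def combined_loss(loss1, loss2):
    """Illustrative only: add two per-timestep loss lists elementwise."""
    return [a + b for a, b in zip(loss1, loss2)]


print(combined_loss([0.5, 1.0], [0.25, 0.5]))
```

Both nested entries are required, matching the objective1 and objective2 parameters above.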

class tensorforce.core.objectives.PolicyGradient(name, ratio_based=False, clipping_value=0.0, early_reduce=False, summary_labels=None)

Policy gradient objective, which maximizes the log-likelihood or likelihood-ratio scaled by the target reward value (specification key: policy_gradient).

Parameters:
  • name (string) – Module name (internal use).
  • ratio_based (bool) – Whether to scale the likelihood-ratio instead of the log-likelihood (default: false).
  • clipping_value (parameter, float > 0.0) – Clipping threshold for the maximized value (default: 0.0, meaning no clipping).
  • early_reduce (bool) – Whether to compute the objective on reduced likelihoods instead of on each likelihood separately (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
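The difference between the two modes can be sketched in plain Python. This is a minimal per-sample illustration, not the Tensorforce implementation. The function name `policy_gradient_loss`, the argument `ref_log_prob` (log-likelihood under a reference policy), and the PPO-style clipping of the ratio to [1 - c, 1 + c] are all assumptions made for the example. Clipping is sketched only for the ratio-based case.

```python
import math


def policy_gradient_loss(log_prob, reward, ref_log_prob=None,
                         ratio_based=False, clipping_value=0.0):
    """Per-sample loss, i.e. the negated objective to be maximized.

    ref_log_prob is only needed when ratio_based is True.
    """
    if ratio_based:
        # Likelihood ratio between the current and the reference policy.
        scaling = math.exp(log_prob - ref_log_prob)
        if clipping_value > 0.0:
            # Assumed PPO-style clipping of the ratio to [1 - c, 1 + c].
            clipped = min(max(scaling, 1.0 - clipping_value),
                          1.0 + clipping_value)
            # Pessimistic bound: take the smaller of the two objectives.
            return -min(scaling * reward, clipped * reward)
        return -scaling * reward
    # Default: log-likelihood scaled by the target reward.
    return -log_prob * reward


# Ratio of 1.0 (identical policies) leaves the reward unscaled.
print(policy_gradient_loss(-1.0, 2.0, ref_log_prob=-1.0, ratio_based=True))
```

With a positive reward, the clipped variant stops rewarding the policy once the ratio exceeds 1 + c, which limits how far a single update can move away from the reference policy.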

class tensorforce.core.objectives.Value(name, value='state', huber_loss=0.0, early_reduce=False, summary_labels=None)

Value approximation objective, which minimizes the L2-distance between the state-(action-)value estimate and the target reward value (specification key: value).

Parameters:
  • name (string) – Module name (internal use).
  • value ("state" | "action") – Whether to approximate the state- or state-action-value (default: "state").
  • huber_loss (parameter, float > 0.0) – Huber loss threshold (default: 0.0, meaning no Huber loss).
  • early_reduce (bool) – Whether to compute objective for reduced values instead of value per action (default: false).
  • summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
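The effect of the huber_loss threshold can be sketched in plain Python. This is a minimal per-sample illustration under stated assumptions, not the Tensorforce implementation. The function name `value_loss` is hypothetical, and the factor 0.5 on the squared error follows the standard Huber loss definition rather than anything confirmed for this library.

```python
def value_loss(estimate, target, huber_loss=0.0):
    """Per-sample value loss: half squared error, or Huber if enabled."""
    diff = estimate - target
    if huber_loss > 0.0:
        abs_diff = abs(diff)
        if abs_diff <= huber_loss:
            # Quadratic region close to the target.
            return 0.5 * diff * diff
        # Linear region beyond the threshold, less sensitive to outliers.
        return huber_loss * (abs_diff - 0.5 * huber_loss)
    # No threshold: plain half squared error (L2).
    return 0.5 * diff * diff


print(value_loss(3.0, 1.0))                  # squared-error case
print(value_loss(3.0, 1.0, huber_loss=1.0))  # Huber case, linear region
```

For an error of 2.0 and a threshold of 1.0, the Huber variant grows linearly rather than quadratically, so large target errors contribute smaller gradients.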