Objectives
class tensorforce.core.objectives.ActionValue(name, huber_loss=0.0, mean_over_actions=False, summary_labels=None)

State-action-value / Q-value objective, which minimizes the L2-distance between the state-action-value estimate and the target reward value (specification key: action_value).

Parameters:
- name (string) – Module name (internal use).
- huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
- mean_over_actions (bool) – Whether to compute the objective for the mean of state-action-values instead of per state-action-value (default: false).
- summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
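The loss computed by this objective can be sketched in plain NumPy. This is an illustration of the math described above (squared error, switching to a Huber-smoothed form when `huber_loss > 0.0`), not the library's actual implementation; the function name is hypothetical.

```python
import numpy as np

def action_value_loss(q_estimate, target, huber_loss=0.0):
    """L2 loss between a Q-value estimate and a target reward value.

    When huber_loss > 0.0, the loss is quadratic inside the threshold
    and linear outside it (Huber loss), which limits the influence of
    large errors on the gradient.
    """
    diff = np.asarray(q_estimate, dtype=float) - np.asarray(target, dtype=float)
    if huber_loss > 0.0:
        abs_diff = np.abs(diff)
        quadratic = np.minimum(abs_diff, huber_loss)
        linear = abs_diff - quadratic
        return 0.5 * quadratic ** 2 + huber_loss * linear
    return 0.5 * diff ** 2
```

For example, with an estimate of 4.0, a target of 1.0, and a Huber threshold of 1.0, the error of 3.0 is penalized linearly beyond the threshold instead of quadratically.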
class tensorforce.core.objectives.Plus(name, objective1, objective2, summary_labels=None)

Additive combination of two objectives (specification key: plus).

Parameters:
- name (string) – Module name (internal use).
- objective1 (specification) – First objective configuration (required).
- objective2 (specification) – Second objective configuration (required).
- summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
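The additive combination amounts to evaluating both objectives on the same inputs and summing the results. A minimal sketch with plain callables (the helper name is hypothetical, not part of the library API):

```python
def plus(objective1, objective2):
    """Combine two objective callables by summing their outputs."""
    def combined(*args, **kwargs):
        return objective1(*args, **kwargs) + objective2(*args, **kwargs)
    return combined

# Example: sum a value loss and an entropy-style bonus on the same input
value_loss = lambda x: 0.5 * x ** 2
bonus = lambda x: 0.1 * x
total = plus(value_loss, bonus)
```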
class tensorforce.core.objectives.PolicyGradient(name, ratio_based=False, clipping_value=0.0, mean_over_actions=False, summary_labels=None)

Policy gradient objective, which maximizes the log-likelihood or likelihood-ratio scaled by the target reward value (specification key: policy_gradient).

Parameters:
- name (string) – Module name (internal use).
- ratio_based (bool) – Whether to scale the likelihood-ratio instead of the log-likelihood (default: false).
- clipping_value (parameter, float > 0.0) – Clipping threshold for the maximized value (default: no clipping).
- mean_over_actions (bool) – Whether to compute the objective for the mean of likelihoods instead of per likelihood (default: false).
- summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
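The two modes can be sketched as follows: the default mode scales the log-likelihood by the reward, while the ratio-based mode scales the likelihood-ratio against an old policy. The clipped form shown here follows the standard PPO-style surrogate; whether Tensorforce applies clipping in exactly this way is an assumption, and the function name is hypothetical.

```python
import numpy as np

def policy_gradient_objective(log_prob, old_log_prob, reward,
                              ratio_based=False, clipping_value=0.0):
    """Maximized value: log-likelihood or (clipped) likelihood-ratio
    scaled by the target reward value."""
    if not ratio_based:
        # Plain policy gradient: log-likelihood scaled by the reward
        return log_prob * reward
    # Likelihood-ratio relative to the old policy
    ratio = np.exp(log_prob - old_log_prob)
    scaled = ratio * reward
    if clipping_value > 0.0:
        # PPO-style surrogate: take the more pessimistic of the
        # unclipped and clipped objectives
        clipped = np.clip(ratio, 1.0 - clipping_value,
                          1.0 + clipping_value) * reward
        return np.minimum(scaled, clipped)
    return scaled
```

With `clipping_value=0.2`, a ratio above 1.2 is capped, so the objective no longer rewards moving the policy further in that direction.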
class tensorforce.core.objectives.StateValue(name, huber_loss=0.0, mean_over_actions=False, summary_labels=None)

State-value objective, which minimizes the L2-distance between the state-value estimate and the target reward value (specification key: state_value).

Parameters:
- name (string) – Module name (internal use).
- huber_loss (parameter, float > 0.0) – Huber loss threshold (default: no Huber loss).
- mean_over_actions (bool) – Whether to compute the objective for the mean of state-values instead of per state-value (default: false).
- summary_labels ('all' | iter[string]) – Labels of summaries to record (default: inherit value of parent module).
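The specification keys listed above are how these objectives are typically selected in a configuration dictionary. A sketch of such specifications, using only keys and parameters documented on this page (how an agent consumes the `objective` argument depends on the Tensorforce version, so treat this as illustrative):

```python
# Single objective: state-value loss with a Huber threshold of 1.0
objective = dict(type='state_value', huber_loss=1.0)

# Additive combination via the 'plus' key: a ratio-based policy
# gradient objective plus a state-value objective
combined = dict(
    type='plus',
    objective1=dict(type='policy_gradient', ratio_based=True),
    objective2=dict(type='state_value'),
)
```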