FTRL {opera}    R Documentation

Implementation of FTRL (Follow The Regularized Leader)

Description

FTRL (Shalev-Shwartz and Singer 2007; see also Chap. 5 of Hazan 2019) is the online counterpart of empirical risk minimization. It is a family of aggregation rules (including OGD) that, at each round, uses the minimizer of the empirical risk observed so far plus an additional regularization term. The online optimization can be performed on any bounded convex set that can be expressed with equality or inequality constraints. Note that this method is still under development and should be considered a beta version.
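As an illustration of the principle described above, the following is a minimal sketch of one member of the FTRL family, not the code used by opera. It assumes a linearized square loss, a negative-entropy regularizer, and the probability simplex as the feasible set. In that special case the regularized minimizer has a closed form (weights proportional to exp(-eta * cumulative gradient)), so no numerical optimizer is needed. The helper name ftrl_entropy and the way eta scales the cumulative gradient are assumptions made for this example.

```r
# Minimal FTRL sketch (hypothetical helper, not opera's implementation):
# linearized square loss + negative-entropy regularizer on the simplex.
ftrl_entropy <- function(y, experts, eta) {
  n <- nrow(experts)
  K <- ncol(experts)
  G <- numeric(K)                    # cumulative loss gradients
  weights <- matrix(NA_real_, n, K)  # weights used at each round
  pred <- numeric(n)
  for (t in seq_len(n)) {
    # argmin over the simplex of eta * sum(G * w) + sum(w * log(w))
    # has the closed form w_i proportional to exp(-eta * G_i)
    z <- -eta * G
    w <- exp(z - max(z))             # subtract max for numerical stability
    w <- w / sum(w)
    weights[t, ] <- w
    pred[t] <- sum(w * experts[t, ])
    # gradient of the square loss (pred - y)^2 with respect to w
    G <- G + 2 * (pred[t] - y[t]) * experts[t, ]
  }
  list(weights = weights, prediction = pred)
}

# Toy data: the first expert tracks y closely, the second is biased.
set.seed(42)
n <- 200
experts <- cbind(good = rnorm(n), bad = rnorm(n, mean = 3))
y <- experts[, "good"] + rnorm(n, sd = 0.1)
fit <- ftrl_entropy(y, experts, eta = 0.05)
round(fit$weights[n, ], 3)           # final weights, concentrated on "good"
```

With other regularizers or constraint sets there is generally no closed form, which is why the arguments max_iter and obj_tol control an iterative optimization at each round.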

Usage

FTRL(
  y,
  experts,
  eta = NULL,
  fun_reg = NULL,
  fun_reg_grad = NULL,
  constr_eq = NULL,
  constr_eq_jac = NULL,
  constr_ineq = NULL,
  constr_ineq_jac = NULL,
  loss.type = list(name = "square"),
  loss.gradient = TRUE,
  w0 = NULL,
  max_iter = 50,
  obj_tol = 0.01,
  training = NULL,
  default = FALSE,
  quiet = TRUE
)

Arguments

y

vector. Real observations.

experts

matrix. Matrix of expert predictions.

eta

numeric. Regularization parameter.

fun_reg

function (NULL). Regularization function to be applied during the optimization.

fun_reg_grad

function (NULL). Gradient of the regularization function (to speed up the computations).
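A regularization function and its gradient might be written as in the sketch below. It assumes that both functions take the weight vector as their single argument, which this page does not state explicitly. The reference vector w_ref and the helper num_grad are hypothetical names introduced for the example. The finite-difference check is a cheap way to confirm that a hand-written gradient matches its function before passing both to the optimizer.

```r
# Hypothetical regularizer: half the squared Euclidean distance to w_ref.
w_ref <- rep(1 / 3, 3)
fun_reg <- function(w) 0.5 * sum((w - w_ref)^2)
fun_reg_grad <- function(w) w - w_ref

# Central finite differences, used only to validate fun_reg_grad.
num_grad <- function(f, w, h = 1e-6) {
  vapply(seq_along(w), function(i) {
    e <- replace(numeric(length(w)), i, h)  # step h along coordinate i
    (f(w + e) - f(w - e)) / (2 * h)
  }, numeric(1))
}

w <- c(0.2, 0.5, 0.3)
grad_error <- max(abs(num_grad(fun_reg, w) - fun_reg_grad(w)))
grad_error                                  # should be close to zero
```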

constr_eq

function (NULL). Constraints (equalities) to be applied during the optimization.

constr_eq_jac

function (NULL). Jacobian of the equality constraints (to speed up the computations).

constr_ineq

function (NULL). Constraints (inequalities) to be applied during the optimization (... > 0).

constr_ineq_jac

function (NULL). Jacobian of the inequality constraints (to speed up the computations).
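For instance, the probability simplex can be expressed with one equality constraint and one inequality constraint per weight. The sketch below assumes that each function receives the weight vector as its single argument and that each Jacobian has one row per constraint and one column per weight. Both are assumptions about the expected interface, not statements taken from this page.

```r
# Probability simplex as equality and inequality constraints (sketch).
constr_eq <- function(w) sum(w) - 1                        # must equal 0
constr_eq_jac <- function(w) matrix(1, nrow = 1, ncol = length(w))
constr_ineq <- function(w) w                               # each entry > 0
constr_ineq_jac <- function(w) diag(length(w))

# Check a candidate weight vector against the constraints.
w <- c(0.2, 0.5, 0.3)
is_feasible <- abs(constr_eq(w)) < 1e-12 && all(constr_ineq(w) > 0)
is_feasible
```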

loss.type

character, list or function ("square").

character

Name of the loss to be applied ('square', 'absolute', 'percentage', or 'pinball');

list

List with field name equal to the loss name. If using pinball loss, field tau equal to the required quantile in [0,1];

function

A custom loss as a function of two parameters (prediction, label).
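The three accepted forms of loss.type can be illustrated as follows. The character and list forms use only the names and the tau field documented above. The function form is a sketch of the standard pinball loss written as a function of (prediction, label); the name pinball and its tau argument with a default are introduced for the example, and the package's internal definition may differ.

```r
# Character form and list form, as documented above.
loss_chr <- "absolute"
loss_lst <- list(name = "pinball", tau = 0.9)

# Function form: standard pinball loss for quantile level tau (sketch).
pinball <- function(prediction, label, tau = 0.9) {
  u <- label - prediction
  (tau - (u < 0)) * u   # tau * u if under-predicting, (tau - 1) * u otherwise
}

pinball(prediction = 0, label = 1)   # under-prediction, weighted by tau
pinball(prediction = 1, label = 0)   # over-prediction, weighted by 1 - tau
```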

loss.gradient

boolean, function (TRUE).

boolean

If TRUE, the aggregation rule is not applied directly to the loss function at hand, but to a gradient (linearized) version of it. The aggregation rule is then similar to a gradient descent aggregation rule.

function

If loss.type is a function, the derivative of the loss with respect to its first argument must be provided (it is not computed automatically).
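A custom loss and its derivative could be paired as in the sketch below, which uses the log-cosh loss as an example. The function names are hypothetical, and the assumption that the derivative takes the same two arguments (prediction, label) as the loss is not confirmed by this page. The two functions would then be supplied as loss.type and loss.gradient respectively.

```r
# Hypothetical custom loss: log-cosh of the prediction error.
loss_logcosh <- function(prediction, label) log(cosh(prediction - label))

# Derivative with respect to the first argument (prediction).
loss_logcosh_grad <- function(prediction, label) tanh(prediction - label)

# Finite-difference check of the derivative on a few points.
prediction <- c(-1, 0.3, 2)
label <- c(0, 0, 1)
h <- 1e-6
numeric_deriv <- (loss_logcosh(prediction + h, label) -
                  loss_logcosh(prediction - h, label)) / (2 * h)
deriv_error <- max(abs(numeric_deriv - loss_logcosh_grad(prediction, label)))
deriv_error                          # should be close to zero
```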

w0

numeric (NULL). Vector of initialization for the weights.

max_iter

integer (50). Maximum number of iterations of the optimization algorithm per round.

obj_tol

numeric (1e-2). Tolerance over objective function between two iterations of the optimization.

training

list (NULL). List of previous parameters.

default

boolean (FALSE). Whether or not to use default parameters for fun_reg, constr_eq, constr_ineq and their gradient/Jacobian functions, whose values are all ignored when TRUE.

quiet

boolean (TRUE). Whether or not to display progress bars.

Value

object of class mixture.

References

Hazan E (2019). “Introduction to online convex optimization.” arXiv preprint arXiv:1909.05207.

Shalev-Shwartz S, Singer Y (2007). “A primal-dual perspective of online learning algorithms.” Machine Learning, 69(2), 115–142.


[Package opera version 1.2.0 Index]