higrad {higrad}    R Documentation

Fitting HiGrad

Description

higrad is used to implement hierarchical incremental gradient descent (HiGrad), an algorithm that conducts statistical inference for online learning.

Usage

higrad(x, y, model = "lm", nsteps = nrow(x), nsplits = 2, nthreads = 2,
  step.ratio = 1, n0 = NA, skip = 0, eta = 1/2, alpha = 1/2,
  burnin = round(nsteps/10), start = rnorm(ncol(x), 0, 0.01),
  replace = FALSE, track = FALSE)

Arguments

x

input matrix of features. Each row is an observation vector, and each column is a feature.

y

response variable. Quantitative for model = "lm". For model = "logistic" it should be a factor with two levels.

model

type of model to fit. Currently only linear regression ("lm") and logistic regression ("logistic") are supported.

nsteps

total number of steps, which equals the number of queries made for noisy evaluations of the gradient.

nsplits

number of splits in the HiGrad tree.

nthreads

number of threads each previous-level thread is split into. Either a single number (equal split size throughout) or a vector.

step.ratio

ratio of the lengths of the threads at two adjacent levels (the length at the later level divided by that at the previous level). Either a single number (equal ratio throughout) or a vector.

n0

length of the 0th-level thread.

skip

number of steps to skip when estimating the coefficients by averaging.

eta

constant in front of the step size. See Details for the formula of the step size.

alpha

exponent of the step size. See Details for the formula of the step size.

burnin

number of steps used as the burn-in period. The burn-in steps are not counted toward the total budget nsteps.

start

starting values of the coefficients.

replace

logical; whether or not to sample the data with replacement.

track

logical; whether or not to store the entire path for plotting.
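
To make the tree-configuration arguments concrete, the following sketch fits a non-default tree (the argument values are made up for illustration; it assumes the higrad package is attached):

# two levels of splitting: the initial thread splits into 2 threads, each
# of which splits into 3 (nthreads given as a vector), and each segment
# after a split is twice as long as the one before it (step.ratio = 2)
x <- matrix(rnorm(1000 * 5), 1000, 5)
y <- as.numeric(x %*% rep(1, 5) + rnorm(1000))
fit <- higrad(x, y, model = "lm",
              nsplits = 2, nthreads = c(2, 3), step.ratio = 2)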

Details

HiGrad is designed to conduct statistical inference for online learning, without incurring additional computational cost compared with vanilla stochastic gradient descent (SGD). The HiGrad procedure begins by performing SGD iterations for a while and then splits the single thread into several threads; this splitting is repeated hierarchically along each thread. With the predictions from the multiple threads in place, a t-based confidence interval is constructed by de-correlating the predictions using the covariance structure given by the Ruppert–Polyak averaging scheme.

To implement HiGrad, a configuration of the tree structure needs to be specified. The default setting is a binary tree with two splits. The step size at step t is set to eta * t^(-alpha).
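
As a concrete illustration of this schedule, the sketch below evaluates eta * t^(-alpha) at the defaults eta = 1/2 and alpha = 1/2 (this snippet is illustrative and not part of the package):

# step sizes for the first five steps under the default schedule
eta <- 1/2
alpha <- 1/2
t <- 1:5
eta * t^(-alpha)
# 0.5000000 0.3535534 0.2886751 0.2500000 0.2236068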

Value

An object with S3 class higrad.

coefficients

estimate of the coefficients.

coefficients.bootstrap

matrix of estimates of the coefficients along each HiGrad thread.

model

model type.

Sigma0

covariance structure Σ of the estimates.

track

entire path of the estimates along each thread. Can be used for diagnostics and for checking convergence.

References

Weijie Su and Yuancheng Zhu. (2018) Statistical Inference for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent. https://arxiv.org/abs/1802.04876.

See Also

See print.higrad, plot.higrad, predict.higrad for other methods for the higrad class.

Examples

# fitting linear regression on a simulated dataset
n <- 1e3
d <- 10
sigma <- 0.1
theta <- rep(1, d)
x <- matrix(rnorm(n * d), n, d)
y <- as.numeric(x %*% theta + rnorm(n, 0, sigma))
fit <- higrad(x, y, model = "lm")
print(fit)
# predict for 10 new samples
newx <- matrix(rnorm(10 * d), 10, d)
pred <- predict(fit, newx)
pred
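
# additional illustrative sketches (not from the original documentation)

# logistic regression: per the Arguments section, y is supplied as a
# factor with two levels when model = "logistic"
prob <- 1 / (1 + exp(-as.numeric(x %*% theta)))
ybin <- factor(rbinom(n, 1, prob))
fitlogit <- higrad(x, ybin, model = "logistic")
print(fitlogit)

# store the full iterate path (track = TRUE) so that plot.higrad
# can be used as a convergence diagnostic
fittrack <- higrad(x, y, model = "lm", track = TRUE)
plot(fittrack)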

