higrad {higrad} | R Documentation |
Fitting HiGrad
Description
higrad
is used to implement hierarchical incremental gradient descent (HiGrad), an algorithm that conducts statistical inference for online learning.
Usage
higrad(x, y, model = "lm", nsteps = nrow(x), nsplits = 2, nthreads = 2,
step.ratio = 1, n0 = NA, skip = 0, eta = 1/2, alpha = 1/2,
burnin = round(nsteps/10), start = rnorm(ncol(x), 0, 0.01),
replace = FALSE, track = FALSE)
Arguments
x |
input matrix of features. Each row is an observation vector, and each column is a feature. |
y |
response variable. Quantitative for |
model |
type of model to fit. Currently only linear regression ( |
nsteps |
total number of steps. This is equivalent to the number of queries made to get a noisy evaluation of the gradient. |
nsplits |
number of splits in the HiGrad tree. |
nthreads |
numbers of threads each previous thread is split into. Either a number (equal split size throughout) or a vector. |
step.ratio |
ratio of the lengths of the threads from the two adjacent levels (the latter one divided by the previous). Either a number (equal ratio throughout) or a vector. |
n0 |
length of the 0th-level thread. |
skip |
number of steps to skip when estimating the coefficients by averaging. |
eta |
constant in front of the step size. See Details for the formula of the step size. |
alpha |
exponent of the step size. See Details for the formula of the step size. |
burnin |
number of steps as the burn-in period. The burn-in period is not accounted for in the total budget |
start |
starting values of the coefficients. |
replace |
logical; whether or not to sample the data with replacement. |
track |
logical; whether or not to store the entire path for plotting. |
Details
HiGrad is designed to conduct statistical inference for online learning, without incurring additional computational cost compared with the vanilla stochastic gradient descent (SGD).
The HiGrad procedure begins by performing SGD iterations for a while and then split the single thread into a few, and this procedure hierarchically operates in this fashion along each thread.
With predictions provided by multiple threads in place, a t-based confidence interval is constructed by de-correlating predictions using covariance structures given by the Ruppert–Polyak averaging scheme.
In order to implement HiGrad, a configuration of the tree structure needs to be specified. The default setting is a binary tree with 2 splits.
The step size is set to be eta*t^(-alpha)
.
Value
An object with S3 class higrad
.
coefficients |
estimate of the coefficients. |
coefficients.bootstrap |
matrix of estimates of the coefficients along each HiGrad threads. |
model |
model type. |
Sigma0 |
covariance structure |
track |
entire path of the estimates along each thread. Can be used for diagnostic and check for convergence. |
References
Weijie Su and Yuancheng Zhu. (2018) Statistical Inference for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent. https://arxiv.org/abs/1802.04876.
See Also
See print.higrad
, plot.higrad
, predict.higrad
for other methods for the higrad
class.
Examples
# fitting linear regression on a simulated dataset
n <- 1e3
d <- 10
sigma <- 0.1
theta <- rep(1, d)
x <- matrix(rnorm(n * d), n, d)
y <- as.numeric(x %*% theta + rnorm(n, 0, sigma))
fit <- higrad(x, y, model = "lm")
print(fit)
# predict for 10 new samples
newx <- matrix(rnorm(10 * d), 10, d)
pred <- predict(fit, newx)
pred