cv.l2boost {l2boost}    R Documentation
K-fold cross-validation using l2boost
Description
Calculate the K-fold cross-validation prediction error for l2boost models.
The prediction error is calculated using the mean squared error (MSE). The optimal boosting step (m=opt.step)
is obtained by selecting the step m that yields the minimal MSE.
Usage
cv.l2boost(
x,
y,
K = 10,
M = NULL,
nu = 1e-04,
lambda = NULL,
trace = FALSE,
type = c("discrete", "hybrid", "friedman", "lars"),
cores = NULL,
...
)
Arguments
x: the design matrix
y: the response vector
K: number of cross-validation folds (default: 10)
M: the total number of iterations passed to l2boost
nu: l1 shrinkage parameter (default: 1e-4)
lambda: l2 shrinkage parameter for elasticBoost (default: NULL = no l2-regularization)
trace: show computation/debugging output? (default: FALSE)
type: type of l2boost fit (default: "discrete"); see l2boost
cores: number of cores used to parallelize the CV analysis. If not specified, the number of available cores is detected; if more than 1, n-1 cores are used for cross-validation. Implemented using multicore (mclapply), or clusterApply on Windows machines.
...: additional arguments passed to l2boost
Details
The cross-validation method splits the data set into K mutually exclusive subsets. An l2boost model
is built on K different training data sets, each created from the full data set by sequentially leaving
out one of the K subsets. The prediction error estimate is calculated by averaging the mean squared error on each
held-out subset over the K training fits. The optimal step m is obtained at the step with the minimal averaged
mean squared error.
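As an illustration (a sketch, not package code), the averaged error can be reconstructed from the per-fold MSE vectors returned in the mse.list component (see Value below), assuming cv is an object returned by cv.l2boost:

# Sketch: recover the fold-averaged CV error from cv$mse.list and
# locate the optimal step; assumes the K per-fold vectors have equal length.
avg.mse <- Reduce(`+`, cv$mse.list) / length(cv$mse.list)
opt.step <- which.min(avg.mse)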
The full l2boost model is run after the cross-validation models, on the full data set. This model is
run for the full number of iteration steps M and is returned in the cv.l2boost$fit object.
cv.l2boost only optimizes along the iteration count m for a given value of nu. This is
equivalent to an L1-regularization optimization. In order to optimize an elasticBoost model on the L2-regularization
parameter lambda, a manual two-way cross-validation can be performed by sequentially optimizing over a range of lambda
values and selecting the lambda/opt.step pair that results in the minimal cross-validated mean squared error. See the
examples below.
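A minimal sketch of such a two-way search, assuming x and y are the data and lambda.grid is a user-chosen grid of candidate values:

# Sketch: manual two-way cross-validation over lambda.
# lambda.grid is an assumed, user-chosen grid of candidate values.
lambda.grid <- c(0.01, 0.1, 1)
cv.list <- lapply(lambda.grid, function(l)
  cv.l2boost(x, y, M = 1000, nu = 1e-4, lambda = l))
# Pick the lambda/opt.step pair with minimal cross-validated MSE.
best <- which.min(sapply(cv.list, function(cv) cv$mse))
opt.lambda <- lambda.grid[best]
opt.step <- cv.list[[best]]$opt.step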
cv.l2boost uses the parallel package internally to speed up the cross-validation process on multicore
machines. parallel is included with base R >= 2.14; for earlier releases, the multicore package provides the same
functionality. By default, cv.l2boost will use all available cores except one. Each fold is run on its
own core and the results are combined automatically. The number of cores can be overridden using the cores function
argument.
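For example, a serial run (useful for debugging or on restricted systems) can be forced by setting cores explicitly; x and y are placeholder data here:

# Sketch: bypass core auto-detection and run all folds serially.
cv.serial <- cv.l2boost(x, y, cores = 1)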
Value
A list of cross-validation results:
call: the matched call
type: choice of l2boost algorithm from "discrete", "hybrid", "friedman", "lars"; see l2boost
names: design matrix column names used in the model
nu: the L1 boosting shrinkage parameter value
lambda: the L2 elasticBoost shrinkage parameter value
K: number of folds used for cross-validation
mse: optimal cross-validation mean squared error estimate
mse.list: list of K vectors of mean squared errors at each step m
coef: beta coefficient estimates from the full model at opt.step
coef.stand: standardized beta coefficient estimates from the full model at opt.step
opt.step: optimal step m calculated by minimizing the cross-validation error among all K training sets
opt.norm: L1 norm of the beta coefficients at opt.step
fit: the l2boost fit of the full model
yhat: estimate of the response from the full model at opt.step
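A short sketch of how these components might be inspected after a run (component names as listed above; x and y are placeholder data):

# Sketch: inspect the main components of a cv.l2boost result.
cv <- cv.l2boost(x, y)
cv$opt.step        # CV-optimal boosting step
cv$mse             # minimal cross-validated MSE
head(cv$coef)      # full-model coefficients at opt.step
plot(y, cv$yhat)   # fitted response from the full model at opt.step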
See Also
l2boost
, plot.l2boost
,
predict.l2boost
and coef.l2boost
Examples
## Not run:
#--------------------------------------------------------------------------
# Example: ElasticBoost simulation
# Compare l2boost and elasticNetBoosting using 10-fold CV
#
# Elastic net simulation, see Zou H. and Hastie T. Regularization and
# variable selection via the elastic net. J. Royal Statist. Soc. B,
# 67(2):301-320, 2005
set.seed(1025)
dta <- elasticNetSim(n=100)
# The default values set up the signal on 3 groups of 5 variables,
# Color the signal variables red, others are grey.
sig <- c(rep("red", 15), rep("grey", 40-15))
# Set the boosting parameters
Mtarget = 1000
nuTarget = 1.e-2
# For CRAN, only use 2 cores in the CV method
cvCores=2
# 10 fold l2boost CV
cv.obj <- cv.l2boost(dta$x,dta$y,M=Mtarget, nu=nuTarget, cores=cvCores)
# Plot the results
par(mfrow=c(2,3))
plot(cv.obj)
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(cv.obj$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(coef(cv.obj$fit, m=cv.obj$opt.step), cex=.5,
ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)
# elasticBoost l2-regularization parameter lambda=0.1
# 5-fold elasticBoost CV
cv.eBoost <- cv.l2boost(dta$x,dta$y,M=Mtarget, K=5, nu=nuTarget, lambda=.1, cores=cvCores)
# plot the results
plot(cv.eBoost)
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(cv.eBoost$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(coef(cv.eBoost$fit, m=cv.eBoost$opt.step), cex=.5,
ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)
## End(Not run)