cv.l2boost {l2boost}    R Documentation
K-fold cross-validation using l2boost
Description
Calculate the K-fold cross-validation prediction error for l2boost models.
The prediction error is calculated using the mean squared error (MSE). The optimal boosting step (m=opt.step)
is obtained by selecting the step m that yields the minimal MSE.
Usage
cv.l2boost(
x,
y,
K = 10,
M = NULL,
nu = 1e-04,
lambda = NULL,
trace = FALSE,
type = c("discrete", "hybrid", "friedman", "lars"),
cores = NULL,
...
)
Arguments
x: the design matrix
y: the response vector
K: number of cross-validation folds (default: 10)
M: the total number of iterations passed to l2boost
nu: l1 shrinkage parameter (default: 1e-4)
lambda: l2 shrinkage parameter for elasticBoost (default: NULL = no l2-regularization)
trace: show computation/debugging output? (default: FALSE)
type: type of l2boost fit (default: "discrete"); see l2boost
cores: number of cores used to parallelize the CV analysis. If not specified, the number of available cores is detected; if more than 1, n-1 cores are used for cross-validation. Implemented using multicore (mclapply), or clusterApply on Windows machines.
...: additional arguments passed to l2boost
Details
The cross-validation method splits the data set into K mutually exclusive subsets. An l2boost model
is built on K different training data sets, each created from the full data set by sequentially leaving
out one of the K subsets. The prediction error estimate is calculated by averaging the mean squared error on each
held-out subset over the K training fits. The optimal step m is obtained at the step with the minimal averaged
mean squared error.
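As an illustration (a sketch, not package code), the averaged error can be reconstructed from the per-fold MSE vectors returned in the mse.list component (see Value below), assuming cv is an object returned by cv.l2boost:

# Sketch: recover the fold-averaged CV error from cv$mse.list and
# locate the optimal step; assumes the K per-fold vectors have equal length.
avg.mse <- Reduce(`+`, cv$mse.list) / length(cv$mse.list)
opt.step <- which.min(avg.mse)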
The full l2boost model is run after the cross-validation models, on the full data set. This model is
run for the full number of iteration steps M and is returned in the cv.l2boost$fit object.
cv.l2boost only optimizes along the iteration count m for a given value of nu. This is
equivalent to an L1-regularization optimization. In order to optimize an elasticBoost model on the L2-regularization
parameter lambda, a manual two-way cross-validation can be performed by sequentially optimizing over a range of lambda
values and selecting the lambda/opt.step pair that results in the minimal cross-validated mean squared error. See the
examples below.
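A minimal sketch of such a two-way search, assuming x and y are the data and lambda.grid is a user-chosen grid of candidate values:

# Sketch: manual two-way cross-validation over lambda.
# lambda.grid is an assumed, user-chosen grid of candidate values.
lambda.grid <- c(0.01, 0.1, 1)
cv.list <- lapply(lambda.grid, function(l)
  cv.l2boost(x, y, M = 1000, nu = 1e-4, lambda = l))
# Pick the lambda/opt.step pair with minimal cross-validated MSE.
best <- which.min(sapply(cv.list, function(cv) cv$mse))
opt.lambda <- lambda.grid[best]
opt.step <- cv.list[[best]]$opt.step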
cv.l2boost uses the parallel package internally to speed up the cross-validation process on multicore
machines. parallel is included with base R >= 2.14; for earlier releases, the multicore package provides the same
functionality. By default, cv.l2boost will use all available cores except one. Each fold is run on its
own core and the results are combined automatically. The number of cores can be overridden using the cores function
argument.
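For example, a serial run (useful for debugging or on restricted systems) can be forced by setting cores explicitly; x and y are placeholder data here:

# Sketch: bypass core auto-detection and run all folds serially.
cv.serial <- cv.l2boost(x, y, cores = 1)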
Value
A list of cross-validation results:
call: the matched call
type: choice of l2boost algorithm from "discrete", "hybrid", "friedman", "lars"; see l2boost
names: design matrix column names used in the model
nu: the L1 boosting shrinkage parameter value
lambda: the L2 elasticBoost shrinkage parameter value
K: number of folds used for cross-validation
mse: optimal cross-validation mean squared error estimate
mse.list: list of K vectors of mean squared errors at each step m
coef: beta coefficient estimates from the full model at opt.step
coef.stand: standardized beta coefficient estimates from the full model at opt.step
opt.step: optimal step m calculated by minimizing the cross-validation error among all K training sets
opt.norm: L1 norm of the beta coefficients at opt.step
fit: the l2boost fit of the full model
yhat: estimate of the response from the full model at opt.step
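A short sketch of how these components might be inspected after a run (component names as listed above; x and y are placeholder data):

# Sketch: inspect the main components of a cv.l2boost result.
cv <- cv.l2boost(x, y)
cv$opt.step        # CV-optimal boosting step
cv$mse             # minimal cross-validated MSE
head(cv$coef)      # full-model coefficients at opt.step
plot(y, cv$yhat)   # fitted response from the full model at opt.step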
See Also
l2boost
, plot.l2boost
,
predict.l2boost
and coef.l2boost
Examples
## Not run:
#--------------------------------------------------------------------------
# Example: ElasticBoost simulation
# Compare l2boost and elasticNetBoosting using 10-fold CV
#
# Elastic net simulation, see Zou H. and Hastie T. Regularization and
# variable selection via the elastic net. J. Royal Statist. Soc. B,
# 67(2):301-320, 2005
set.seed(1025)
dta <- elasticNetSim(n=100)
# The default values set up the signal on 3 groups of 5 variables,
# Color the signal variables red, others are grey.
sig <- c(rep("red", 15), rep("grey", 40-15))
# Set the boosting parameters
Mtarget = 1000
nuTarget = 1.e-2
# For CRAN, only use 2 cores in the CV method
cvCores=2
# 10 fold l2boost CV
cv.obj <- cv.l2boost(dta$x,dta$y,M=Mtarget, nu=nuTarget, cores=cvCores)
# Plot the results
par(mfrow=c(2,3))
plot(cv.obj)
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(cv.obj$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.obj$opt.step, lty=2, col="grey")
plot(coef(cv.obj$fit, m=cv.obj$opt.step), cex=.5,
ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)
# elasticBoost l2-regularization parameter lambda=0.1
# 5-fold elasticBoost CV
cv.eBoost <- cv.l2boost(dta$x,dta$y,M=Mtarget, K=5, nu=nuTarget, lambda=.1, cores=cvCores)
# plot the results
plot(cv.eBoost)
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(cv.eBoost$fit, type="coef", ylab=expression(beta[i]))
abline(v=cv.eBoost$opt.step, lty=2, col="grey")
plot(coef(cv.eBoost$fit, m=cv.eBoost$opt.step), cex=.5,
ylab=expression(beta[i]), xlab="Column Index", ylim=c(0,140), col=sig)
## End(Not run)