cv.spikeslab {spikeslab} | R Documentation |
K-fold Cross-Validation for Spike and Slab Regression
Description
Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net (gnet) from spike and slab regression. Also returns a stability index for each variable.
Usage
cv.spikeslab(x = NULL, y = NULL, K = 10,
    plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE,
    bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln),
    r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE,
    fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE,
    ntree = 300, seed = NULL, ...)
Arguments
x |
x-predictor matrix. |
y |
y-response values. |
K |
Number of folds. |
plot.it |
If TRUE, plots the mean prediction error and its standard error. |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
mse |
If TRUE, an external estimate for the overall variance is calculated. |
bigp.smalln |
Use if p >> n. |
bigp.smalln.factor |
Top n times this value of variables to be kept in the filtering step (used when p >> n). |
screen |
If TRUE, variables are first pre-filtered. |
r.effects |
List used for grouping variables (see details below). |
max.var |
Maximum number of variables allowed in the final model. |
center |
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples. |
intercept |
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE. |
fast |
If TRUE, use blocked Gibbs sampling to accelerate the algorithm. |
beta.blocks |
Update beta using this number of blocks (fast must be TRUE). |
verbose |
If TRUE, verbose output is sent to the terminal. |
save.all |
If TRUE, spikeslab object for each fold is saved and returned. |
ntree |
Number of trees used by random forests (applies only when mse is TRUE). |
seed |
Seed for random number generator. Must be a negative integer. |
... |
Further arguments passed to or from other methods. |
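As a rough illustration of what K controls, the observations are partitioned into K folds, each held out once for prediction. The base-R sketch below shows one way such a partition could be built. The helper make_folds is hypothetical and is not taken from the package, so the splits stored in cv.fold may be constructed differently.

```r
# Hypothetical helper (not part of spikeslab): randomly assign n observations
# to K folds of near-equal size and return the held-out indices per fold.
make_folds <- function(n, K = 10) {
  fold.id <- sample(rep(seq_len(K), length.out = n))
  split(seq_len(n), fold.id)
}

set.seed(1)
folds <- make_folds(n = 103, K = 10)

# Number of held-out observations in each fold
print(lengths(folds))

# Training rows for fold 1 are all rows not held out in fold 1
train.1 <- setdiff(seq_len(103), folds[[1]])
```

Every observation appears in exactly one fold, so each row is predicted once across the K fits.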
Value
Invisibly returns a list with components:
spikeslab.obj |
Spike and slab object from the full data. |
cv.spikeslab.obj |
List containing spike and slab objects from each fold. Can be NULL. |
cv.fold |
List containing the cv splits. |
cv |
Mean-squared error for each fold for the gnet. |
cv.path |
A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds. |
stability |
Matrix containing the stability of each variable, defined as the percentage of the K folds in which the variable is identified. Also includes bma and gnet coefficient values and their cv-fold-averaged values. |
bma |
bma coefficients from the full data in terms of the standardized x. |
bma.scale |
bma coefficients from the full data, scaled in terms of the original x. |
gnet |
cv-optimized gnet in terms of the standardized x. |
gnet.scale |
cv-optimized gnet in terms of the original x. |
gnet.model |
List of models selected by gnet over the K-folds. |
gnet.path |
gnet path from the full data, scaled in terms of the original x. |
gnet.obj |
gnet object from fitting the full data (a lars-type object). |
gnet.obj.vars |
Variables (in order) used to calculate the gnet object. |
verbose |
Verbose details (used for printing). |
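The cv.path and stability components can be read as simple summaries over folds. The sketch below uses synthetic stand-ins rather than real package output. It assumes a matrix laid out like cv.path (rows are model sizes, columns are folds) and a list of selected variable indices per fold, which is only a guess at the layout of gnet.model. The standard-error formula is likewise an assumption, not necessarily what the package computes.

```r
set.seed(1)

# Synthetic stand-in for cv.path: 6 model sizes (rows) by 10 folds (columns)
cv.path <- matrix(rexp(6 * 10), nrow = 6, ncol = 10)

# Mean prediction error and its standard error across folds, per model size
cv.mean <- rowMeans(cv.path)
cv.se <- apply(cv.path, 1, sd) / sqrt(ncol(cv.path))

# Model size with the smallest mean cross-validated error
best.size <- which.min(cv.mean)

# Synthetic stand-in for per-fold selections: indices of the variables
# selected in each of 4 folds, out of p = 4 candidate variables
selected <- list(c(1, 3), c(1, 2, 3), c(1), c(1, 3))
p <- 4

# Stability: percentage of folds in which each variable is selected
stability <- 100 * tabulate(unlist(selected), nbins = p) / length(selected)
print(stability)  # 100 25 75 0
```

A variable with stability near 100 is selected in almost every fold, while a value near 0 suggests the variable is rarely, if ever, picked up.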
Author(s)
Hemant Ishwaran (hemant.ishwaran@gmail.com)
J. Sunil Rao (rao.jsunil@gmail.com)
Udaya B. Kogalur (ubk@kogalur.com)
References
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.
See Also
sparsePC.spikeslab, plot.spikeslab, predict.spikeslab, print.spikeslab.
Examples
## Not run:
#------------------------------------------------------------
# Example 1: 10-fold validation using parallel processing
#------------------------------------------------------------
data(ozoneI, package = "spikeslab")
y <- ozoneI[, 1]
x <- ozoneI[, -1]
cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4)
plot(cv.obj, plot.type = "cv")
plot(cv.obj, plot.type = "path")
#------------------------------------------------------------
# Example 2: 10-fold validation using parallel processing
# (high dimensional diabetes data)
#------------------------------------------------------------
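# add 2000 noise variables to the diabetes data
data(diabetesI, package = "spikeslab")
diabetes.noise <- cbind(diabetesI,
      noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI)))
# the response is the first column; the remaining columns are predictors
x <- diabetes.noise[, -1]
y <- diabetes.noise[, 1]
# p is much larger than n here, so enable the bigp.smalln option
cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln = TRUE, parallel = 4)
plot(cv.obj)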
## End(Not run)