sparsePC.spikeslab {spikeslab} | R Documentation |
Multiclass Prediction using Spike and Slab Regression
Description
Variable selection for the multiclass gene prediction problem.
Usage
sparsePC.spikeslab(x = NULL, y = NULL, n.rep = 10,
n.iter1 = 150, n.iter2 = 100, n.prcmp = 5, max.genes = 100,
ntree = 1000, nodesize = 1, verbose = TRUE, ...)
Arguments
x |
x matrix of gene expressions. |
y |
Class labels. |
n.rep |
Number of Monte Carlo replicates. |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
n.prcmp |
Number of principal components. |
max.genes |
Maximum number of genes in final model. |
ntree |
Number of trees used by random forests. |
nodesize |
Nodesize of trees. |
verbose |
If TRUE, verbose output is sent to the terminal. |
... |
Further arguments passed to or from other methods. |
Details
Multiclass prediction using a hybrid combination of spike and slab
linear regression and random forest multiclass prediction (Ishwaran
and Rao, 2009). A pseudo y-vector of response values is calculated
using each of the top n.prcmp principal components of the x-gene
expression matrix. The generalized elastic net obtained from spike
and slab regression is used to select genes; one regression fit is
used for each of the pseudo y-response vectors. The final combined
set of genes is passed to random forests and used to construct a
multiclass forest predictor. This procedure is repeated n.rep times,
with each Monte Carlo replicate based on balanced cross-validation
in which two-thirds of the data are used for training and one-third
for testing.
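The principal component step described above can be sketched in base R. This is a minimal illustration, not the package's internal code: it assumes each pseudo y-vector is simply the score vector of one principal component, and the simulated matrix x, the gene names, and the object names pc and pseudo.y are hypothetical.

```r
# Illustrative sketch only; simulated data stands in for a gene
# expression matrix (rows = samples, columns = genes).
set.seed(1)
n <- 30
p <- 50
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("gene", seq_len(p))

n.prcmp <- 5

# Principal components of the centered and scaled expression matrix.
pc <- prcomp(x, center = TRUE, scale. = TRUE)

# ASSUMPTION: each of the top n.prcmp score columns is used as one
# pseudo y-vector; the package may construct the response differently.
pseudo.y <- pc$x[, seq_len(n.prcmp), drop = FALSE]

dim(pseudo.y)
```

Under this reading, each column of pseudo.y would be the response in a separate spike and slab regression on x, and the genes selected across the n.prcmp fits would be pooled before the random forest step.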
Miscellanea:
Test set error is only computed when n.rep is larger than 1.
If n.rep = 1, the full data is used without any cross-validation.
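One way to picture the replicated 2/3 training, 1/3 testing scheme is a class-stratified random split drawn once per replicate. The sketch below rests on the assumption that "balanced" means sampling within each class; balanced.split and the toy labels are hypothetical and are not part of spikeslab.

```r
# Illustrative sketch only; toy class labels with unequal class sizes.
set.seed(2)
y <- factor(rep(c("A", "B", "C"), times = c(12, 9, 6)))
n.rep <- 3

# Hypothetical helper: draw about train.frac of the indices within each
# class for training and leave the remaining indices for testing.
balanced.split <- function(y, train.frac = 2 / 3) {
  idx.by.class <- split(seq_along(y), y)
  train <- unlist(lapply(idx.by.class, function(idx) {
    idx[sample.int(length(idx), size = round(train.frac * length(idx)))]
  }), use.names = FALSE)
  list(train = sort(train), test = setdiff(seq_along(y), train))
}

# One independent split per Monte Carlo replicate.
splits <- lapply(seq_len(n.rep), function(b) balanced.split(y))

# Class counts in the training part of the first replicate.
table(y[splits[[1]]$train])
```

Sampling within classes keeps the class proportions of the training and test parts close to those of the full data, which matters when some classes have few samples.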
Value
Invisibly, the final set of selected genes as well as the complete set
of genes selected over the n.rep Monte Carlo replications. The random
forest classifier is also returned.
The misclassification error rate, error rate for each class, and other summary information are output to the terminal.
Author(s)
Hemant Ishwaran (hemant.ishwaran@gmail.com)
J. Sunil Rao (rao.jsunil@gmail.com)
Udaya B. Kogalur (ubk@kogalur.com)
References
Ishwaran H. and Rao J.S. (2009). Generalized ridge regression: geometry and computational solutions when p is larger than n.
See Also
spikeslab.
Examples
## Not run:
#------------------------------------------------------------
# Example 1: leukemia data
#------------------------------------------------------------
data(leukemia, package = "spikeslab")
sparsePC.out <- sparsePC(x = leukemia[, -1], y = leukemia[, 1], n.rep = 3)
rf.obj <- sparsePC.out$rf.obj
varImpPlot(rf.obj)
## End(Not run)