superpc.cv {superpc} | R Documentation |
Cross-validation for supervised principal components
Description
This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components.
Usage
superpc.cv(fit,
data,
n.threshold=20,
n.fold=NULL,
folds=NULL,
n.components=3,
min.features=5,
max.features=nrow(data$x),
compute.fullcv=TRUE,
compute.preval=TRUE,
xl.mode=c("regular","firsttime","onetime","lasttime"),
xl.time=NULL,
xl.prevfit=NULL)
Arguments
fit |
Object returned by superpc.train |
data |
Data object of form described in superpc.train documentation |
n.threshold |
Number of thresholds to consider. Default 20. |
n.fold |
Number of cross-validation folds. Default is around 10 (the program picks a convenient value based on the sample size). |
folds |
List of indices of cross-validation folds (optional) |
n.components |
Number of principal components to use in cross-validation: 1, 2 or 3. Default 3. |
min.features |
Minimum number of features to include in determining range for threshold. Default 5. |
max.features |
Maximum number of features to include in determining range for threshold. Default is the total number of features in the dataset. |
compute.fullcv |
Should full cross-validation be done? |
compute.preval |
Should full pre-validation be done? |
xl.mode |
Used by Excel interface only |
xl.time |
Used by Excel interface only |
xl.prevfit |
Used by Excel interface only |
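The folds argument can be built with base R. The following is a minimal sketch, assuming each list element holds the indices of the samples (columns of data$x) held out in that fold; the fold count and the variable names are hypothetical choices for illustration, not defaults of the package.

```r
set.seed(1)
n.samples <- 30   # number of samples, i.e. ncol(data$x) in the Examples section
n.fold <- 5       # hypothetical number of folds

# Randomly assign each sample to one of n.fold folds of (near-)equal size
fold.id <- sample(rep(seq_len(n.fold), length.out = n.samples))

# One list element per fold, each holding the sample indices in that fold
folds <- unname(split(seq_len(n.samples), fold.id))

# Assumed usage, with a and data as in the Examples section (not run here):
# aa <- superpc.cv(a, data, folds = folds)
```

Supplying folds explicitly makes the fold assignment reproducible across calls, which helps when comparing runs with different settings.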
Details
This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach of Tibshirani and Efron (2002).
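The idea of pre-validation can be illustrated in base R. This is a rough sketch, not the code used inside superpc.cv: it assumes a quantitative outcome and a linear model in place of a Cox model, keeps a fixed number of top-scoring features (n.keep, a hypothetical parameter) instead of scanning thresholds, and uses absolute correlation as the feature score. The point it shows is that each sample's predictor is computed from a model that never saw that sample, and that a single model is then fitted to the full vector of held-out predictors rather than one small model per fold.

```r
set.seed(332)
p <- 50; n <- 30
x <- matrix(rnorm(p * n), nrow = p)        # features in rows, samples in columns
y <- 10 + svd(x)$v[, 1] + 0.1 * rnorm(n)   # quantitative outcome (assumption)

n.fold <- 5
fold.id <- sample(rep(seq_len(n.fold), length.out = n))
n.keep <- 10                               # hypothetical number of features kept
v.preval <- numeric(n)                     # pre-validated predictor, one per sample

for (k in seq_len(n.fold)) {
  test <- which(fold.id == k)
  x.train <- x[, -test, drop = FALSE]
  y.train <- y[-test]

  # Score and select features using the training samples only
  score <- abs(apply(x.train, 1, cor, y = y.train))
  keep <- order(score, decreasing = TRUE)[seq_len(n.keep)]

  # First principal component of the selected, row-centred training features
  mu <- rowMeans(x.train[keep, , drop = FALSE])
  u1 <- svd(x.train[keep, , drop = FALSE] - mu)$u[, 1]

  # SVD signs are arbitrary; align the component with y on the training data
  pc.train <- drop(crossprod(x.train[keep, , drop = FALSE] - mu, u1))
  if (cor(pc.train, y.train) < 0) u1 <- -u1

  # Project the held-out samples onto the training-derived component
  v.preval[test] <- drop(crossprod(x[keep, test, drop = FALSE] - mu, u1))
}

# One model on all n pre-validated values, instead of one model per small fold
fit <- lm(y ~ v.preval)
```

Because the final model is fitted once on all samples, it avoids the instability of fitting separate models to each small validation set.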
Value
threshold |
Vector of thresholds considered |
nonzero |
Number of features exceeding each value of the threshold |
scor.preval |
Likelihood ratio scores from pre-validation |
scor |
Full CV scores |
folds |
Indices of CV folds used |
featurescores.folds |
Feature scores for each fold |
v.preval |
The pre-validated predictors |
type |
Problem type |
call |
Calling sequence |
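A threshold can be picked from the returned components with a few lines of base R. The sketch below uses a mock list in place of a real superpc.cv result, with made-up numbers, and assumes that scor is a matrix with one row per number of components and one column per threshold; check the structure of the actual object with str() before relying on this layout.

```r
# Mock object with two of the documented fields; all values are made up
cv.obj <- list(
  threshold = c(0.5, 1.0, 1.5, 2.0),
  scor = rbind(c(1.2, 3.4, 2.8, 0.9),    # scores using 1 component (assumed layout)
               c(1.0, 2.9, 3.1, 0.7))    # scores using 2 components
)

# Locate the largest score, ignoring any missing entries
best <- which(cv.obj$scor == max(cv.obj$scor, na.rm = TRUE), arr.ind = TRUE)[1, ]

best.n.components <- unname(best["row"])
best.threshold <- cv.obj$threshold[best["col"]]
```

With the mock values above, the largest score sits in the first row and second column, so the selected threshold is the second entry of threshold.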
Author(s)
"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."
Maintainer: "Jean-Eudes Dazard, Ph.D."
References
E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.
Examples
## Not run:
set.seed(332)
# generate some data: 50 features (rows) by 30 samples (columns)
x <- matrix(rnorm(50*30), ncol=30)
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
censoring.status <- sample(c(rep(1,20), rep(0,10)))
featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x,
y=y,
censoring.status=censoring.status,
featurenames=featurenames)
a <- superpc.train(data, type="survival")
aa <- superpc.cv(a, data)
## End(Not run)