superpc.cv {superpc}

R Documentation

Cross-validation for supervised principal components

Description

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components.

Usage

    superpc.cv(fit,
               data, 
               n.threshold=20,
               n.fold=NULL,
               folds=NULL,
               n.components=3, 
               min.features=5, 
               max.features=nrow(data$x),
               compute.fullcv= TRUE,
               compute.preval=TRUE, 
               xl.mode=c("regular","firsttime","onetime","lasttime"), 
               xl.time=NULL,
               xl.prevfit=NULL)

Arguments

`fit`	Object returned by superpc.train
`data`	Data object of form described in superpc.train documentation
`n.threshold`	Number of thresholds to consider. Default 20.
`n.fold`	Number of cross-validation folds. Default is around 10 (the program picks a convenient value based on the sample size).
`folds`	List of indices of cross-validation folds (optional)
`n.components`	Number of principal components to use: 1, 2 or 3. Default 3.
`min.features`	Minimum number of features to include in determining range for threshold. Default 5.
`max.features`	Maximum number of features to include in determining range for threshold. Default is the total number of features in the dataset.
`compute.fullcv`	Should full cross-validation be done?
`compute.preval`	Should full pre-validation be done?
`xl.mode`	Used by Excel interface only
`xl.time`	Used by Excel interface only
`xl.prevfit`	Used by Excel interface only
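The `folds` argument can be built by hand when the same partition should be reused across runs. The following is a minimal sketch in base R, assuming each list element holds the sample indices held out in one fold; the fold count `k` and the names `fold.id`, `fit` and `cv` are illustrative and not part of the package.

```r
set.seed(1)
n <- 30   # number of samples, i.e. ncol(data$x)
k <- 5    # hypothetical number of folds

# Randomly assign each sample to one of k folds of (nearly) equal size
fold.id <- sample(rep(seq_len(k), length.out = n))

# One list element per fold, holding the sample indices in that fold
folds <- unname(split(seq_len(n), fold.id))
str(folds)

# With a fit from superpc.train, the list would then be passed as:
# cv <- superpc.cv(fit, data, folds = folds)
```

Fixing the folds this way makes scores from different settings of `n.threshold` or `n.components` comparable, since every run sees the same partition.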

Details

This function uses a form of cross-validation to estimate the optimal feature threshold in supervised principal components. To avoid problems with fitting Cox models to small validation datasets, it uses the "pre-validation" approach of Tibshirani and Efron (2002).
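The idea of pre-validation can be illustrated with a short base R sketch. It is not the package's implementation: it assumes a continuous outcome fitted with `lm` instead of a Cox model, scores features by absolute correlation, and keeps a fixed number of features (`n.keep`, a hypothetical name) in place of a score threshold. For each fold, the component is built from the training samples only and then evaluated on the held-out samples; a single model is fitted at the end to the assembled predictor.

```r
set.seed(332)
p <- 50; n <- 30
x <- matrix(rnorm(p * n), nrow = p)        # features in rows, samples in columns
y <- 10 + svd(x)$v[, 1] + 0.1 * rnorm(n)   # continuous outcome, no censoring

n.fold <- 5
folds <- unname(split(seq_len(n), sample(rep(seq_len(n.fold), length.out = n))))
n.keep <- 10                               # hypothetical: keep the 10 top-scoring features

v.preval <- numeric(n)
for (test in folds) {
  x.train <- x[, -test, drop = FALSE]
  y.train <- y[-test]

  # Score features on the training samples only
  score <- abs(apply(x.train, 1, cor, y = y.train))
  keep <- order(score, decreasing = TRUE)[seq_len(n.keep)]

  # First principal component of the selected, centred training features
  mu <- rowMeans(x.train[keep, , drop = FALSE])
  u1 <- svd(x.train[keep, , drop = FALSE] - mu)$u[, 1]

  # The sign of a component is arbitrary; orient it consistently across folds
  train.pc <- drop(crossprod(x.train[keep, , drop = FALSE] - mu, u1))
  if (cor(train.pc, y.train) < 0) u1 <- -u1

  # Project the held-out samples; their outcomes were never used above
  v.preval[test] <- drop(crossprod(x[keep, test, drop = FALSE] - mu, u1))
}

# One model on all n samples instead of one small model per fold
fit <- lm(y ~ v.preval)
summary(fit)$coefficients
```

Because the final model is fitted once to all samples, it avoids the instability of fitting a separate model within each small validation fold.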

Value

`threshold`	Vector of thresholds considered
`nonzero`	Number of features exceeding each value of the threshold
`scor.preval`	Likelihood ratio scores from pre-validation
`scor`	Full CV scores
`folds`	Indices of CV folds used
`featurescores.folds`	Feature scores for each fold
`v.preval`	The pre-validated predictors
`type`	Problem type
`call`	Calling sequence
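The returned components can be combined to read off a candidate threshold. The sketch below assumes that `scor.preval` is a matrix with one row per number of components and one column per threshold, and that larger likelihood ratio scores are better. The helper `pick.best.threshold` and the mock object are hypothetical and serve only to show the shape of the result.

```r
# Hypothetical helper: locate the largest pre-validation score
pick.best.threshold <- function(cv) {
  s <- cv$scor.preval
  # Row = number of components, column = position in cv$threshold
  best <- which(s == max(s, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(n.components = unname(best["row"]),
       threshold    = cv$threshold[best["col"]],
       n.features   = cv$nonzero[best["col"]],
       score        = s[best["row"], best["col"]])
}

# Mock object carrying only the fields used above (values are made up)
cv.mock <- list(threshold   = c(0.5, 1.0, 1.5),
                nonzero     = c(40, 22, 9),
                scor.preval = rbind(c(2.1, 6.3, 4.0),
                                    c(1.8, 5.9, 4.4)))
pick.best.threshold(cv.mock)
```

In practice the scores are usually inspected graphically before a threshold is chosen, so a helper like this is best treated as a starting point.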

Author(s)

Eric Bair, Ph.D.
Jean-Eudes Dazard, Ph.D.
Rob Tibshirani, Ph.D.

Maintainer: Jean-Eudes Dazard, Ph.D.

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

    ## Not run: 
    library(superpc)
    set.seed(332)

    # Generate some data: 50 features (rows) measured on 30 samples (columns)
    x <- matrix(rnorm(50*30), ncol=30)
    y <- 10 + svd(x)$v[,1] + .1*rnorm(30)
    censoring.status <- sample(c(rep(1,20), rep(0,10)))

    featurenames <- paste("feature", as.character(1:50), sep="")
    data <- list(x=x, 
                 y=y, 
                 censoring.status=censoring.status, 
                 featurenames=featurenames)

    # Train the model, then cross-validate the feature threshold
    a <- superpc.train(data, type="survival")
    aa <- superpc.cv(a, data)

    ## End(Not run)

[Package superpc version 1.12 Index]