R: Feature selection for supervised principal components

superpc.predict.red {superpc}

R Documentation

Feature selection for supervised principal components

Description

Forms reduced models to approximate the supervised principal component predictor.

Usage

    superpc.predict.red(fit, 
                        data, 
                        data.test, 
                        threshold, 
                        n.components=3, 
                        n.shrinkage=20, 
                        shrinkages=NULL,
                        compute.lrtest=TRUE,
                        sign.wt="both",
                        prediction.type=c("continuous", "discrete"), 
                        n.class=2)

Arguments

`fit`	Object returned by superpc.train
`data`	Training data object, of the form described in the superpc.train documentation
`data.test`	Test data object; same form as the training data
`threshold`	Feature score threshold; usually estimated from superpc.cv
`n.components`	Number of principal components to examine; an integer from 1 up to the number of components used in training. Default 3.
`n.shrinkage`	Number of shrinkage values to consider. Default 20.
`shrinkages`	Shrinkage values to consider. Default NULL.
`compute.lrtest`	Should the likelihood ratio test be computed? Default TRUE
`sign.wt`	Signs of feature weights allowed: "both", "pos", or "neg"
`prediction.type`	Type of prediction: "continuous" (default) or "discrete". In the latter case, the superpc score is divided into n.class groups
`n.class`	Number of groups for discrete predictor. Default 2.

Details

Soft-thresholding by each of the "shrinkages" values is applied to the PC loadings. This reduces the number of features used in the model. The reduced predictor is then used in place of the supervised PC predictor.
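
The idea of soft-thresholding can be illustrated with a small base R sketch. This is not the package's internal code: the function `soft_threshold`, the loading values, and the shrinkage grid below are hypothetical, chosen only to show how larger shrinkage values set more loadings to exactly zero and so leave fewer features in the reduced model.

```r
# Soft-threshold operator (hypothetical helper, not part of superpc):
# move each value toward zero by `shrink`; values with |w| <= shrink become 0.
soft_threshold <- function(w, shrink) {
  sign(w) * pmax(abs(w) - shrink, 0)
}

# Illustrative loadings for five features (made-up values).
loadings <- c(f1 = 0.9, f2 = -0.5, f3 = 0.2, f4 = -0.05, f5 = 0.0)

# Illustrative grid of shrinkage values (an assumption, not the package default).
shrinkages <- c(0, 0.1, 0.3, 0.6)

# One column of shrunken loadings per shrinkage value.
reduced <- sapply(shrinkages, function(s) soft_threshold(loadings, s))
colnames(reduced) <- paste0("shrink=", shrinkages)

# Number of features with a nonzero loading in each reduced model.
num_features <- colSums(reduced != 0)
print(reduced)
print(num_features)
```

In this toy setting the count of retained features falls as the shrinkage grows, which is the same trade-off that `num.features` and `lrtest.reduced` let you examine for a real fit.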

Value

`shrinkages`	Shrinkage values used
`lrtest.reduced`	Likelihood ratio tests for reduced models
`num.features`	Number of features used in each reduced model
`feature.list`	List of features used in each reduced model
`coef`	Least squares coefficients for each reduced model
`import`	Importance scores for features
`wt`	Weight for each feature, in constructing the reduced predictor
`v.test`	Outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)
`v.test.1df`	Combined outcome predictor from reduced models. Array of n.shrinkage by (number of test observations)
`n.components`	Number of principal components used
`type`	Type of outcome
`call`	Calling sequence

Author(s)

"Eric Bair, Ph.D."
"Jean-Eudes Dazard, Ph.D."
"Rob Tibshirani, Ph.D."

Maintainer: "Jean-Eudes Dazard, Ph.D."

References

E. Bair and R. Tibshirani (2004). "Semi-supervised methods to predict patient survival from gene expression data." PLoS Biol, 2(4):e108.
E. Bair, T. Hastie, D. Paul, and R. Tibshirani (2006). "Prediction by supervised principal components." J. Am. Stat. Assoc., 101(473):119-137.

Examples

set.seed(332)

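# Generate some data: 50 features (rows) by 30 samples (columns)
x <- matrix(rnorm(50*30), ncol=30)
# Outcomes follow the first right singular vector of x, plus noise
y <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)
ytest <- 10 + svd(x[1:50,])$v[,1] + .1*rnorm(30)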
censoring.status <- sample(c(rep(1,20), rep(0,10)))
censoring.status.test <- sample(c(rep(1,20), rep(0,10)))

featurenames <- paste("feature", as.character(1:50), sep="")
data <- list(x=x,
             y=y, 
             censoring.status=censoring.status, 
             featurenames=featurenames)
data.test <- list(x=x, 
                  y=ytest, 
                  censoring.status=censoring.status.test, 
                  featurenames=featurenames)

a <- superpc.train(data, type="survival")
fit.red <- superpc.predict.red(a,
                               data, 
                               data.test, 
                               threshold=.6)
superpc.plotred.lrtest(fit.red)

[Package superpc version 1.12 Index]