cv.splsda {spls}

R Documentation

Compute and plot cross-validated error for SPLSDA classification

Description

Draw a heatmap of v-fold cross-validated misclassification rates and return the optimal eta (thresholding parameter) and K (number of hidden components).

Usage

cv.splsda( x, y, fold=10, K, eta, kappa=0.5,
        classifier=c('lda','logistic'), scale.x=TRUE, plot.it=TRUE, n.core=8 )

Arguments

`x`	Matrix of predictors.
`y`	Vector of class indices.
`fold`	Number of cross-validation folds. Default is 10.
`K`	Number of hidden components.
`eta`	Thresholding parameter. `eta` should be between 0 and 1.
`kappa`	Parameter to control the effect of the concavity of the objective function and the closeness of original and surrogate direction vectors. `kappa` is relevant only for multicategory classification. `kappa` should be between 0 and 0.5. Default is 0.5.
`classifier`	Classifier used in the second step of SPLSDA. Alternatives are `"logistic"` or `"lda"`. Default is `"lda"`.
`scale.x`	Whether to scale predictors by dividing each predictor variable by its sample standard deviation. Default is `TRUE`.
`plot.it`	Whether to draw the heatmap of the cross-validated misclassification rates. Default is `TRUE`.
`n.core`	Number of CPUs to use when parallel computing is utilized. Default is 8.

Details

Parallel computing can be used to speed up computation. The number of CPUs is set with the argument `n.core`.
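
One way to choose a value for `n.core` is to ask the machine how many cores it has instead of relying on the default of 8. The sketch below uses `detectCores()` from the base `parallel` package. Leaving one core free is a common convention assumed here, not something this help page prescribes, and the final `cv.splsda` call is left commented out because `x` and `y` are placeholders.

```r
library(parallel)  # ships with base R

# detectCores() can return NA on some platforms, so fall back to 1.
available <- detectCores()
if (is.na(available)) available <- 1L

# Assumption: leave one core free for the rest of the system,
# but never go below 1.
n.core <- max(1L, available - 1L)

# The value would then be passed on, for example (x and y are placeholders):
# cv <- cv.splsda(x, y, K = 1:5, eta = seq(0.1, 0.9, 0.1), n.core = n.core)
```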

Value

Invisibly returns a list with components:

`err.mat`	Matrix of cross-validated misclassification rates. Rows correspond to `eta` and columns correspond to number of components (`K`).
`eta.opt`	Optimal `eta`.
`K.opt`	Optimal `K`.
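
To illustrate how `err.mat` relates to the chosen parameters, the sketch below builds a small hypothetical error matrix (rows for `eta`, columns for `K`) and looks up its smallest entry by hand. The numbers, the grids, and the row and column labels are made up for illustration. How `cv.splsda` itself resolves ties between equal error rates is not stated here, so taking the first minimum is an assumption.

```r
# Hypothetical stand-in for cv$err.mat: rows index eta, columns index K.
eta.grid <- c(0.1, 0.5, 0.9)
K.grid   <- 1:3
err.mat <- matrix(
  c(0.30, 0.22, 0.25,
    0.28, 0.12, 0.18,
    0.35, 0.20, 0.21),
  nrow = length(eta.grid), byrow = TRUE,
  dimnames = list(paste0("eta=", eta.grid), paste0("K=", K.grid))
)

# Locate the smallest cross-validated error.
# Assumption: if several cells tie, keep the first one found.
best <- which(err.mat == min(err.mat), arr.ind = TRUE)[1, ]

eta.best <- eta.grid[best["row"]]  # eta for the row holding the minimum
K.best   <- K.grid[best["col"]]    # K for the column holding the minimum
```

With a real fit, `cv$eta.opt` and `cv$K.opt` already hold the selected values; inspecting `cv$err.mat` this way is mainly useful for seeing how close the runner-up combinations are.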

Author(s)

Dongjun Chung and Sunduz Keles.

References

Chung D and Keles S (2010), "Sparse partial least squares classification for high dimensional data", Statistical Applications in Genetics and Molecular Biology, Vol. 9, Article 17.

Examples

data(prostate)
set.seed(1)
# misclassification rate plot. eta is searched between 0.1 and 0.9 and
# number of hidden components is searched between 1 and 5
## Not run:
cv <- cv.splsda( prostate$x, prostate$y, K = c(1:5), eta = seq(0.1,0.9,0.1),
         scale.x=FALSE, fold=5 )

# fit SPLSDA with the cross-validated eta and K (needs cv from the call above)
(splsda( prostate$x, prostate$y, eta=cv$eta.opt, K=cv$K.opt, scale.x=FALSE ))
## End(Not run)

[Package spls version 2.2-3 Index]