
FADA {FADA}

R Documentation

Factor Adjusted Discriminant Analysis 3-4 : Supervised classification on decorrelated data

Description

This function performs supervised classification on factor-adjusted data.

Usage

FADA(faobject, K = 10, B = 20, nbf.cv = NULL,
    method = c("glmnet", "sda", "sparseLDA"),
    sda.method = c("lfdr", "HC"), alpha = 0.1, ...)

Arguments

`faobject`	An object returned by the function `decorrelate.train` or `decorrelate.test`.
`K`	Number of folds to estimate classification error rate, only when no testing data is provided. Default is `K=10`.
`B`	Number of replications of the cross-validation. Default is `B=20`.
`nbf.cv`	Number of factors used in the cross validation that computes the error rate, only when no testing data is provided. By default, `nbf.cv = NULL` and the number of factors is estimated for each fold of the cross validation. `nbf.cv` can also be set to a positive integer value. If `nbf.cv = 0`, the data are not factor-adjusted.
`method`	The method used to fit the supervised classification model. Three options are available. If `method = "glmnet"`, a Lasso-penalized logistic regression is fitted using the glmnet R package. If `method = "sda"`, an LDA or DDA (see the `diagonal` argument of `sda`) is fitted by Shrinkage Discriminant Analysis using the sda R package. If `method = "sparseLDA"`, a Lasso-penalized LDA is fitted using the sparseLDA R package.
`sda.method`	The method used for variable selection, only if `method = "sda"`. If `sda.method = "lfdr"`, variables are selected through CAT scores and False Non-Discovery Rate control. If `sda.method = "HC"`, variables are selected by Higher Criticism thresholding.
`alpha`	The proportion of the HC objective to be observed, only if `method = "sda"` and `sda.method = "HC"`. Default is `alpha = 0.1`.
`...`	Additional arguments to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters.
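
As a rough illustration of how these arguments combine, the sketch below calls `FADA` with two of the `method` values and once without testing data. It relies only on the functions and data sets used in the Examples section of this page. The object names (`fa.train`, `fa.test`, `fit.glmnet`, `fit.hc`, `fit.cv`) are hypothetical, and the small `K` and `B` values are chosen only to shorten the run, not as recommended settings.

```r
library(FADA)

data(data.train)
data(data.test)

# Factor-adjust the training data, then apply the same adjustment to the test data
fa.train <- decorrelate.train(data.train)
fa.test  <- decorrelate.test(fa.train, data.test)

# Lasso-penalized logistic regression (glmnet)
fit.glmnet <- FADA(fa.test, method = "glmnet")

# Shrinkage discriminant analysis, variables selected by Higher Criticism
fit.hc <- FADA(fa.test, method = "sda", sda.method = "HC", alpha = 0.1)

# No testing data: the error rate is estimated by 2 replications of 5-fold CV
fit.cv <- FADA(fa.train, K = 5, B = 2, method = "sda", sda.method = "lfdr")
```

`K`, `B` and `nbf.cv` only matter in the last call, since they apply when no testing data is provided.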

Value

Returns a list with the following elements:

`method`	The classification method that was used.
`selected`	A vector containing the indices of the selected variables.
`proba.train`	A matrix containing predicted group frequencies of training data.
`proba.test`	A matrix containing predicted group frequencies of testing data, if a testing data set has been provided.
`predict.test`	A matrix containing predicted classes of testing data, if a testing data set has been provided.
`cv.error`	A numeric value containing the average classification error, computed by cross validation, if no testing data set has been provided.
`cv.error.se`	A numeric value containing the standard error of the classification error, computed by cross validation, if no testing data set has been provided.
`mod`	The classification model performed. The class of this element is the class of a model returned by the chosen method. See the documentation of the chosen method for more details.
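
The elements listed above can be inspected with ordinary list access. The following is a minimal sketch, assuming the package's example data sets and a testing data set; `fa.train`, `fa.test` and `classif` are hypothetical object names, and no output is shown because it depends on the data and on the fitted model.

```r
library(FADA)

data(data.train)
data(data.test)

fa.train <- decorrelate.train(data.train)
fa.test  <- decorrelate.test(fa.train, data.test)
classif  <- FADA(fa.test, method = "sda", sda.method = "lfdr")

classif$method               # classification method that was used
length(classif$selected)     # number of selected variables
head(classif$proba.test)     # predicted group frequencies, first testing rows
table(classif$predict.test)  # counts of predicted classes on the testing data
class(classif$mod)           # class depends on the chosen method
```

When no testing data set is provided, `cv.error` and `cv.error.se` are the elements to look at instead of `proba.test` and `predict.test`.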

Author(s)

Emeline Perthame, Chloe Friguet and David Causeur

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104(488), 1406-1415.

Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data. Statistics and Computing.

Examples

data(data.train)
data(data.test)

# When testing data set is provided
res = decorrelate.train(data.train)
res2 = decorrelate.test(res, data.test)
classif = FADA(res2, method = "sda", sda.method = "lfdr")

### Not run 
# When no testing data set is provided
# Classification error rate is computed by a K-fold cross validation.
# res = decorrelate.train(data.train)
# classif = FADA(res, method = "sda", sda.method = "lfdr")
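
The repeated K-fold scheme mentioned in the comment above can be sketched in base R. This is not FADA's implementation. It uses simulated toy data, a hypothetical nearest-centroid classifier as a stand-in for the real methods, and it assumes that the standard error is the standard deviation of the `B` replicate error rates divided by `sqrt(B)`. It is meant only to show what `K`, `B`, `cv.error` and `cv.error.se` refer to.

```r
set.seed(1)

# Toy data: 60 observations, 5 variables, 2 groups shifted on variable 1
n <- 60
p <- 5
y <- rep(c(1, 2), each = n / 2)
x <- matrix(rnorm(n * p), n, p)
x[y == 2, 1] <- x[y == 2, 1] + 2

# Stand-in classifier (nearest class centroid), NOT one of FADA's methods
nearest_centroid <- function(x.train, y.train, x.new) {
  classes <- sort(unique(y.train))
  # One column of variable means per class (p x number of classes)
  centroids <- sapply(classes, function(cl)
    colMeans(x.train[y.train == cl, , drop = FALSE]))
  # Squared Euclidean distance from each new row to each centroid
  d <- sapply(seq_along(classes), function(j)
    rowSums(sweep(x.new, 2, centroids[, j])^2))
  d <- matrix(d, nrow = nrow(x.new))
  classes[max.col(-d, ties.method = "first")]
}

# One replication of K-fold cross validation: misclassification rate
kfold_error <- function(x, y, K) {
  folds <- sample(rep(seq_len(K), length.out = nrow(x)))
  wrong <- 0
  for (k in seq_len(K)) {
    test <- folds == k
    pred <- nearest_centroid(x[!test, , drop = FALSE], y[!test],
                             x[test, , drop = FALSE])
    wrong <- wrong + sum(pred != y[test])
  }
  wrong / nrow(x)
}

K <- 10
B <- 20
errors <- replicate(B, kfold_error(x, y, K))

cv.error    <- mean(errors)          # average over the B replications
cv.error.se <- sd(errors) / sqrt(B)  # assumed definition of the standard error
```

Repeating the K-fold split `B` times averages out the randomness of the fold assignment, which is why a larger `B` gives a more stable error estimate at a higher computational cost.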

[Package FADA version 1.3.5 Index]