FADA {FADA} | R Documentation |
Factor Adjusted Discriminant Analysis 3-4: Supervised classification on decorrelated data
Description
This function performs supervised classification on factor-adjusted data.
Usage
FADA(faobject, K = 10, B = 20, nbf.cv = NULL,
     method = c("glmnet", "sda", "sparseLDA"),
     sda.method = c("lfdr", "HC"), alpha = 0.1, ...)
Arguments
faobject |
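The decorrelated data object, as returned by decorrelate.train (no testing data) or decorrelate.test (testing data provided); see Examples. |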
K |
Number of folds used to estimate the classification error rate by cross-validation, only when no testing data is provided. Default is 10. |
B |
Number of replications of the cross-validation. Default is 20. |
nbf.cv |
Number of factors used in the cross-validation to compute the error rate, only when no testing data is provided. Default is NULL. |
method |
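The method used to fit the supervised classification model. Three options are available: "glmnet" (Friedman et al., 2010), "sda" (Ahdesmaki and Strimmer, 2010) and "sparseLDA" (Clemmensen et al., 2011). |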
sda.method |
The method used for variable selection, only if method="sda". Either "lfdr" (local false discovery rate) or "HC" (higher criticism). |
alpha |
The proportion of the HC objective to be observed, only if method="sda" and sda.method="HC". Default is 0.1. |
... |
Further arguments used to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters. |
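The alpha argument only matters when sda.method = "HC". As a rough illustration of higher-criticism (HC) selection, the sketch below applies the Donoho and Jin HC statistic to a vector of p-values and, as an assumption about what alpha controls, maximizes it only over the smallest floor(alpha * n) p-values. The function hc_select() is hypothetical; FADA and the sda package may compute feature scores, p-values and the HC threshold differently.

```r
# Sketch of a higher-criticism (HC) selection rule on a vector of p-values.
# Assumption: `alpha` is the fraction of the smallest p-values over which the
# HC objective is maximised. hc_select() is hypothetical, not FADA's code.
hc_select <- function(pval, alpha = 0.1) {
  n <- length(pval)
  eps <- .Machine$double.eps
  # Sort and keep p-values strictly inside (0, 1) to avoid dividing by zero
  p_sorted <- pmin(pmax(sort(pval), eps), 1 - eps)
  i <- seq_len(n)
  # HC objective at each ordered p-value (Donoho and Jin form)
  hc <- sqrt(n) * (i / n - p_sorted) / sqrt(p_sorted * (1 - p_sorted))
  # Only the first floor(alpha * n) ordered p-values are examined (at least one)
  m <- max(1, floor(alpha * n))
  i_star <- which.max(hc[seq_len(m)])
  # The i_star features with the smallest p-values are selected
  list(n_selected = i_star, threshold = p_sorted[i_star])
}

# Usage: 10 strong signals among 1000 features
pval <- c(rep(1e-6, 10), seq(0.001, 0.999, length.out = 990))
hc_select(pval, alpha = 0.1)$n_selected
```

Restricting the search to a small fraction of p-values keeps the HC maximum from being driven by unstable ratios in the bulk of null features; a smaller alpha can only select fewer features.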
Value
Returns a list with the following elements:
method |
The classification method that was used |
selected |
A vector containing the indices of the selected variables |
proba.train |
A matrix containing the predicted group membership probabilities (frequencies) of the training data. |
proba.test |
A matrix containing the predicted group membership probabilities (frequencies) of the testing data, if a testing data set has been provided |
predict.test |
A matrix containing the predicted classes of the testing data, if a testing data set has been provided |
cv.error |
A numeric value containing the average classification error rate, computed by cross-validation, if no testing data set has been provided |
cv.error.se |
A numeric value containing the standard error of the classification error rate, computed by cross-validation, if no testing data set has been provided |
mod |
The fitted classification model. Its class is that of the model returned by the chosen method; see the documentation of that method for details. |
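The probability matrices (proba.train, proba.test) and the predicted classes (predict.test) are related. A common rule, assumed here rather than taken from the FADA source, assigns each observation to the group with the highest predicted probability. The matrix proba and its group names below are hypothetical illustration data.

```r
# Hypothetical matrix of predicted group probabilities:
# one row per observation, one column per group (names are illustrative).
proba <- matrix(c(0.90, 0.10,
                  0.30, 0.70,
                  0.45, 0.55),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("group1", "group2")))

# Assign each observation to the group with the highest probability;
# ties.method = "first" makes the rule deterministic.
predicted <- colnames(proba)[max.col(proba, ties.method = "first")]
predicted
```

Here predicted is c("group1", "group2", "group2"). max.col() breaks ties at random by default, which is why ties.method is set explicitly.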
Author(s)
Emeline Perthame, Chloe Friguet and David Causeur
References
Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.
Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.
Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104(488), 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.
See Also
FADA, decorrelate.train, decorrelate.test, sda, sda-package, glmnet-package
Examples
library(FADA)
data(data.train)
data(data.test)

# When a testing data set is provided
res = decorrelate.train(data.train)
res2 = decorrelate.test(res, data.test)
classif = FADA(res2, method = "sda", sda.method = "lfdr")
classif$selected      # indices of the selected variables
classif$predict.test  # predicted classes of the testing data

## Not run:
# When no testing data set is provided, the classification error rate
# is computed by a K-fold cross-validation.
# res = decorrelate.train(data.train)
# classif = FADA(res, method = "sda", sda.method = "lfdr")
# classif$cv.error
## End(Not run)
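When no testing data is provided, FADA reports cv.error and cv.error.se from B replications of K-fold cross-validation. The sketch below illustrates that general scheme in base R under stated assumptions. The helpers nearest_centroid() and repeated_cv_error() are hypothetical, and a simple nearest-centroid rule stands in for the glmnet, sda or sparseLDA fit. The standard error is assumed to be sd(replicate errors) / sqrt(B). Any factor re-estimation controlled by nbf.cv is not reproduced.

```r
# Hypothetical sketch of B replications of K-fold cross-validation.
# The classifier is a nearest-centroid rule standing in for
# glmnet/sda/sparseLDA; the standard error is ASSUMED to be
# sd(replicate errors) / sqrt(B). Not FADA's implementation.
nearest_centroid <- function(xtrain, ytrain, xtest) {
  classes <- levels(ytrain)
  # One centroid (row) per class
  centroids <- do.call(rbind, lapply(classes, function(k)
    colMeans(xtrain[ytrain == k, , drop = FALSE])))
  # Squared Euclidean distance of each test row to each centroid
  d <- sapply(seq_along(classes), function(j)
    rowSums(sweep(xtest, 2, centroids[j, ])^2))
  d <- matrix(d, nrow = nrow(xtest))
  factor(classes[apply(d, 1, which.min)], levels = classes)
}

repeated_cv_error <- function(x, y, K = 10, B = 20) {
  n <- nrow(x)
  errors <- numeric(B)
  for (b in seq_len(B)) {
    folds <- sample(rep(seq_len(K), length.out = n))  # random fold labels
    wrong <- 0
    for (k in seq_len(K)) {
      test <- folds == k
      pred <- nearest_centroid(x[!test, , drop = FALSE], y[!test],
                               x[test, , drop = FALSE])
      wrong <- wrong + sum(pred != y[test])
    }
    errors[b] <- wrong / n  # misclassification rate of replicate b
  }
  list(cv.error = mean(errors), cv.error.se = sd(errors) / sqrt(B))
}

# Usage on simulated, well-separated two-class data
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), 50, 2),
           matrix(rnorm(100, mean = 10), 50, 2))
y <- factor(rep(c("a", "b"), each = 50))
repeated_cv_error(x, y, K = 5, B = 3)
```

Repeating the random split B times averages out the dependence of a single K-fold estimate on how observations fall into folds. B must be at least 2 for the standard error to be defined.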