cv.nfeaturesLDA {animation}	R Documentation
Cross-validation to find the optimum number of features (variables) in LDA
Description
This function provides an illustration of the process of finding the optimum number of variables using k-fold cross-validation in a linear discriminant analysis (LDA).
Usage
cv.nfeaturesLDA(
data = matrix(rnorm(600), 60),
cl = gl(3, 20),
k = 5,
cex.rg = c(0.5, 3),
col.av = c("blue", "red"),
...
)
Arguments
data
a data matrix containing the predictors in columns
cl
a factor indicating the class of each row of data
k
the number of folds
cex.rg
the range of magnification applied to the points in the plot
col.av
the two colors used to denote, respectively, the rates of correct predictions in the i-th fold and the average rates over all k folds
...
arguments passed to
Details
For a classification problem, we usually wish to use as few variables as possible because of the difficulties brought by high dimensionality.
The selection procedure is as follows:
- Split the whole data randomly into k folds.
- For each number of features g = 1, 2, \cdots, g_{max}, choose the g features that have the largest discriminatory power (measured by the F-statistic in ANOVA), and then:
  - for each fold i (i = 1, 2, \cdots, k), train an LDA model without the i-th fold data, and predict the i-th fold to get a proportion of correct predictions p_{gi};
  - average the k proportions to get the correct rate p_g.
- Determine the optimum number of features as the g with the largest p_g.
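In symbols, the averaging and selection steps read as below; \hat{g} is notation introduced here (it does not appear in the original page) for the selected number of features. Because each p_{gi} is a plain proportion and the k proportions are simply averaged, every fold is weighted equally.

```latex
\begin{aligned}
p_g &= \frac{1}{k} \sum_{i=1}^{k} p_{gi}, \qquad g = 1, 2, \cdots, g_{max}, \\
\hat{g} &= \operatorname*{arg\,max}_{g \in \{1, 2, \cdots, g_{max}\}} p_g .
\end{aligned}
```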
Note that g_{max} is set by ani.options('nmax'), i.e. the maximum number of features we want to choose.
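The feature-ranking step (choosing the g features with the largest ANOVA F-statistic) can be sketched in base R as follows. This is a minimal illustration, not the animation package's internal code: the helpers anova_f() and top_features() are hypothetical names, and the F-statistic is obtained from a one-way ANOVA of each column on the class factor via anova(lm(...)).

```r
# Hypothetical helper (not part of the animation package):
# one-way ANOVA F-statistic of each column of x with respect to the classes cl.
anova_f <- function(x, cl) {
  cl <- as.factor(cl)
  apply(x, 2, function(col) anova(lm(col ~ cl))[["F value"]][1])
}

# Hypothetical helper: column indices of the g features with the largest F
top_features <- function(x, cl, g) {
  order(anova_f(x, cl), decreasing = TRUE)[seq_len(g)]
}

set.seed(1)
x <- matrix(rnorm(600), 60)            # same shape as the default `data`
cl <- gl(3, 20)                        # same as the default `cl`
x[, 4] <- x[, 4] + 2 * as.integer(cl)  # make column 4 strongly class-dependent
top_features(x, cl, 3)                 # column 4 is expected to rank first
```

Fitting one lm() per column keeps the sketch short; for many columns, a vectorized computation of the between- and within-class mean squares would be faster.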
Value
A list containing
accuracy
a matrix whose element in the i-th row and j-th column is the rate of correct predictions based on LDA, i.e. the proportion of the i-th fold (the test set) correctly classified by an LDA model built with j variables on the remaining folds
optimum
the optimum number of features based on the cross-validation
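To make the structure of the returned list concrete, here is a minimal sketch of the whole k-fold procedure that produces an accuracy matrix (k rows, one column per number of features) and an optimum. It is an illustration under stated assumptions, not the package's implementation: it uses lda() from the recommended package MASS, the function name cv_nfeatures_lda() is hypothetical, its nmax argument stands in for ani.options('nmax'), and it ranks features on the training rows of each fold (an assumption of this sketch, which avoids using the test fold for feature selection). It also loops over folds first and over g second, which yields the same p_{gi} values as looping over g first.

```r
library(MASS)  # provides lda() and its predict() method

# Hypothetical sketch of k-fold CV over the number of features g
cv_nfeatures_lda <- function(data, cl, k = 5, nmax = 10) {
  cl <- as.factor(cl)
  nmax <- min(ncol(data), nmax)
  # Randomly assign each row to one of k folds of (nearly) equal size
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  # One-way ANOVA F-statistic of each column against the classes y
  fstat <- function(x, y) {
    apply(x, 2, function(col) anova(lm(col ~ y))[["F value"]][1])
  }
  acc <- matrix(NA_real_, nrow = k, ncol = nmax)
  for (i in seq_len(k)) {
    train <- fold != i
    # Rank features by F-statistic using the training rows only (assumption)
    ranked <- order(fstat(data[train, , drop = FALSE], cl[train]),
                    decreasing = TRUE)
    for (g in seq_len(nmax)) {
      vars <- ranked[seq_len(g)]
      fit <- lda(data[train, vars, drop = FALSE], grouping = cl[train])
      pred <- predict(fit, data[!train, vars, drop = FALSE])$class
      acc[i, g] <- mean(pred == cl[!train])  # p_{gi}
    }
  }
  p <- colMeans(acc)                         # p_g
  list(accuracy = acc, optimum = unname(which.max(p)))
}

set.seed(2023)
x <- matrix(rnorm(600), 60)
cl <- gl(3, 20)
x[, 1:2] <- x[, 1:2] + 2 * as.integer(cl)  # two informative columns
res <- cv_nfeatures_lda(x, cl, k = 5, nmax = 10)
dim(res$accuracy)       # k rows (folds) by nmax columns (numbers of features)
colMeans(res$accuracy)  # p_g for g = 1, ..., nmax
res$optimum
```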
Author(s)
Yihui Xie <https://yihui.org/>
References
Examples at https://yihui.org/animation/example/cv-nfeatureslda/
Maindonald J, Braun J (2007). Data Analysis and Graphics Using R - An Example-Based Approach. Cambridge University Press, 2nd edition. p. 400