LinearDA {PredPsych}    R Documentation
Cross-validated Linear Discriminant Analysis
Description
A simple function to perform cross-validated Linear Discriminant Analysis
Usage
LinearDA(Data, classCol, selectedCols, cvType, nTrainFolds,
  ntrainTestFolds, modelTrainFolds, foldSep, CV = FALSE, cvFraction,
  extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
  NewData = NULL, ...)
Arguments
Data
(dataframe) Data dataframe
classCol
(numeric or string) column number (or name) that contains the variable to be predicted
selectedCols
(optional) (numeric or string) all the columns of data that would be used either as predictor or as feature
cvType
(optional) (string) which type of cross-validation scheme to follow, such as "folds" (k-fold cross-validation), "holdout" (holdout cross-validation) or "LOSO" (leave-one-subject-out cross-validation); see the examples below
nTrainFolds
(optional) (parameter for k-fold cross-validation only) number of folds in which to further divide the training dataset
ntrainTestFolds
(optional) (parameter for k-fold cross-validation only) number of folds for the training and testing dataset
modelTrainFolds
(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training
foldSep
(numeric) (parameter for leave-one-subject-out only) mandatory column number for leave-one-subject-out cross-validation (see the sketch after this argument list)
CV
(optional) (logical) perform cross-validation of the training dataset? If TRUE, posterior probabilities are stored with the model
cvFraction
(optional) (numeric) fraction of the data to keep for training
extendedResults
(optional) (logical) return extended results with the model and other metrics
SetSeed
(optional) (logical) whether to set the seed or not. Use SetSeed to seed the random number generator and obtain consistent results; set to FALSE only for permutation tests
silent
(optional) (logical) whether to print messages or not
NewData
(optional) (dataframe) new dataframe of features for which class membership is requested
...
(optional) additional arguments for the function
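As an illustration of how the cross-validation arguments fit together, the sketch below uses a hypothetical dataframe myData (class label in column 1, subject identifier in column 2, features in columns 3 to 10) and a hypothetical newObs dataframe of features; neither object ships with PredPsych, and the "LOSO" value follows the cvType description above.

# Sketch only: myData, newObs and their column layout are hypothetical placeholders
library(PredPsych)

# Leave-one-subject-out cross-validation: foldSep gives the subject-identifier column
LDAModelLOSO <- LinearDA(Data = myData, classCol = 1,
                         selectedCols = c(1, 3:10),
                         cvType = "LOSO", foldSep = 2)

# Requesting class membership for unseen observations via NewData
# (the function is then asked for class membership of newObs, as described above)
LDAModelNew <- LinearDA(Data = myData, classCol = 1,
                        selectedCols = c(1, 3:10),
                        cvType = "holdout", NewData = newObs)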
Details
The function implements Linear Discriminant Analysis, a simple algorithm for classification-based analyses. LDA builds a model composed of a number of discriminant functions based on linear combinations of data features that provide the best discrimination between two or more conditions/classes. The aim of the statistical analysis in LDA is thus to combine the data feature scores in such a way that a single new composite variable, the discriminant function, is produced (for details see Fisher, 1936; Rao, 1948).
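For intuition, the following sketch fits a plain (non-cross-validated) LDA with MASS::lda on the built-in iris data; it illustrates the discriminant-function idea described above and is not a call into LinearDA itself.

library(MASS)

# Fit a linear discriminant model: Species predicted from all four measurements
ldaFit <- lda(Species ~ ., data = iris)

# The scaling matrix holds the coefficients of the linear combinations
# (the discriminant functions) that best separate the classes
ldaFit$scaling

# Project the observations onto the discriminant functions and predict classes
ldaPred <- predict(ldaFit)
table(Actual = iris$Species, Predicted = ldaPred$class)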
Value
Depending upon extendedResults:
extendedResults = FALSE outputs the test accuracy accTest of discrimination.
extendedResults = TRUE outputs the test accuracy accTest of discrimination, ConfusionMatrixResults (overall cross-validated confusion matrix results), ConfMatrix (confusion matrices) and fitLDA (the fitted cross-validated LDA model).
If CV = TRUE, posterior probabilities are generated and stored in the model.
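With extendedResults = TRUE, the named elements above can be inspected individually. A minimal sketch, assuming the extended output is returned as a list carrying these elements (an assumption about the return structure, not stated explicitly here):

# Sketch only: assumes the extended output is a list with the elements named above
LDAModel <- LinearDA(Data = KinData, classCol = 1,
                     selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                     cvType = "holdout", extendedResults = TRUE)
LDAModel$accTest                # cross-validated test accuracy
LDAModel$ConfMatrix             # confusion matrices
LDAModel$fitLDA                 # the fitted LDA model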
Author(s)
Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia
References
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179-188.
Rao, C. (1948). The Utilization of Multiple Measurements in Problems of Biological Classification. Journal of the Royal Statistical Society, Series B (Methodological), 10, 159-203.
Examples
# simple model with holdout data partition of 80% and no extended results
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),cvType="holdout")
# Output:
#
# Performing Linear Discriminant Analysis
#
#
# Performing holdout Cross-validation
#
# cvFraction was not specified,
# Using default value of 0.8 (80%) fraction for training (cvFraction = 0.8)
#
# Proportion of Test/Train Data was : 0.2470588
# Predicted
# Actual 1 2
# 1 51 32
# 2 40 45
# [1] "Test holdout Accuracy is 0.57"
# holdout LDA Analysis:
# cvFraction : 0.8
# Test Accuracy 0.57
# *Legend:
# cvFraction = Fraction of data to keep for training data
# Test Accuracy = mean accuracy from the Testing dataset
# Alternative uses:
# holdout cross-validation with 80% training data
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
CV=FALSE,cvFraction = 0.8,extendedResults = TRUE,cvType="holdout")
# For a 10-fold cross-validation without printing messages
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
extendedResults = FALSE,cvType = "folds",nTrainFolds=10,silent = TRUE)
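# A hedged sketch of a permutation test built around LinearDA (illustrative only;
# it assumes that, with extendedResults = FALSE, the function returns the test
# accuracy as a single numeric value, as described under Value)
obsAcc <- LinearDA(Data = KinData, classCol = 1,
                   selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                   cvType = "holdout", silent = TRUE)
nPerm <- 100
permAcc <- numeric(nPerm)
for (i in seq_len(nPerm)) {
  permData <- KinData
  permData[[1]] <- sample(permData[[1]])   # shuffle the class labels
  permAcc[i] <- LinearDA(Data = permData, classCol = 1,
                         selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                         cvType = "holdout", SetSeed = FALSE, silent = TRUE)
}
# Permutation p-value: how often chance matches or beats the observed accuracy
mean(permAcc >= obsAcc)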