LinearDA {PredPsych}    R Documentation
Cross-validated Linear Discriminant Analysis
Description
A simple function to perform cross-validated Linear Discriminant Analysis
Usage
LinearDA(Data, classCol, selectedCols, cvType, nTrainFolds,
  ntrainTestFolds, modelTrainFolds, foldSep, CV = FALSE, cvFraction,
  extendedResults = FALSE, SetSeed = TRUE, silent = FALSE,
  NewData = NULL, ...)
Arguments
Data
(dataframe) Data dataframe
classCol
(numeric or string) column number (or name) that contains the variable to be predicted
selectedCols
(optional) (numeric or string) all the columns of data that would be used either as predictor or as feature
cvType
(optional) (string) which type of cross-validation scheme to follow, such as "folds" (k-fold cross-validation), "holdout" (holdout cross-validation) or "LOSO" (leave-one-subject-out cross-validation); see the examples below
nTrainFolds
(optional) (parameter for k-fold cross-validation only) number of folds in which to further divide the training dataset
ntrainTestFolds
(optional) (parameter for k-fold cross-validation only) number of folds for the training and testing dataset
modelTrainFolds
(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training
foldSep
(numeric) (parameter for leave-one-subject-out only) mandatory column number for leave-one-subject-out cross-validation (see the sketch after this argument list)
CV
(optional) (logical) perform cross-validation of the training dataset? If TRUE, posterior probabilities are stored with the model
cvFraction
(optional) (numeric) fraction of the data to keep for training
extendedResults
(optional) (logical) return extended results with the model and other metrics
SetSeed
(optional) (logical) whether to set the seed or not. Use SetSeed to seed the random number generator and obtain consistent results; set to FALSE only for permutation tests
silent
(optional) (logical) whether to print messages or not
NewData
(optional) (dataframe) new dataframe of features for which class membership is requested
...
(optional) additional arguments for the function
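As an illustration of how the cross-validation arguments fit together, the sketch below uses a hypothetical dataframe myData (class label in column 1, subject identifier in column 2, features in columns 3 to 10) and a hypothetical newObs dataframe of features; neither object ships with PredPsych, and the "LOSO" value follows the cvType description above.

# Sketch only: myData, newObs and their column layout are hypothetical placeholders
library(PredPsych)

# Leave-one-subject-out cross-validation: foldSep gives the subject-identifier column
LDAModelLOSO <- LinearDA(Data = myData, classCol = 1,
                         selectedCols = c(1, 3:10),
                         cvType = "LOSO", foldSep = 2)

# Requesting class membership for unseen observations via NewData
# (the function is then asked for class membership of newObs, as described above)
LDAModelNew <- LinearDA(Data = myData, classCol = 1,
                        selectedCols = c(1, 3:10),
                        cvType = "holdout", NewData = newObs)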
Details
The function implements Linear Discriminant Analysis, a simple algorithm for classification-based analyses. LDA builds a model composed of a number of discriminant functions based on linear combinations of data features that provide the best discrimination between two or more conditions/classes. The aim of the statistical analysis in LDA is thus to combine the data feature scores in such a way that a single new composite variable, the discriminant function, is produced (for details see Fisher, 1936; Rao, 1948).
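For intuition, the following sketch fits a plain (non-cross-validated) LDA with MASS::lda on the built-in iris data; it illustrates the discriminant-function idea described above and is not a call into LinearDA itself.

library(MASS)

# Fit a linear discriminant model: Species predicted from all four measurements
ldaFit <- lda(Species ~ ., data = iris)

# The scaling matrix holds the coefficients of the linear combinations
# (the discriminant functions) that best separate the classes
ldaFit$scaling

# Project the observations onto the discriminant functions and predict classes
ldaPred <- predict(ldaFit)
table(Actual = iris$Species, Predicted = ldaPred$class)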
Value
Depending upon extendedResults:
extendedResults = FALSE outputs the test accuracy accTest of discrimination.
extendedResults = TRUE outputs the test accuracy accTest of discrimination, ConfusionMatrixResults (overall cross-validated confusion matrix results), ConfMatrix (confusion matrices) and fitLDA (the fitted cross-validated LDA model).
If CV = TRUE, posterior probabilities are generated and stored in the model.
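With extendedResults = TRUE, the named elements above can be inspected individually. A minimal sketch, assuming the extended output is returned as a list carrying these elements (an assumption about the return structure, not stated explicitly here):

# Sketch only: assumes the extended output is a list with the elements named above
LDAModel <- LinearDA(Data = KinData, classCol = 1,
                     selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                     cvType = "holdout", extendedResults = TRUE)
LDAModel$accTest                # cross-validated test accuracy
LDAModel$ConfMatrix             # confusion matrices
LDAModel$fitLDA                 # the fitted LDA model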
Author(s)
Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia
References
Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179-188.
Rao, C. (1948). The Utilization of Multiple Measurements in Problems of Biological Classification. Journal of the Royal Statistical Society, Series B (Methodological), 10, 159-203.
Examples
# simple model with holdout data partition of 80% and no extended results
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),cvType="holdout")
# Output:
#
# Performing Linear Discriminant Analysis
#
#
# Performing holdout Cross-validation
#
# cvFraction was not specified,
# Using default value of 0.8 (80%) fraction for training (cvFraction = 0.8)
#
# Proportion of Test/Train Data was : 0.2470588
# Predicted
# Actual 1 2
# 1 51 32
# 2 40 45
# [1] "Test holdout Accuracy is 0.57"
# holdout LDA Analysis:
# cvFraction : 0.8
# Test Accuracy 0.57
# *Legend:
# cvFraction = Fraction of data to keep for training data
# Test Accuracy = mean accuracy from the Testing dataset
# Alternative uses:
# holdout cross-validation with 80% training data
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
CV=FALSE,cvFraction = 0.8,extendedResults = TRUE,cvType="holdout")
# For a 10-fold cross-validation without printing messages
LDAModel <- LinearDA(Data = KinData, classCol = 1,
selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
extendedResults = FALSE,cvType = "folds",nTrainFolds=10,silent = TRUE)
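# A hedged sketch of a permutation test built around LinearDA (illustrative only;
# it assumes that, with extendedResults = FALSE, the function returns the test
# accuracy as a single numeric value, as described under Value)
obsAcc <- LinearDA(Data = KinData, classCol = 1,
                   selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                   cvType = "holdout", silent = TRUE)
nPerm <- 100
permAcc <- numeric(nPerm)
for (i in seq_len(nPerm)) {
  permData <- KinData
  permData[[1]] <- sample(permData[[1]])   # shuffle the class labels
  permAcc[i] <- LinearDA(Data = permData, classCol = 1,
                         selectedCols = c(1,2,12,22,32,42,52,62,72,82,92,102,112),
                         cvType = "holdout", SetSeed = FALSE, silent = TRUE)
}
# Permutation p-value: how often chance matches or beats the observed accuracy
mean(permAcc >= obsAcc)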