DACrossVal {HiDimDA}

R Documentation

Cross Validation for Discriminant Analysis Classification Algorithms

Description

‘DACrossVal’ evaluates the performance of a Discriminant Analysis training algorithm by k-fold cross-validation.

Usage

DACrossVal(data, grouping, TrainAlg, EvalAlg=EvalClrule, 
Strfolds=TRUE, kfold=10, CVrep=20, prior="proportions", ...)

Arguments

`data`	Matrix or data frame of observations.
`grouping`	Factor specifying the class for each observation.
`TrainAlg`	A function implementing the training algorithm. It should return an object that can be passed as input to ‘EvalAlg’.
`EvalAlg`	A function implementing the evaluation algorithm. By default set to ‘EvalClrule’, which returns a list with components “err” (estimates of error rates by class) and “Ng” (number of out-of-sample observations by class). This default can be used with any ‘TrainAlg’ that returns an object with a predict method returning a list with a ‘class’ component (a factor) containing the classification results.
`Strfolds`	Boolean flag indicating whether the folds should be stratified according to the original class proportions (default), or randomly generated from the whole training sample, ignoring class membership.
`kfold`	Number of training sample folds to be created in each replication.
`CVrep`	Number of replications to be performed.
`prior`	The prior probabilities of class membership. If unspecified, the class proportions for the training set are used. If present, the probabilities should be specified in the order of the factor levels.
`...`	Further arguments to be passed to ‘TrainAlg’ and ‘EvalAlg’.
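
The difference between stratified and unstratified folds (the ‘Strfolds’ argument) can be illustrated with a minimal base R sketch. The helper `make_folds` below is hypothetical and is not part of HiDimDA; it only shows one plausible way to assign observations to folds, and the package's internal procedure may differ.

```r
# Hypothetical helper (not a HiDimDA function): assign each observation to one
# of `kfold` folds, optionally stratifying by class.
make_folds <- function(grouping, kfold = 10, stratified = TRUE) {
  n <- length(grouping)
  folds <- integer(n)
  if (stratified) {
    # Within each class, shuffle the observations and deal them
    # round-robin across the folds.
    for (idx in split(seq_len(n), grouping)) {
      shuffled <- if (length(idx) > 1) sample(idx) else idx
      folds[shuffled] <- rep_len(seq_len(kfold), length(idx))
    }
  } else {
    # Ignore class membership: shuffle the whole sample at once.
    folds[sample(n)] <- rep_len(seq_len(kfold), n)
  }
  folds
}

set.seed(1)
folds <- make_folds(iris$Species, kfold = 10)
fold_by_class <- table(folds, iris$Species)  # fold sizes broken down by species
```

With stratification, every fold holds the same number of observations from each iris species; with `stratified = FALSE`, the per-class counts in a fold vary by chance.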

Value

A three-dimensional array with the number of holdout observations and the estimated classification errors for each combination of fold and replication tried. The array dimensions are defined as follows:

The first dimension runs through the different fold-replication combinations.

The second dimension represents the classes.

The third dimension has two named levels representing respectively the number of holdout observations (“Ng”), and the estimated classification errors (“Clerr”).
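
As a rough illustration of this layout, the sketch below builds a small synthetic array with the same dimension names and pools it into per-class and overall error rates. The numbers are invented, and the sketch assumes that “Clerr” holds per-class error rates (proportions), so that weighting by “Ng” recovers misclassification counts.

```r
# Synthetic stand-in for a DACrossVal result: 2 fold-replication combinations,
# 2 classes. All values are invented for illustration only.
res <- array(
  c(5, 5, 4, 6,             # "Ng":    class a (rows 1-2), then class b (rows 1-2)
    0.2, 0.0, 0.25, 0.5),   # "Clerr": same ordering
  dim = c(2, 2, 2),
  dimnames = list(NULL, c("a", "b"), c("Ng", "Clerr"))
)

ng  <- res[, , "Ng"]     # holdout counts, one row per fold-replication
err <- res[, , "Clerr"]  # estimated error rates, same shape

# Pool over folds and replications, weighting each rate by its holdout count.
class_err   <- colSums(ng * err) / colSums(ng)
overall_err <- sum(ng * err) / sum(ng)
```

Weighting by “Ng” matters when folds differ in size; a plain `colMeans(err)` would give every fold equal weight regardless of how many observations it held out.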

Author(s)

A. Pedro Duarte Silva

Examples


# Evaluate the performance of traditional (Fisher's) linear discriminant
# analysis on the iris data set, by ten-fold cross-validation replicated 
# three times.

library(HiDimDA)  # provides DACrossVal
library(MASS)     # provides lda
CrosValRes1 <- DACrossVal(iris[1:4], iris$Species, TrainAlg=lda, CVrep=3)
summary(CrosValRes1[,,"Clerr"])

# Evaluate the performance on Alon's Colon Cancer Data set 
# (with a logarithmic transformation), of a one-factor 
# linear discriminant rule with the best fifty genes, 
# by four-fold cross-validation.

## Not run: 

CrosValRes2 <- DACrossVal(log10(AlonDS[,-1]),AlonDS$grouping,TrainAlg=RFlda,
ldafun="classification",Selmethod="fixedp",maxp=50,kfold=4,CVrep=1)
summary(CrosValRes2[,,"Clerr"])


## End(Not run)

[Package HiDimDA version 0.2-6 Index]