maccest {mt}	R Documentation

Estimation of Multiple Classification Accuracy

Description

Estimation of classification accuracy by multiple classifiers with a resampling procedure, and comparison of the multiple classifiers.

Usage

maccest(dat, ...)
## Default S3 method:
maccest(dat, cl, method="svm", pars = valipars(), 
        tr.idx = NULL, comp="anova",...) 
## S3 method for class 'formula'
maccest(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula is given as the principal argument.

method

A vector of multiple classification methods to be used. Classifiers, such as randomForest, svm, knn and lda, can be used. For details, see note below.

pars

A list of control parameters for the resampling scheme, such as leave-one-out cross-validation, cross-validation, randomised validation (holdout) or bootstrap, and for the calculation of accuracy. See valipars for details.

tr.idx

User-defined index of training samples. Can be generated by trainind.

comp

Comparison method for the multiple classifiers. If comp is anova, the overall comparison is performed by ANOVA and the pairwise comparisons by Tukey's HSD. If comp is fried, the overall comparison is performed by the Friedman test and the pairwise comparisons by the Wilcoxon test.

...

Additional parameters to method.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function which indicates what should happen when the data contain NAs. Defaults to na.omit.
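The two comp options correspond to standard base-R tests. A minimal sketch of what each performs, on a made-up matrix of per-iteration accuracy rates (rows are iterations, columns are classifiers; maccest runs the analogous tests internally):

```r
set.seed(1)
## hypothetical per-iteration accuracies for three classifiers
acc.iter <- cbind(rf  = rnorm(10, 0.95, 0.02),
                  svm = rnorm(10, 0.93, 0.02),
                  knn = rnorm(10, 0.90, 0.02))

## comp = "anova": one-way ANOVA, then Tukey's HSD for pairwise comparisons
long <- data.frame(acc    = as.vector(acc.iter),
                   method = factor(rep(colnames(acc.iter),
                                       each = nrow(acc.iter))))
fit <- aov(acc ~ method, data = long)
summary(fit)      # global test
TukeyHSD(fit)     # pairwise comparisons

## comp = "fried": Friedman test (rows as blocks), then pairwise
## Wilcoxon signed-rank tests
friedman.test(acc.iter)
pairwise.wilcox.test(long$acc, long$method, paired = TRUE)
```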

Details

The accuracy rates for classification are obtained using techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis, based on sampling methods including leave-one-out cross-validation, cross-validation, randomised validation (holdout) and bootstrap.

Value

An object of class maccest, including the components:

method

Classification method used.

acc

Accuracy rate.

acc.iter

Accuracy rate of each iteration.

acc.std

Standard deviation of the accuracy rate.

mar

Prediction margin.

mar.iter

Prediction margin of each iteration.

auc

The area under the receiver operating characteristic curve (AUC).

auc.iter

AUC of each iteration.

comp

Multiple comparison method used.

h.test

Hypothesis test results of multiple comparison.

gl.pval

Global or overall p-value.

mc.pval

Pairwise comparison p-values.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications in each iteration.

conf.mat

Overall confusion matrix.

acc.boot

A list of bootstrap errors, such as .632 and .632+, if the validation method is bootstrap.
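The .632 estimate combines the optimistic resubstitution error with the pessimistic leave-one-out bootstrap error (Efron's weighting); a sketch with made-up error figures:

```r
## hypothetical errors: resubstitution (training) error and
## leave-one-out bootstrap error
err.resub <- 0.02
err.b1    <- 0.10

## .632 estimator: weighted combination of the two
err.632 <- 0.368 * err.resub + 0.632 * err.b1
err.632   # 0.07056
```

The .632+ variant additionally shrinks the weight towards the resubstitution error when overfitting is severe.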

Note

maccest can take any classification model provided its argument format is model(formula, data, subset, na.action, ...) and its corresponding predict method predict.model(object, newdata, ...) returns either only the predicted class labels or a list with a component named class, such as lda and pcalda.

As for the multiple comparisons by ANOVA, the usual assumptions should be considered: the samples are randomly and independently drawn, the populations are normally distributed, and the populations have equal variances.

All the comparisons are based on the results of all iterations.

aam.mcl is a simplified version which returns acc (accuracy), auc (area under the ROC curve) and mar (class margin).

Author(s)

Wanchang Lin

See Also

accest, aam.mcl, valipars, plot.maccest, trainind, boxplot.maccest, classifier

Examples

# Iris data
data(iris)
x      <- subset(iris, select = -Species)
y      <- iris$Species

method <- c("randomForest","svm","pcalda","knn")
pars   <- valipars(sampling="boot", niter = 3, nreps=5, strat=TRUE)
res    <- maccest(Species~., data = iris, method=method, pars = pars, 
                  comp="anova")
## or 
res    <- maccest(x, y, method=method, pars=pars, comp="anova") 

res
summary(res)
plot(res)
boxplot(res)
oldpar <- par(mar = c(5,10,4,2) + 0.1)
plot(res$h.test$tukey, las=1)   ## plot the Tukey HSD results
par(oldpar)

[Package mt version 2.0-1.20 Index]