maccest {mt}	R Documentation

Estimation of Multiple Classification Accuracy

Description

Estimation of classification accuracy by multiple classifiers with a resampling procedure, and comparison of the multiple classifiers.

Usage

maccest(dat, ...)
## Default S3 method:
maccest(dat, cl, method="svm", pars = valipars(), 
        tr.idx = NULL, comp="anova",...) 
## S3 method for class 'formula'
maccest(formula, data = NULL, ..., subset, na.action = na.omit)

Arguments

formula

A formula of the form groups ~ x1 + x2 + ... That is, the response is the grouping factor and the right hand side specifies the (non-factor) discriminators.

data

Data frame from which variables specified in formula are preferentially to be taken.

dat

A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.

cl

A factor specifying the class for each observation if no formula is given as the principal argument.

method

A vector of multiple classification methods to be used. Classifiers, such as randomForest, svm, knn and lda, can be used. For details, see note below.

pars

A list of control parameters for the resampling scheme, such as leave-one-out cross-validation, cross-validation, randomised validation (holdout) or bootstrap, and for the calculation of accuracy. See valipars for details.

tr.idx

User-defined index of training samples. Can be generated by trainind.

comp

Comparison method for the multiple classifiers. If comp is anova, the overall comparison is performed by ANOVA and the pairwise comparisons by Tukey's HSD. If comp is fried, the overall comparison is performed by the Friedman test and the pairwise comparisons by the Wilcoxon test.

...

Additional parameters to method.

subset

Optional vector, specifying a subset of observations to be used.

na.action

Function which indicates what should happen when the data contain NAs. Defaults to na.omit.
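The two comp options correspond to standard base-R tests. A minimal sketch of what each performs, on a made-up matrix of per-iteration accuracy rates (rows are iterations, columns are classifiers; maccest runs the analogous tests internally):

```r
set.seed(1)
## hypothetical per-iteration accuracies for three classifiers
acc.iter <- cbind(rf  = rnorm(10, 0.95, 0.02),
                  svm = rnorm(10, 0.93, 0.02),
                  knn = rnorm(10, 0.90, 0.02))

## comp = "anova": one-way ANOVA, then Tukey's HSD for pairwise comparisons
long <- data.frame(acc    = as.vector(acc.iter),
                   method = factor(rep(colnames(acc.iter),
                                       each = nrow(acc.iter))))
fit <- aov(acc ~ method, data = long)
summary(fit)      # global test
TukeyHSD(fit)     # pairwise comparisons

## comp = "fried": Friedman test (rows as blocks), then pairwise
## Wilcoxon signed-rank tests
friedman.test(acc.iter)
pairwise.wilcox.test(long$acc, long$method, paired = TRUE)
```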

Details

The accuracy rates for classification are obtained using techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis, based on sampling methods including leave-one-out cross-validation, cross-validation, randomised validation (holdout) and bootstrap.

Value

An object of class maccest, including the components:

method

Classification method used.

acc

Accuracy rate.

acc.iter

Accuracy rate of each iteration.

acc.std

Standard deviation of the accuracy rate.

mar

Prediction margin.

mar.iter

Prediction margin of each iteration.

auc

The area under the receiver operating characteristic curve (AUC).

auc.iter

AUC of each iteration.

comp

Multiple comparison method used.

h.test

Hypothesis test results of multiple comparison.

gl.pval

Global or overall p-value.

mc.pval

Pairwise comparison p-values.

sampling

Sampling scheme used.

niter

Number of iterations.

nreps

Number of replications in each iteration.

conf.mat

Overall confusion matrix.

acc.boot

A list of bootstrap errors, such as .632 and .632+, if the validation method is bootstrap.
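The .632 estimate combines the optimistic resubstitution error with the pessimistic leave-one-out bootstrap error (Efron's weighting); a sketch with made-up error figures:

```r
## hypothetical errors: resubstitution (training) error and
## leave-one-out bootstrap error
err.resub <- 0.02
err.b1    <- 0.10

## .632 estimator: weighted combination of the two
err.632 <- 0.368 * err.resub + 0.632 * err.b1
err.632   # 0.07056
```

The .632+ variant additionally shrinks the weight towards the resubstitution error when overfitting is severe.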

Note

maccest can take any classification model provided its argument format is model(formula, data, subset, na.action, ...) and its corresponding predict method predict.model(object, newdata, ...) returns either only the predicted class labels or a list with a component named class, such as lda and pcalda.

As for the multiple comparisons by ANOVA, the usual assumptions should be considered: the samples are randomly and independently drawn, the populations are normally distributed, and the populations have equal variances.

All the comparisons are based on the results of all iterations.

aam.mcl is a simplified version which returns acc (accuracy), auc (area under the ROC curve) and mar (class margin).

Author(s)

Wanchang Lin

See Also

accest, aam.mcl, valipars, plot.maccest, trainind, boxplot.maccest, classifier

Examples

# Iris data
data(iris)
x      <- subset(iris, select = -Species)
y      <- iris$Species

method <- c("randomForest","svm","pcalda","knn")
pars   <- valipars(sampling="boot", niter = 3, nreps=5, strat=TRUE)
res    <- maccest(Species~., data = iris, method=method, pars = pars, 
                  comp="anova")
## or 
res    <- maccest(x, y, method=method, pars=pars, comp="anova") 

res
summary(res)
plot(res)
boxplot(res)
oldpar <- par(mar = c(5,10,4,2) + 0.1)
plot(res$h.test$tukey, las=1)   ## plot the Tukey HSD results
par(oldpar)

[Package mt version 2.0-1.20 Index]