accest {mt}    R Documentation
Estimate Classification Accuracy By Resampling Method
Description
Estimate classification accuracy rate by resampling method.
Usage
accest(dat, ...)
## Default S3 method:
accest(dat, cl, method, pred.func = predict, pars = valipars(),
       tr.idx = NULL, ...)
## S3 method for class 'formula'
accest(formula, data = NULL, ..., subset, na.action = na.omit)
aam.cl(x, y, method, pars = valipars(), ...)
aam.mcl(x, y, method, pars = valipars(), ...)
Arguments
formula: A formula of the form response ~ predictors, where the response is the class factor and the right-hand side specifies the explanatory variables.
data: Data frame from which variables specified in formula are to be taken.
dat, x: A matrix or data frame containing the explanatory variables if no formula is given as the principal argument.
cl, y: A factor specifying the class for each observation if no formula principal argument is given.
method: Classification method whose accuracy rate is to be estimated, such as randomForest, svm, knn or lda. See the Note below for the required interface.
pred.func: Predict method (default is predict).
pars: A list of parameters used by the resampling method, such as Leave-one-out cross-validation, Cross-validation, Bootstrap and Randomised validation (holdout). See valipars for details.
tr.idx: User-defined index of training samples. Can be generated by trainind.
...: Additional parameters to method.
subset: Optional vector specifying a subset of observations to be used.
na.action: Function which indicates what should happen when the data contain NAs. The default is na.omit.
Details
Classification accuracy rates are estimated for techniques such as Random Forest, Support Vector Machine, k-Nearest Neighbour Classification and Linear Discriminant Analysis, using resampling methods including Leave-one-out cross-validation, Cross-validation, Bootstrap and Randomised validation (holdout).
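As a minimal sketch of this workflow (assuming the package providing accest is mt, as the header indicates, and that MASS is installed so that lda can serve as the classifier; the components accessed are those documented under Value below), a cross-validated LDA run on the iris data could look like:
library(mt)    # provides accest and valipars
library(MASS)  # provides lda, used here as the classifier
data(iris)
## 5-fold cross-validation repeated over 2 iterations
acc.lda <- accest(Species ~ ., data = iris, method = "lda",
                  pars = valipars(sampling = "cv", niter = 2, nreps = 5))
acc.lda$acc       # overall accuracy rate
acc.lda$acc.iter  # average accuracy rate per iteration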
Value
accest returns an object including the components:
method: Classification method used.
acc: Overall accuracy rate.
acc.iter: Average accuracy rate for each iteration.
acc.all: Accuracy rate for each iteration and replication.
auc: Overall area under the receiver operating characteristic curve (AUC).
auc.iter: Average AUC for each iteration.
auc.all: AUC for each iteration and replication.
mar: Overall prediction margin.
mar.iter: Average prediction margin for each iteration.
mar.all: Prediction margin for each iteration and replication.
err: Overall error rate.
err.iter: Average error rate for each iteration.
err.all: Error rate for each iteration and replication.
sampling: Sampling scheme used.
niter: Number of iterations.
nreps: Number of replications in each iteration if the resampling method is not leave-one-out cross-validation.
conf: Overall confusion matrix.
res.all: All results, which can be further processed.
acc.boot: A list of bootstrap accuracy estimates, such as the .632 and .632+ estimates, if the sampling method is bootstrap.
aam.cl returns a vector with acc (accuracy), auc (area under ROC curve) and mar (class margin).
aam.mcl returns a matrix with columns acc (accuracy), auc (area under ROC curve) and mar (class margin).
Note
accest can take any classification model whose fitting function has the argument form model(formula, data, subset, na.action, ...) and whose corresponding predict method predict.model(object, newdata, ...) returns either the predicted class labels alone or a list with a component called class, such as lda and pcalda.
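As a minimal sketch of this interface (the names my.centroid and predict.my.centroid are hypothetical and not part of the package, and numeric predictors are assumed), the toy nearest-centroid classifier below has the required fitting signature and a predict method that returns a list with a class component. Whether it can be passed as method = "my.centroid" in practice depends on the exact call accest makes internally, so treat it as an illustration of the interface rather than a tested extension.
## Hypothetical nearest-centroid classifier matching the interface above
my.centroid <- function(formula, data, ...) {   # subset/na.action absorbed by ...
  mf  <- model.frame(formula, data)
  y   <- model.response(mf)                     # class factor
  x   <- as.matrix(mf[, -1, drop = FALSE])      # numeric predictors assumed
  cen <- t(sapply(levels(y), function(lev) colMeans(x[y == lev, , drop = FALSE])))
  structure(list(centroids = cen, vars = colnames(x)), class = "my.centroid")
}
## Predict method returning a list with a 'class' component
predict.my.centroid <- function(object, newdata, ...) {
  x   <- as.matrix(newdata[, object$vars, drop = FALSE])
  cen <- object$centroids
  ## squared Euclidean distance from each row of x to each class centroid
  d <- outer(rowSums(x^2), rowSums(cen^2), "+") - 2 * x %*% t(cen)
  list(class = factor(rownames(cen)[max.col(-d)], levels = rownames(cen)))
}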
If the classifier method provides posterior probabilities, the prediction margin mar will be generated; otherwise it is NULL.
If the classifier method provides posterior probabilities and the classification is a two-class problem, auc will be generated; otherwise it is NULL.
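A quick check of these components (using the acc object fitted in the Examples below) might look like:
acc$mar            # overall prediction margin; NULL if no posterior probabilities are provided
is.null(acc$auc)   # TRUE here, since iris is a three-class problem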
aam.cl is a wrapper function of accest, returning the accuracy rate, AUC and classification margin. aam.mcl accepts multiple classifiers in a single run.
Author(s)
Wanchang Lin
See Also
binest, maccest, valipars, trainind, classifier
Examples
# Iris data
data(iris)
# Use KNN classifier and bootstrap for resampling
acc <- accest(Species ~ ., data = iris, method = "knn",
              pars = valipars(sampling = "boot", niter = 2, nreps = 5))
acc
summary(acc)
acc$acc.boot
# Alternatively, use the traditional interface:
x <- subset(iris, select = -Species)
y <- iris$Species
## -----------------------------------------------------------------------
# Random Forest with 5-fold stratified cv
pars <- valipars(sampling = "cv", niter = 4, nreps = 5, strat = TRUE)
tr.idx <- trainind(y, pars = pars)
acc1 <- accest(x, y, method = "randomForest", pars = pars, tr.idx = tr.idx)
acc1
summary(acc1)
# plot the accuracy in each iteration
plot(acc1)
## -----------------------------------------------------------------------
# Forensic Glass data in chap.12 of MASS
data(fgl, package = "MASS") # in MASS package
# Randomised validation (holdout) of SVM for fgl data
acc2 <- accest(type ~ ., data = fgl, method = "svm", cost = 100, gamma = 1,
               pars = valipars(sampling = "rand", niter = 10, nreps = 4, div = 2/3))
acc2
summary(acc2)
# plot the accuracy in each iteration
plot(acc2)
## -----------------------------------------------------------------------
## Examples of aam.cl and aam.mcl
aam.1 <- aam.cl(x, y, method = "svm", pars = pars)
aam.2 <- aam.mcl(x, y, method = c("svm", "randomForest"), pars = pars)
## If only two classes are used, AUC will be calculated
idx <- (y == "setosa")
aam.3 <- aam.cl(x[!idx, ], factor(y[!idx]), method = "svm", pars = pars)
aam.4 <- aam.mcl(x[!idx, ], factor(y[!idx]), method = c("svm", "randomForest"), pars = pars)
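The runs above cover the Bootstrap, Cross-validation and Randomised validation schemes; a Leave-one-out run can be sketched the same way. The sampling name "loocv" is an assumption here, not confirmed by this page; see valipars for the exact option names.
## -----------------------------------------------------------------------
## Leave-one-out cross-validation (the sampling name "loocv" is assumed;
## check ?valipars for the exact spelling)
acc3 <- accest(x, y, method = "knn", pars = valipars(sampling = "loocv"))
acc3
summary(acc3)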