R: Wrapper Function for Classifiers

classifier {mt}

R Documentation

Wrapper Function for Classifiers

Description

Wrapper function for classifiers. The classification model is built on the training data and the error is estimated on the test data.

Usage

classifier(dat.tr, cl.tr, dat.te=NULL, cl.te=NULL, method,
           pred.func=predict,...)

Arguments

`dat.tr`	A data frame or matrix of training data. The classification model is built on it.
`cl.tr`	A factor or vector of training class labels.
`dat.te`	A data frame or matrix of test data. Error rates are calculated on this data set.
`cl.te`	A factor or vector of test class labels.
`method`	Classification method to be used. Any classification method can be used provided it has a `predict` method (`knn` is the exception) whose output is either the predicted class labels or a list with a component named `class`. Examples are `randomForest`, `svm`, `knn` and `lda`. Either a function or a character string naming the function to be called.
`pred.func`	Predict method (default is `predict`). Either a function or a character string naming the function to be called.
`...`	Additional parameters to `method`.
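The contract described for `method` and `pred.func` can be illustrated with a minimal base R sketch. Everything in it is hypothetical and not part of package mt: `nearest_centroid` is a toy classifier, `predict.nearest_centroid` is its predict method returning a list with a `class` component, and `wrap_classifier` only approximates what a wrapper of this kind might do (it assumes `method` is called as `method(dat.tr, cl.tr, ...)`).

```r
## Hypothetical toy classifier: assigns each sample to the nearest class centroid.
nearest_centroid <- function(x, y, ...) {
  x <- as.matrix(x)
  y <- factor(y)
  ## one row of column means per class
  centers <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(centers) <- levels(y)
  structure(list(centers = centers, levels = levels(y)),
            class = "nearest_centroid")
}

## Predict method returning a list with a component named `class`.
predict.nearest_centroid <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  ## squared Euclidean distance from every row to every centroid
  d <- sapply(seq_len(nrow(object$centers)), function(k) {
    rowSums(sweep(newdata, 2, object$centers[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(newdata))
  pred <- factor(object$levels[max.col(-d, ties.method = "first")],
                 levels = object$levels)
  list(class = pred)
}

## Hypothetical wrapper sketch (an assumption, not the mt implementation).
wrap_classifier <- function(dat.tr, cl.tr, dat.te, cl.te, method,
                            pred.func = predict, ...) {
  method    <- match.fun(method)      # accepts a function or its name
  pred.func <- match.fun(pred.func)
  model <- method(dat.tr, cl.tr, ...) # extra arguments go to `method`
  out   <- pred.func(model, dat.te)
  ## accept either bare class labels or a list with component `class`
  pred  <- if (is.list(out)) out$class else out
  err   <- mean(as.character(pred) != as.character(cl.te))
  list(err = err, acc = 1 - err, cl = cl.te, pred = pred)
}

## Made-up toy data
x    <- rbind(c(0, 0), c(0, 1), c(1, 0), c(5, 5), c(5, 6), c(6, 5))
y    <- factor(c("a", "a", "a", "b", "b", "b"))
te   <- rbind(c(0.5, 0.5), c(5.5, 5.5), c(4, 4))
te.y <- factor(c("a", "b", "a"), levels = levels(y))

res <- wrap_classifier(x, y, te, te.y, method = "nearest_centroid")
res$err
```

Passing `method = nearest_centroid` (the function itself) gives the same result as passing its name as a character string.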

Value

A list including components:

`err`	Error rate on the test data.
`cl`	The original class labels of the test data.
`pred`	The predicted class labels of the test data.
`posterior`	Posterior probabilities for the classes if `method` provides posterior output.
`acc`	Accuracy rate of classification.
`margin`	The margin of predictions from classifier `method` if it provides posterior output. The margin of a data point is defined as the probability assigned to the correct class minus the maximum probability assigned to any other class. A positive margin means correct classification; a negative margin means misclassification. This idea comes from package randomForest. For more details, see `margin`.
`auc`	The area under the receiver operating characteristic curve (AUC) if classifier `method` produces posterior probabilities and the classification is a two-class problem.
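As a rough illustration of how `err`, `acc` and `auc` relate to each other, the following base R sketch computes them by hand for a two-class problem. The labels and posterior values are made up, and `auc_rank` is a hypothetical helper using the rank-sum (Mann-Whitney) form of the AUC; it is not claimed to be how package mt computes the value.

```r
## Made-up two-class test labels and posterior probabilities for class "pos"
cl       <- factor(c("pos", "pos", "neg", "pos", "neg", "neg"))
post_pos <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)

## Predicted class: "pos" when its posterior exceeds 0.5
pred <- factor(ifelse(post_pos > 0.5, "pos", "neg"), levels = levels(cl))

err <- mean(as.character(pred) != as.character(cl))  # error rate
acc <- 1 - err                                       # accuracy rate

## Hypothetical helper: AUC from the rank-sum statistic
auc_rank <- function(score, label, pos) {
  is_pos <- label == pos
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  r  <- rank(score)                       # average ranks handle ties
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_rank(post_pos, cl, pos = "pos")
c(err = err, acc = acc, auc = auc)
```

The AUC here equals the fraction of (positive, negative) pairs in which the positive sample receives the higher posterior.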

Note

The definition of margin is based on the posterior probabilities. Classifiers such as randomForest, svm, lda, qda, pcalda and plslda output posterior probabilities, so a margin is available for them. knn does not, so no margin is available for it.
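The margin definition can be worked through on a small posterior matrix. This is a minimal sketch in base R with made-up probabilities; the object names (`post`, `cl`, `mar`) are illustrative only and the code is not taken from package mt or randomForest.

```r
## Made-up posterior probabilities: one row per test sample, one column per class
post <- matrix(c(0.7, 0.2, 0.1,
                 0.3, 0.4, 0.3,
                 0.1, 0.1, 0.8),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
cl <- factor(c("a", "a", "c"), levels = colnames(post))  # true classes

## Probability assigned to the correct class of each sample
idx    <- cbind(seq_len(nrow(post)), as.integer(cl))
p_true <- post[idx]

## Largest probability among the other classes
other      <- post
other[idx] <- -Inf
p_other    <- apply(other, 1, max)

mar  <- p_true - p_other
pred <- colnames(post)[max.col(post, ties.method = "first")]

data.frame(cl = cl, pred = pred, margin = mar)
```

The second sample has a negative margin (0.3 - 0.4) and is the one that is misclassified, while the other two have positive margins and are classified correctly.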

Author(s)

Wanchang Lin

Examples

data(abr1)
dat <- preproc(abr1$pos[,110:500], method="log10")  
cls <- factor(abr1$fact$class)        

## tmp <- dat.sel(dat, cls, choices=c("1","2"))
## dat <- tmp[[1]]$dat
## cls <- tmp[[1]]$cls

idx <- sample(1:nrow(dat), round((2/3)*nrow(dat)), replace = FALSE) 
## construct train and test data
train.dat  <- dat[idx,]
train.cl   <- cls[idx]
test.dat   <- dat[-idx,]       
test.cl    <- cls[-idx] 

## estimate accuracy on the test data
res <- classifier(train.dat, train.cl, test.dat, test.cl, 
                  method="randomForest")
res
## get confusion matrix
cl.rate(obs=res$cl, res$pred)   ## same as: cl.rate(obs=test.cl, res$pred)

## Measurements of Forensic Glass Fragments
data(fgl, package = "MASS")    # in MASS package
dat <- subset(fgl, grepl("WinF|WinNF",type))
## dat <- subset(fgl, type %in% c("WinF", "WinNF"))
x   <- subset(dat, select = -type)
y   <- factor(dat$type)

## construct train and test data 
idx   <- sample(1:nrow(x), round((2/3)*nrow(x)), replace = FALSE) 
tr.x  <- x[idx,]
tr.y  <- y[idx]
te.x  <- x[-idx,]        
te.y  <- y[-idx] 

res.1 <- classifier(tr.x, tr.y, te.x, te.y, method="svm")
res.1
cl.rate(obs=res.1$cl, res.1$pred) 

## classification performance for the two-class case.
pos <- "WinF"                              ## select positive level
cl.perf(obs=res.1$cl, pre=res.1$pred, pos=pos)
## ROC and AUC
cl.roc(stat=res.1$posterior[,pos],label=res.1$cl, pos=pos)

[Package mt version 2.0-1.20 Index]