classificationError {optBiomarker}

R Documentation

Estimation of misclassification errors (generalisation errors) based on statistical and various machine learning methods

Description

Estimates misclassification errors (generalisation errors), sensitivity and specificity using cross-validation, bootstrap and 632plus bias corrected bootstrap methods based on Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour methods.
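
As a rough illustration of what a cross-validated misclassification error is, the sketch below estimates it in base R. It does not reproduce the package's internals: the helper `cv_error` is hypothetical, and a simple nearest-centroid rule stands in for the Random Forest, SVM, LDA and k-NN classifiers. The built-in `iris` data are used only as a convenient two-class example.

```r
# Illustrative sketch only: k-fold cross-validated misclassification error.
# `cv_error` is a hypothetical helper, and the nearest-centroid rule is a
# stand-in for the classifiers used by the package.
cv_error <- function(x, y, k = nrow(x)) {
  x <- as.matrix(x)
  y <- droplevels(as.factor(y))
  n <- nrow(x)
  # Randomly assign each observation to one of k folds; k = n is leave-one-out.
  folds <- sample(rep(seq_len(k), length.out = n))
  wrong <- logical(n)
  for (f in seq_len(k)) {
    test <- folds == f
    # Class centroids (one row per class) computed from the training folds only.
    centroids <- apply(x[!test, , drop = FALSE], 2,
                       function(col) tapply(col, y[!test], mean))
    for (i in which(test)) {
      # Squared Euclidean distance from observation i to each class centroid.
      d <- colSums((t(centroids) - x[i, ])^2)
      wrong[i] <- names(d)[which.min(d)] != as.character(y[i])
    }
  }
  # Proportion of held-out observations that were misclassified.
  mean(wrong)
}

set.seed(1)
two_class <- droplevels(iris[iris$Species != "setosa", ])
loo_err      <- cv_error(two_class[, 1:4], two_class$Species)          # leave-one-out
ten_fold_err <- cv_error(two_class[, 1:4], two_class$Species, k = 10)  # 10-fold
```

Setting `k` equal to the number of observations gives leave-one-out cross-validation, which mirrors the `k=NROW(na.action(data))` default of the `control` argument shown under Usage.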

Usage

## S3 method for class 'data.frame'
classificationError(
          formula,
          data,
          method = c("RF", "SVM", "LDA", "KNN"),
          errorType = c("cv", "boot", "six32plus"),
          senSpec = TRUE,
          negLevLowest = TRUE,
          na.action = na.omit,
          control = control.errorest(k = NROW(na.action(data)), nboot = 100),
          ...)

Arguments

`formula`	A formula of the form `lhs ~ rhs` relating response (class) variable and the explanatory variables. See `lm` for more detail.
`data`	A data frame containing the response (class membership) variable and the explanatory variables in the formula.
`method`	A character vector of length `1` to `4` representing the classification methods to be used. Can be one or more of `"RF"` (Random Forest), `"SVM"` (Support Vector Machines), `"LDA"` (Linear Discriminant Analysis) and `"KNN"` (k-Nearest Neighbour). Defaults to all four methods.
`errorType`	A character vector of length `1` to `3` representing the type of estimators to be used for computing misclassification errors. Can be one or more of `"cv"` (cross-validation), `"boot"` (bootstrap) and `"six32plus"` (632plus bias corrected bootstrap). Defaults to all three estimators.
`senSpec`	Logical. Should sensitivity and specificity (for cross-validation estimator only) be computed? Defaults to `TRUE`.
`negLevLowest`	Logical. Does the lowest of the ordered levels of the class variable represent the negative control? Defaults to `TRUE`.
`na.action`	Function which indicates what should happen when the data contain `NA`s. Defaults to `na.omit`.
`control`	Control parameters of the function `errorest`.
`...`	Additional parameters to `method`.
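
To make the `negLevLowest` convention concrete, here is a minimal base R sketch of how sensitivity and specificity depend on which class level is treated as the negative control. The helper `sens_spec` and its argument `neg_lev_lowest` are hypothetical names introduced for illustration; this is one reading of the argument description above, not the package's own code.

```r
# Illustrative sketch only: `sens_spec` is a hypothetical helper.
# With neg_lev_lowest = TRUE the first (lowest) factor level is taken as the
# negative class, otherwise the second level is.
sens_spec <- function(truth, pred, neg_lev_lowest = TRUE) {
  truth <- factor(truth)
  stopifnot(nlevels(truth) == 2)
  lev <- levels(truth)
  neg <- if (neg_lev_lowest) lev[1] else lev[2]
  pos <- setdiff(lev, neg)
  truth <- as.character(truth)
  pred  <- as.character(pred)
  c(sensitivity = mean(pred[truth == pos] == pos),  # true positive rate
    specificity = mean(pred[truth == neg] == neg))  # true negative rate
}

truth <- factor(c("control", "control", "control", "disease", "disease"))
pred  <- c("control", "control", "disease", "disease", "control")

res_default <- sens_spec(truth, pred)                          # "control" is negative
res_flipped <- sens_spec(truth, pred, neg_lev_lowest = FALSE)  # "disease" is negative
```

Here the levels sort as `"control"`, `"disease"`, so the default treats `"control"` as the negative class. Flipping the convention swaps the two rates.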

Details

In the current version of the package, estimation of sensitivity and specificity is limited to the cross-validation estimator. For LDA, the sample size must be greater than the number of explanatory variables to avoid singularity. The function `classificationError` does not check whether this condition is satisfied, but the underlying function `lda` produces warnings if it is violated.
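
Because the sample-size condition for LDA is not checked, a caller may want to test it before fitting. The following is a minimal sketch in base R. The helper `lda_size_ok` is hypothetical, and it counts terms on the right-hand side of the formula, which equals the number of explanatory variables only when each term is a single numeric column.

```r
# Illustrative sketch only: `lda_size_ok` is a hypothetical helper.
lda_size_ok <- function(formula, data, na.action = na.omit) {
  data <- na.action(data)
  # Right-hand-side terms; a `.` in the formula is expanded against `data`.
  p <- length(attr(terms(formula, data = data), "term.labels"))
  n <- NROW(data)
  n > p
}

ok_iris <- lda_size_ok(Species ~ ., iris)  # 150 observations, 4 predictors

wide <- data.frame(class = factor(c("a", "b", "a")),
                   x1 = c(1, 2, 3), x2 = c(2, 1, 0),
                   x3 = c(0, 1, 0), x4 = c(5, 4, 6))
ok_wide <- lda_size_ok(class ~ ., wide)    # 3 observations, 4 predictors
```

In this sketch `ok_iris` is `TRUE` and `ok_wide` is `FALSE`. In the second case, dropping `"LDA"` from `method` or reducing the number of predictors would avoid the singularity warning.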

Value

Returns an object of class `classificationError` with components:

`call`	The call of the `classificationError` function.
`errorRate`	A `length(errorType)` by `length(method)` matrix of classification errors.
`rocData`	A `2` by `length(method)` matrix of sensitivities (first row) and specificities (second row).
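
The dimensions described above can be pictured with a placeholder object built by hand. This is only a sketch of the documented shape for the default arguments: the values are `NA`, and the row and column names are assumptions made for readability, not something the documentation states.

```r
# Illustrative sketch only: placeholder matrices with the documented dimensions.
# The dimnames are assumed for readability; the values are deliberately NA.
method    <- c("RF", "SVM", "LDA", "KNN")
errorType <- c("cv", "boot", "six32plus")

shape <- list(
  errorRate = matrix(NA_real_, nrow = length(errorType), ncol = length(method),
                     dimnames = list(errorType, method)),
  rocData   = matrix(NA_real_, nrow = 2, ncol = length(method),
                     dimnames = list(c("sensitivity", "specificity"), method))
)

dim(shape$errorRate)  # length(errorType) by length(method)
dim(shape$rocData)    # 2 by length(method)
```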

Author(s)

Mizanur Khondoker, Till Bachmann, Peter Ghazal
Maintainer: Mizanur Khondoker mizanur.khondoker@gmail.com.

References

Khondoker, M. R., Bachmann, T. T., Mewissen, M., Dickinson, P. et al. (2010). Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules. Journal of Bioinformatics and Computational Biology, 8, 945-965.

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5–32.

Chang, C.-C. and Lin, C.-J. LIBSVM: a library for Support Vector Machines. https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Efron, B. and Tibshirani, R. (1997). Improvements on Cross-Validation: The .632+ Bootstrap Estimator. Journal of the American Statistical Association 92(438), 548–560.

Examples


## Not run: 
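# Simulate a data set with nTrain = 30 and nBiom = 3
mydata <- simData(nTrain = 30, nBiom = 3)$data
# Estimate errors using the default methods and error estimators
classificationError(formula = class ~ ., data = mydata)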

## End(Not run)

[Package optBiomarker version 1.0-28 Index]