balance {ECoL} | R Documentation |
Measures of class balance
Description
These measures apply to classification tasks. They capture the differences in the number of examples per class in the dataset. When these differences are severe, the class imbalance can harm the generalization of ML classification techniques.
Usage
balance(...)
## Default S3 method:
balance(x, y, measures = "all", ...)
## S3 method for class 'formula'
balance(formula, data, measures = "all", ...)
Arguments
... |
Not used. |
x |
A data.frame containing only the input attributes. |
y |
A factor response vector with one label for each row/component of x. |
measures |
A list of measure names or "all" (the default) to include all of them. |
formula |
A formula to define the class column. |
data |
A data.frame dataset containing the input attributes and class. |
Details
The following measures are allowed for this method:
- "C1"
The entropy of class proportions (C1) captures the imbalance in a dataset based on the proportions of examples per class.
- "C2"
The imbalance ratio (C2) is an index of class balance. This version of the measure is also suited to multiclass classification problems.
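The two measures above can be sketched in base R. This is only an illustration, not the package's implementation: the helper names `class_entropy` and `imbalance_ratio` are hypothetical, and the normalizations used here (entropy divided by the log of the number of classes, and a rescaled multiclass imbalance ratio) are assumptions that may differ from what `balance` returns.

```r
# Hypothetical helpers (not part of ECoL) sketching the two measures.

# Normalized entropy of class proportions (assumed normalization):
# 1 when all classes are equally frequent, lower as one class dominates.
class_entropy <- function(y) {
  y <- droplevels(as.factor(y))
  p <- as.numeric(table(y)) / length(y)  # class proportions, all > 0
  k <- length(p)
  if (k < 2) return(0)                   # single class: nothing to normalize
  -sum(p * log(p)) / log(k)
}

# Multiclass imbalance ratio (assumed rescaling):
# 0 when classes are perfectly balanced, closer to 1 under severe imbalance.
imbalance_ratio <- function(y) {
  y <- droplevels(as.factor(y))
  n_i <- as.numeric(table(y))            # number of examples per class
  n <- length(y)
  k <- length(n_i)
  if (k < 2) return(NA_real_)            # undefined for a single class
  ir <- ((k - 1) / k) * sum(n_i / (n - n_i))
  1 - 1 / ir
}

balanced <- iris$Species                           # 50 examples per class
skewed <- factor(c(rep("a", 90), rep("b", 10)))    # 90/10 split

class_entropy(balanced)
imbalance_ratio(balanced)
class_entropy(skewed)
imbalance_ratio(skewed)
```

Under these assumptions, the balanced iris labels give an entropy of about 1 and an imbalance ratio of about 0, while the 90/10 labels give a lower entropy and a higher ratio.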
Value
A list named by the requested class balance measure.
References
Ana C Lorena, Ivan G Costa, Newton Spolaor and Marcilio C P Souto. (2012). Analysis of complexity indices for classification problems: Cancer gene expression data. Neurocomputing 75, 1, 33–42.
Ajay K Tanwani and Muddassar Farooq. (2010). Classification potential vs. classification accuracy: a comprehensive study of evolutionary algorithms with biomedical datasets. Learning Classifier Systems 6471, 127–144.
See Also
Other complexity-measures: correlation, dimensionality, linearity, neighborhood, network, overlapping, smoothness
Examples
## Extract all balance measures for classification task
data(iris)
balance(Species ~ ., iris)
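
The default S3 method from the Usage section can be called with a data.frame of attributes and a factor of labels, and `measures` can restrict the output to a single measure. The sketch below assumes the ECoL package is installed (it is skipped otherwise) and that the returned list can be indexed by measure name, as described under Value.

```r
## Extract only the C1 measure using the default method
x <- iris[, 1:4]     # input attributes only
y <- iris$Species    # factor with one label per row of x

# Guarded so the snippet runs even when ECoL is not installed
if (requireNamespace("ECoL", quietly = TRUE)) {
  res <- ECoL::balance(x, y, measures = "C1")
  print(res$C1)
}
```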