nearmiss {themis} | R Documentation |
Remove Points Near Other Classes
Description
Undersamples the majority classes by removing points that lie near other classes, using the NearMiss algorithm.
Usage
nearmiss(df, var, k = 5, under_ratio = 1)
Arguments
df |
data.frame or tibble. Must contain exactly one factor variable; all remaining variables must be numeric. |
var |
Character. Name of the factor variable that holds the class labels. |
k |
An integer. Number of nearest neighbors from other classes that are used to decide which majority-class points to remove. |
under_ratio |
A numeric value for the ratio of the majority-to-minority frequencies. The default value (1) means that all other levels are sampled down to have the same frequency as the least occurring level. A value of 2 would mean that the majority levels will have at most (approximately) twice as many rows as the minority level. |
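The roles of k and under_ratio can be illustrated with a minimal base-R sketch. This is not the themis implementation: the helper name nearmiss_sketch() is hypothetical, distances are computed by brute force with dist(), and the removal rule is an assumption taken from the title of this page, namely that the rows dropped from each larger class are those with the smallest mean distance to their k nearest rows of other classes.

```r
# Hypothetical helper, for illustration only; not the themis implementation.
nearmiss_sketch <- function(df, var, k = 5, under_ratio = 1) {
  counts <- table(df[[var]])
  # Largest number of rows any class may keep.
  target <- floor(min(counts) * under_ratio)
  x <- as.matrix(df[, setdiff(names(df), var), drop = FALSE])
  d <- as.matrix(dist(x))  # brute-force pairwise Euclidean distances
  drop_rows <- integer(0)
  for (lvl in names(counts)[counts > target]) {
    own <- which(df[[var]] == lvl)
    other <- which(df[[var]] != lvl)
    kk <- min(k, length(other))
    # Mean distance from each row of this class to its kk nearest rows
    # of other classes.
    mean_d <- apply(d[own, other, drop = FALSE], 1,
                    function(r) mean(sort(r)[seq_len(kk)]))
    n_drop <- length(own) - target
    # Assumption: drop the rows that sit closest to other classes.
    drop_rows <- c(drop_rows, own[order(mean_d)[seq_len(n_drop)]])
  }
  if (length(drop_rows) > 0) df[-drop_rows, , drop = FALSE] else df
}

# Toy data: 6 rows of class "a", 2 rows of class "b".
toy <- data.frame(
  x = c(0, 1, 2, 3, 4, 5, 10, 11),
  y = c(0, 0, 0, 0, 0, 0, 0, 0),
  class = factor(c(rep("a", 6), rep("b", 2)))
)
res1 <- nearmiss_sketch(toy, "class", k = 2, under_ratio = 1)
res2 <- nearmiss_sketch(toy, "class", k = 2, under_ratio = 2)
table(res1$class)  # "a" and "b" both have 2 rows
table(res2$class)  # "a" has 4 rows, "b" has 2
```

With under_ratio = 1 the larger class is cut down to the size of the smallest class; with under_ratio = 2 it may keep up to twice as many rows. Changing k only changes which rows are removed, not how many.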
Details
All columns used in this function must be numeric with no missing data.
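These requirements can be checked before calling the function. The base-R sketch below is one way to do so; the helper name check_nearmiss_input() is hypothetical and not part of themis, and it tests only the conditions stated on this page (a factor class variable, numeric predictors, no missing values).

```r
# Hypothetical helper, not part of themis.
check_nearmiss_input <- function(df, var) {
  predictors <- df[, setdiff(names(df), var), drop = FALSE]
  is_factor <- is.factor(df[[var]])
  all_numeric <- all(vapply(predictors, is.numeric, logical(1)))
  no_missing <- !any(is.na(df))
  is_factor && all_numeric && no_missing
}

ok <- data.frame(x = c(1, 2), y = c(3, 4), class = factor(c("a", "b")))
bad <- data.frame(x = c(1, NA), y = c(3, 4), class = factor(c("a", "b")))
check_ok <- check_nearmiss_input(ok, "class")    # TRUE
check_bad <- check_nearmiss_input(bad, "class")  # FALSE, x contains NA
```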
Value
A data.frame or tibble, depending on the type of df.
References
Inderjeet Mani and I Zhang. kNN approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of Workshop on Learning from Imbalanced Datasets, 2003.
See Also
step_nearmiss() for the step function of this method
Other Direct Implementations: adasyn(), bsmote(), smotenc(), smote(), tomek()
Examples
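library(themis)
circle_numeric <- circle_example[, c("x", "y", "class")]
# Defaults: k = 5 neighbors, classes sampled down to equal size
res <- nearmiss(circle_numeric, var = "class")
# Use 10 nearest neighbors instead of 5
res <- nearmiss(circle_numeric, var = "class", k = 10)
# Let larger classes keep up to about 1.5 times the rows of the smallest class
res <- nearmiss(circle_numeric, var = "class", under_ratio = 1.5)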