R: Safe-level Synthetic Minority Oversampling Technique

SLSMOTE {SMOTEWB}

R Documentation

Safe-level Synthetic Minority Oversampling Technique

Description

SLSMOTE() generates synthetic samples by considering the safe levels of the nearest minority class examples.

Usage

SLSMOTE(x, y, k1 = 5, k2 = 5)

Arguments

`x`	feature matrix or data.frame.
`y`	a factor class variable with two classes.
`k1`	number of neighbors to link. Default is 5.
`k2`	number of neighbors to determine safe levels. Default is 5.

Details

SLSMOTE uses a safe-level measure to identify the minority class samples that are safe to oversample. The safe level of a sample is the proportion of minority class samples among its k2 nearest neighbors, so it is high when the sample sits among other minority class samples and low when it is surrounded by the majority class. A sample is considered safe to oversample if its safe level is greater than a threshold.
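As a rough illustration of the safe level described above, the following base R sketch computes, for each minority class sample, the share of minority class samples among its k nearest neighbors. The helper `safe_level()` and the toy data are hypothetical and not part of SMOTEWB; the package's internal computation may differ.

```r
# Toy data: 20 majority points around (0, 0) and 6 minority points around (2, 2)
set.seed(42)
x <- rbind(matrix(rnorm(40, 0, 1), ncol = 2),
           matrix(rnorm(12, 2, 1), ncol = 2))
y <- factor(c(rep("negative", 20), rep("positive", 6)))

# Hypothetical helper (an assumption for illustration, not a SMOTEWB function):
# proportion of minority class samples among the k nearest neighbors
# of each minority class sample
safe_level <- function(x, y, minority, k = 5) {
  d <- as.matrix(dist(x))   # pairwise Euclidean distances
  diag(d) <- Inf            # a sample is not its own neighbor
  idx <- which(y == minority)
  vapply(idx, function(i) {
    nn <- order(d[i, ])[seq_len(k)]   # indices of the k nearest neighbors
    mean(y[nn] == minority)           # share of minority samples among them
  }, numeric(1))
}

sl <- safe_level(x, y, minority = "positive", k = 5)
print(sl)
```

Each value is a multiple of 1/k between 0 and 1; values near 0 indicate minority samples surrounded mostly by the majority class.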

In SLSMOTE, oversampling is applied only to the safe minority class samples, which avoids generating noisy samples that can lead to overfitting. To generate a synthetic sample, SLSMOTE randomly selects a minority class sample and finds its k1 nearest minority class neighbors. It then selects one of these neighbors at random and generates the synthetic sample by adding a random proportion of the difference between the neighbor and the selected sample to the selected sample.
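The interpolation step described above can be sketched in base R as follows. This is a minimal sketch, assuming a uniform random proportion between 0 and 1; the helper `interpolate_one()` and the toy data are hypothetical, and the sketch omits the safe-level filtering, so it is not the package's implementation.

```r
set.seed(1)
x_min <- matrix(rnorm(20, 5, 1), ncol = 2)   # 10 toy minority class samples

# Hypothetical helper (an assumption for illustration, not a SMOTEWB function):
# create one synthetic point between a random minority sample and
# one of its k nearest minority class neighbors
interpolate_one <- function(x_min, k = 5) {
  i <- sample(nrow(x_min), 1)                     # randomly selected sample
  d <- sqrt(colSums((t(x_min) - x_min[i, ])^2))   # distances to all minority samples
  d[i] <- Inf                                     # exclude the sample itself
  nn <- order(d)[seq_len(k)]                      # its k nearest minority neighbors
  j <- nn[sample(length(nn), 1)]                  # one neighbor chosen at random
  gap <- runif(1)                                 # random proportion in (0, 1)
  x_min[i, ] + gap * (x_min[j, ] - x_min[i, ])    # point on the connecting segment
}

x_syn <- t(replicate(20, interpolate_one(x_min, k = 5)))
dim(x_syn)
```

Because every synthetic point lies on the line segment between two existing minority class samples, the generated data stay within the range of the minority class.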

Note: This implementation is much faster than smotefamily::SLS().

Value

A list containing the resampled dataset.

`x_new`	Resampled feature matrix.
`y_new`	Resampled target variable.
`x_syn`	Generated synthetic data.
`C`	Number of synthetic samples generated for each positive class sample.

Author(s)

Fatih Saglam, saglamf89@gmail.com

References

Bunkhumpornpat, C., Sinapiromsaran, K., & Lursinsap, C. (2009). Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem. In Advances in Knowledge Discovery and Data Mining: 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, April 27-30, 2009, Proceedings (pp. 475-482). Springer Berlin Heidelberg.

Examples


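# simulate an imbalanced two-class dataset:
# 1000 negative samples centered at 3 and 50 positive samples centered at 5
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))

# plot the original data, colored by class
plot(x, col = y)

# resampling
m <- SLSMOTE(x = x, y = y, k1 = 5, k2 = 5)

# plot the resampled data, colored by class
plot(m$x_new, col = m$y_new)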

[Package SMOTEWB version 1.2.0 Index]