SMOTEWB {SMOTEWB} | R Documentation |
SMOTE with boosting (SMOTEWB)
Description
Resampling with SMOTE with boosting.
Usage
SMOTEWB(x, y, n_weak_classifier = 100, class_weights = NULL, k_max = NULL, ...)
Arguments
x |
feature matrix. |
y |
a factor class variable with two classes. |
n_weak_classifier |
number of weak classifiers for boosting. |
class_weights |
numeric vector of length two. The first number is the weight for the
positive class and the second is the weight for the negative class. The higher
the relative weight of a class, the fewer of its samples are treated as noise. By default, |
k_max |
value used to increase the maximum number of neighbors. Default is
|
... |
additional inputs for ada::ada(). |
Details
SMOTEWB (Sağlam & Cengiz, 2022) is a SMOTE-based oversampling method that handles noisy data and adaptively decides the appropriate number of neighbors to link during resampling.
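For orientation, the interpolation step shared by SMOTE-type methods can be sketched in base R. This is a minimal illustration of plain SMOTE with a fixed number of neighbors, not the SMOTEWB implementation: the noise detection and the adaptive choice of neighbors are not shown, and smote_sketch() and its arguments are hypothetical names introduced here.

```r
# Minimal sketch of plain SMOTE interpolation (NOT the SMOTEWB implementation).
# smote_sketch(), k and n_syn are hypothetical names used for illustration only.
smote_sketch <- function(x_pos, k = 5, n_syn = 10) {
  stopifnot(is.matrix(x_pos), nrow(x_pos) > k)
  d <- as.matrix(dist(x_pos))   # pairwise Euclidean distances
  diag(d) <- Inf                # a sample is not its own neighbor
  x_syn <- matrix(NA_real_, nrow = n_syn, ncol = ncol(x_pos))
  for (s in seq_len(n_syn)) {
    i  <- sample.int(nrow(x_pos), 1)   # pick a minority sample at random
    nn <- order(d[i, ])[seq_len(k)]    # indices of its k nearest neighbors
    j  <- nn[sample.int(k, 1)]         # pick one of those neighbors at random
    r  <- runif(1)                     # position along the connecting segment
    x_syn[s, ] <- x_pos[i, ] + r * (x_pos[j, ] - x_pos[i, ])
  }
  x_syn
}

set.seed(1)
x_pos <- matrix(rnorm(100, 5, 1), ncol = 2)   # 50 minority samples
x_syn <- smote_sketch(x_pos, k = 5, n_syn = 20)
dim(x_syn)
```

Each synthetic point lies on the segment between a minority sample and one of its neighbors, which is why the number of neighbors matters when the minority class contains noisy samples.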
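In the authors' experiments, models trained on data resampled with this method achieved significantly better Matthews correlation coefficient (MCC) scores than models trained on data resampled with the other methods compared.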
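The Matthews correlation coefficient mentioned above can be computed from a confusion matrix with a few lines of base R. The helper below is a sketch for evaluating a classifier's predictions. It is not part of this package, and the name mcc() and its arguments are assumptions made for illustration.

```r
# Sketch of the Matthews correlation coefficient for a two-class problem.
# mcc() is a hypothetical helper, not a function exported by this package.
mcc <- function(actual, predicted, positive = "positive") {
  tp <- sum(actual == positive & predicted == positive)
  tn <- sum(actual != positive & predicted != positive)
  fp <- sum(actual != positive & predicted == positive)
  fn <- sum(actual == positive & predicted != positive)
  # as.numeric() avoids integer overflow in the product for large samples
  denom <- sqrt(as.numeric(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (denom == 0) return(0)   # common convention when a row or column is empty
  (tp * tn - fp * fn) / denom
}

actual    <- factor(c("positive", "positive", "negative",
                      "negative", "negative", "negative"))
predicted <- factor(c("positive", "negative", "negative",
                      "negative", "negative", "positive"))
mcc(actual, predicted)
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced data where accuracy alone can be misleading.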
Value
a list containing the resampled dataset.
x_new |
Resampled feature matrix. |
y_new |
Resampled target variable. |
x_syn |
Generated synthetic data. |
w |
Boosting weights for original dataset. |
k |
Number of nearest neighbors for positive class samples. |
C |
Number of synthetic samples generated for each positive class sample. |
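The components listed above can be inspected as follows. This is a sketch that assumes the SMOTEWB package and its dependency ada are installed. It is skipped otherwise, and the variable smotewb_available is a name introduced here for illustration.

```r
# Assumes the SMOTEWB package (and ada) is installed; skipped otherwise.
smotewb_available <- requireNamespace("SMOTEWB", quietly = TRUE)

if (smotewb_available) {
  set.seed(1)
  x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2),
             matrix(rnorm(100, 5, 1), ncol = 2))
  y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))
  m <- SMOTEWB::SMOTEWB(x = x, y = y, n_weak_classifier = 150)

  str(m, max.level = 1)     # overview of the returned list
  print(table(y))           # class counts before resampling
  print(table(m$y_new))     # class counts after resampling
  print(dim(m$x_syn))       # synthetic samples only
  print(summary(m$k))       # neighbors used for the positive class samples
}
```

Comparing table(y) with table(m$y_new) is a quick way to see how the class balance changed.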
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Sağlam, F., & Cengiz, M. A. (2022). A novel SMOTE-based resampling technique trough noise detection and the boosting procedure. Expert Systems with Applications, 200, 117023.
Currently works with two classes only.
Examples
set.seed(1)
# imbalanced two-class data: 1000 negative and 50 positive samples
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))
plot(x, col = y)
# resampling
m <- SMOTEWB(x = x, y = y, n_weak_classifier = 150)
plot(m$x_new, col = m$y_new)