ROSE {SMOTEWB} | R Documentation |
Randomly Over Sampling Examples
Description
Generates synthetic data for each class to balance an imbalanced dataset using kernel density estimation. Supports multiclass datasets.
Usage
ROSE(x, y, h = 1)
Arguments
x |
feature matrix or data.frame. |
y |
a factor class variable. Can be multiclass. |
h |
A numeric vector of length one or of length equal to the number of classes in y.
If a single value is given, all classes share the same shrink factor. If a value
is given for each class, it will match respectively to |
Details
Randomly Over Sampling Examples (ROSE) (Menardi and Torelli, 2014) is an oversampling method that uses class-conditional kernel density estimates to balance a dataset. An R package called 'ROSE' (Lunardon et al., 2014) already exists. The difference is that this implementation is much faster and can be applied to more than two classes.
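The smoothed-bootstrap idea behind ROSE can be illustrated with a short base R sketch. It is not the code used by this package. The function names `rose_class_sketch` and `rose_sketch` are hypothetical, the Gaussian rule-of-thumb bandwidth is an assumption, and so are the choices to oversample every class up to the size of the largest class and to pair the entries of `h` with `levels(y)`.

```r
# Hypothetical helper: draws n_new synthetic rows for one class.
rose_class_sketch <- function(x_class, n_new, h = 1) {
  n <- nrow(x_class)
  d <- ncol(x_class)
  # Assumed bandwidth: Gaussian rule of thumb per feature, scaled by h.
  bw <- h * (4 / ((d + 2) * n))^(1 / (d + 4)) * apply(x_class, 2, sd)
  # Pick seed observations uniformly at random, with replacement.
  centers <- x_class[sample.int(n, n_new, replace = TRUE), , drop = FALSE]
  # Add independent Gaussian noise with per-feature standard deviation bw.
  noise <- matrix(rnorm(n_new * d), nrow = n_new, ncol = d) %*% diag(bw, nrow = d)
  centers + noise
}

# Hypothetical wrapper: brings every class up to the largest class size.
rose_sketch <- function(x, y, h = 1) {
  x <- as.matrix(x)
  lev <- levels(y)
  # One shrink factor per class; pairing with levels(y) is an assumption.
  h <- rep_len(h, length(lev))
  target <- max(table(y))
  x_syn <- list()
  y_syn <- list()
  for (k in seq_along(lev)) {
    in_class <- y == lev[k]
    n_new <- target - sum(in_class)
    if (n_new > 0) {
      x_syn[[k]] <- rose_class_sketch(x[in_class, , drop = FALSE], n_new, h[k])
      y_syn[[k]] <- rep(lev[k], n_new)
    }
  }
  # Original rows are kept; synthetic rows are appended.
  list(x_new = rbind(x, do.call(rbind, x_syn)),
       y_new = factor(c(as.character(y), unlist(y_syn)), levels = lev))
}
```

In this sketch a smaller `h` keeps synthetic points closer to their seed observations, while a larger `h` spreads them more widely around the class.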
Value
A list containing the resampled dataset.
x_new |
Resampled feature matrix. |
y_new |
Resampled target variable. |
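A minimal sketch of how the returned list might be consumed. The list `m` below is a hand-built stand-in so that the snippet runs without the package; only the component names `x_new` and `y_new` come from this page, and the values are made up for illustration.

```r
# Stand-in for the list returned by ROSE(); values are illustrative only.
m <- list(
  x_new = matrix(c(1, 2, 3, 4, 5, 6, 7, 8), ncol = 2),
  y_new = factor(c("negative", "negative", "positive", "positive"))
)

# Combine features and target into one data.frame, which is convenient
# for modelling functions that use a formula interface.
train <- data.frame(m$x_new, class = m$y_new)

# Inspect the class balance of the resampled data.
table(train$class)
```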
Author(s)
Fatih Saglam, saglamf89@gmail.com
References
Lunardon, N., Menardi, G., and Torelli, N. (2014). ROSE: a Package for Binary Imbalanced Learning. R Journal, 6:82–92.
Menardi, G. and Torelli, N. (2014). Training and assessing classification rules with imbalanced data. Data Mining and Knowledge Discovery, 28:92–122.
Examples
set.seed(1)
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(100, 5, 1), ncol = 2, nrow = 50))
y <- as.factor(c(rep("negative", 1000), rep("positive", 50)))
plot(x, col = y)
# resampling
m <- ROSE(x = x, y = y, h = c(0.12, 1))
plot(m$x_new, col = m$y_new)