oversampleDatRel {imbalanceDatRel}    R Documentation

Oversampling and Data Relocation for Resampled Data

Description

oversampleDatRel first oversamples the data using the selected method, then relocates the resampled data using the Pure and Proper Class Cover Catch Digraph (PCCCD).

Usage

oversampleDatRel(
  x,
  y,
  method = "SMOTE",
  proportion = 1,
  p_of = 0,
  class_pos = NULL,
  ...
)

Arguments

x

feature matrix or data frame.

y

factor of class labels.

method

oversampling method. Default is "SMOTE". Available methods are:
"ADASYN": Adaptive Synthetic Sampling
"ROS": Random Oversampling
"ROSE": Random Over-Sampling Examples
"RSLSMOTE": Relocating safe-level SMOTE with minority outcast handling
"RUS": Random Undersampling
"SLSSMOTE": Safe-level Synthetic Minority Oversampling Technique
"SMOTE": Synthetic Minority Oversampling Technique
"SMOTEWB": SMOTE with boosting

proportion

proportion of covered samples. A real number in (0, 1]; 1 by default. Smaller values result in fewer dominant samples.

p_of

proportion by which to increase the cover radius. A non-negative real number in [0, ∞); default is 0. Higher values tolerate more samples from other classes inside the cover.

class_pos

name of the class that the synthetic data belong to. Default is NULL, in which case the minority class is used as the positive class.

...

additional arguments passed to the selected oversampling method.
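The roles of proportion and p_of can be illustrated with a minimal base-R sketch of a class cover for the positive class. This is not the rcccd or imbalanceDatRel implementation: cover_sketch is a hypothetical helper, and it assumes Euclidean distance, a cover radius equal to (1 + p_of) times the distance to the nearest sample of another class, and a greedy search that keeps adding the sample whose ball covers the most still-uncovered positive samples until at least proportion of them are covered.

```r
# Hypothetical sketch of a pure-and-proper class cover for one class.
# Assumptions (not taken from rcccd or imbalanceDatRel): Euclidean distance,
# radius = (1 + p_of) * distance to the nearest other-class sample, and a
# greedy dominating-set search that stops once `proportion` of the class is covered.
cover_sketch <- function(x, y, target, proportion = 1, p_of = 0) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))                 # pairwise Euclidean distances
  is_t <- y == target
  idx_t <- which(is_t)
  # radius of each target sample: (scaled) distance to nearest non-target sample
  radii <- (1 + p_of) * apply(d[idx_t, !is_t, drop = FALSE], 1, min)
  # covers[i, j] is TRUE when the ball of target sample i contains target sample j
  covers <- d[idx_t, idx_t, drop = FALSE] < radii
  covered <- rep(FALSE, length(idx_t))
  chosen <- integer(0)
  while (mean(covered) < proportion) {
    # number of still-uncovered target samples each ball would add
    gain <- rowSums(covers[, !covered, drop = FALSE])
    if (max(gain) == 0) break             # nothing left that any ball can cover
    best <- which.max(gain)
    chosen <- c(chosen, best)
    covered <- covered | covers[best, ]
  }
  list(dominant = idx_t[chosen], radius = unname(radii[chosen]))
}

# Toy data: two groups of "pos" samples separated by one "neg" sample
x <- matrix(c(0, 0,  1, 0,  3, 0,  5, 0,  6, 0), ncol = 2, byrow = TRUE)
y <- c("pos", "pos", "neg", "pos", "pos")
full <- cover_sketch(x, y, target = "pos")                    # proportion = 1
half <- cover_sketch(x, y, target = "pos", proportion = 0.5)
full$dominant  # 1 4: one ball per group
half$dominant  # 1: a single ball already covers half of the class
```

With proportion = 1 every positive sample must be covered, so each group gets its own ball; lowering proportion stops the search earlier and leaves fewer dominant samples. Raising p_of enlarges every radius, so balls can reach past the nearest sample of another class.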

Details

Oversampling followed by DatRel relocation. The available oversampling methods are provided by the SMOTEWB package. Because "ROSE" generates samples for all classes, DatRel relocates the samples of every class when "ROSE" is selected.
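The relocation step can be pictured with a second base-R sketch. The rule used here is an assumption made for illustration, not the documented DatRel rule: a synthetic sample that already lies inside some cover ball is kept, and any other synthetic sample is moved onto the boundary of the ball whose boundary is nearest. relocate_sketch is a hypothetical helper.

```r
# Hypothetical sketch (not the imbalanceDatRel implementation):
# synthetic points inside any cover ball are kept; every other point is
# moved onto the boundary of the ball whose boundary is closest to it.
relocate_sketch <- function(x_syn, centers, radii) {
  x_syn <- as.matrix(x_syn)
  centers <- as.matrix(centers)
  for (i in seq_len(nrow(x_syn))) {
    p <- x_syn[i, ]
    # Euclidean distance from the point to every ball center
    d <- sqrt(colSums((t(centers) - p)^2))
    if (any(d <= radii)) next             # already covered: leave unchanged
    j <- which.min(d - radii)             # ball with the nearest boundary
    direction <- (p - centers[j, ]) / d[j]  # unit vector from center to point
    x_syn[i, ] <- centers[j, ] + radii[j] * direction
  }
  x_syn
}

centers <- matrix(c(0, 0,
                    5, 5), ncol = 2, byrow = TRUE)
radii <- c(1, 2)
x_syn <- matrix(c(0.5, 0,   # inside ball 1: kept
                  3,   0,   # outside: moved to (1, 0)
                  5,   9),  # outside: moved to (5, 7)
                ncol = 2, byrow = TRUE)
relocated <- relocate_sketch(x_syn, centers, radii)
relocated
```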

Value

a list which includes:

x_new

feature matrix after oversampling and relocation.

y_new

class labels of x_new.

x_syn

synthetic samples after relocation.

i_dominant

indexes of the dominant samples of the positive class.

x_pos_dominant

dominant samples of the positive class.

radii_pos_dominant

radii of the cover balls centred at the dominant positive samples.

Author(s)

Fatih Saglam, saglamf89@gmail.com

Examples


library(SMOTEWB)
library(rcccd)
library(imbalanceDatRel)

set.seed(10)
# simulate imbalanced data: 1000 "negative" and 30 "positive" samples
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(60, 6, 1), ncol = 2, nrow = 30))
y <- as.factor(c(rep("negative", 1000), rep("positive", 30)))

# add noise: move three positive samples into the negative region
x[1001,] <- c(3,3)
x[1002,] <- c(2,2)
x[1003,] <- c(4,4)

# resampling with SMOTE alone
m_SMOTE <- SMOTE(x = x, y = y, k = 3)

# original data with the SMOTE synthetic samples in green
plot(x, col = y, main = "SMOTE")
points(m_SMOTE$x_syn, col = "green")

# SMOTE followed by DatRel relocation
m_DatRel <- oversampleDatRel(x = x, y = y, method = "SMOTE")

# original data with the relocated synthetic samples in green
plot(x, col = y, main = "SMOTE + DatRel")
points(m_DatRel$x_syn, col = "green")


[Package imbalanceDatRel version 0.1.5 Index]