DatRel {imbalanceDatRel} | R Documentation |
Data Relocation for Resampled Data using Pure and Proper Class Cover Catch Digraph
Description
DatRel relocates resampled data using Pure and Proper Class Cover Catch Digraph (PCCCD).
Usage
DatRel(x, y, x_syn, proportion = 1, p_of = 0, class_pos = NULL)
Arguments
x |
feature matrix or dataframe. |
y |
class factor variable. |
x_syn |
synthetic data generated by an oversampling method. |
proportion |
proportion of covered samples. A real number between |
p_of |
proportion to increase cover radius. A real number between
|
class_pos |
Name of the class the synthetic data belongs to. Default is NULL, in which case the minority class is taken as the positive class. |
Details
Calculates cover areas for the original dataset using pure and proper class
cover catch digraphs (PCCCD). Any sample outside the cover area is relocated
towards a specific dominant point. The dominant point to move towards is
chosen by a distance measure that is based on the radii of the PCCCD balls.
The p_of argument increases the obtained radii so that the cover is more
tolerant to noise. The proportion argument is the cover percentage: PCCCD
stops once the desired percentage is covered for each class. PCCCD models are
fitted using the rcccd package. The class_pos argument specifies the
oversampled class.
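The relocation idea described above can be illustrated with a small base R sketch. It is not the package's implementation. The helper name relocate_point is hypothetical, and three choices are assumptions made only for illustration: that p_of inflates the radii multiplicatively, that the dominant point is picked by the smallest distance-to-radius ratio, and that the sample is moved onto the ball boundary.

```r
# Hypothetical helper (not part of imbalanceDatRel): pull one sample into
# a cover ball, measuring nearness relative to each ball's radius.
relocate_point <- function(p, centers, radii, p_of = 0) {
  # Assumption: p_of inflates every radius multiplicatively.
  radii <- radii * (1 + p_of)
  # Euclidean distance from p to every dominant point (ball centre).
  d <- sqrt(rowSums(sweep(centers, 2, p)^2))
  # Already covered by at least one ball: leave the sample where it is.
  if (any(d <= radii)) return(p)
  # Assumption: pick the ball with the smallest distance-to-radius ratio.
  j <- which.min(d / radii)
  # Move p along the line towards that centre until it sits on the boundary.
  centers[j, ] + (p - centers[j, ]) * (radii[j] / d[j])
}

centers <- rbind(c(0, 0), c(10, 0))  # two dominant points
radii   <- c(1, 2)                   # their cover radii

inside  <- relocate_point(c(0.5, 0), centers, radii)  # covered, unchanged
outside <- relocate_point(c(4, 0), centers, radii)    # uncovered, relocated
print(inside)
print(outside)
```

In this toy setup the point (4, 0) is closer to the first centre in raw distance, but the second ball is larger, so the ratio-based rule moves the point to the boundary of the second ball, at roughly (8, 0).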
Value
a list object which includes:
x_new |
Oversampled and relocated feature matrix |
y_new |
Oversampled class variable |
x_syn |
Generated and relocated sample matrix |
i_dominant |
Indexes of dominant samples |
x_pos_dominant |
Dominant samples for positive class |
radii_pos_dominant |
Radii of dominant samples for positive class |
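A brief sketch of how the returned list might be inspected. It uses only the DatRel and SMOTE calls shown on this page. The data sizes, the seed, and the object name res are arbitrary choices for illustration, and the block is skipped when imbalanceDatRel or SMOTEWB is not installed.

```r
# Sketch only: runs when imbalanceDatRel and SMOTEWB are installed.
if (requireNamespace("imbalanceDatRel", quietly = TRUE) &&
    requireNamespace("SMOTEWB", quietly = TRUE)) {
  set.seed(1)
  # small two-class toy data: 200 negative and 20 positive samples
  x <- rbind(matrix(rnorm(400, 3, 1), ncol = 2),
             matrix(rnorm(40, 6, 1), ncol = 2))
  y <- as.factor(c(rep("negative", 200), rep("positive", 20)))

  # oversample, then relocate the synthetic samples
  m_SMOTE <- SMOTEWB::SMOTE(x = x, y = y, k = 3)
  res <- imbalanceDatRel::DatRel(x = x, y = y, x_syn = m_SMOTE$x_syn)

  str(res, max.level = 1)   # overview of the list components
  print(table(res$y_new))   # class counts after oversampling
  print(dim(res$x_syn))     # size of the relocated synthetic sample matrix
}
```

Calling str with max.level = 1 keeps the overview compact, which helps when x_new has many rows.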
Author(s)
Fatih Saglam, saglamf89@gmail.com
Examples
library(imbalanceDatRel)
library(SMOTEWB)
library(rcccd)
set.seed(10)
# generating data: 1000 negative and 30 positive samples
x <- rbind(matrix(rnorm(2000, 3, 1), ncol = 2, nrow = 1000),
           matrix(rnorm(60, 6, 1), ncol = 2, nrow = 30))
y <- as.factor(c(rep("negative", 1000), rep("positive", 30)))
# adding noise: move three positive samples into the negative class region
x[1001,] <- c(3,3)
x[1002,] <- c(2,2)
x[1003,] <- c(4,4)
# resampling
m_SMOTE <- SMOTE(x = x, y = y, k = 3)
# relocation of resampled data
m_DatRel <- DatRel(x = x, y = y, x_syn = m_SMOTE$x_syn)
# resampled data
plot(x, col = y, main = "SMOTE")
points(m_SMOTE$x_syn, col = "green")
# resampled data after relocation
plot(x, col = y, main = "SMOTE + DatRel")
points(m_DatRel$x_syn, col = "green")