sampling_target {alookr} | R Documentation |
Extract the data to fit the model
Description
To address class imbalance, perform sampling on the train set of a split_df object.
Usage
sampling_target(
.data,
method = c("ubUnder", "ubOver", "ubSMOTE"),
seed = NULL,
perc = 50,
k = ifelse(method == "ubSMOTE", 5, 0),
perc.over = 200,
perc.under = 200
)
Arguments
.data |
an object of class "split_df", usually the result of a call to split_by(). |
method |
character. The sampling method. "ubUnder" is under-sampling, "ubOver" is over-sampling, and "ubSMOTE" is SMOTE (Synthetic Minority Over-sampling TEchnique). |
seed |
integer. random seed used for sampling |
perc |
integer. The percentage of the positive class in the final dataset. It is used only in under-sampling. The default is 50. perc cannot exceed 50. |
k |
integer. It is used only in over-sampling and SMOTE. In over-sampling, if k = 0, the minority class is sampled with replacement until both classes have the same number of instances; if k > 0, the minority class is sampled with replacement until there are k times the original number of minority instances. In SMOTE, k is the number of nearest neighbours used as the pool from which new examples are generated. |
perc.over |
integer. It is used only in SMOTE. perc.over/100 is the number of new instances generated for each rare instance. If perc.over < 100 a single instance is generated. |
perc.under |
integer. It is used only in SMOTE. perc.under/100 is the number of "normal" (majority class) instances that are randomly selected for each smoted observation. |
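As a rough illustration of how perc, k, perc.over and perc.under translate into class counts, the following sketch computes the approximate sizes of the minority and majority classes after sampling. The helper expected_counts() is hypothetical and not part of alookr; the rounding (floor) and the assumption that perc.over >= 100 are simplifications, so actual counts may differ slightly.

```r
# Hypothetical helper (not part of alookr): approximate class counts
# after sampling, following the argument descriptions above.
# Assumes perc.over >= 100 and uses floor() for rounding (both assumptions).
expected_counts <- function(n_min, n_maj,
                            method = c("ubUnder", "ubOver", "ubSMOTE"),
                            perc = 50, k = 0,
                            perc.over = 200, perc.under = 200) {
  method <- match.arg(method)
  switch(method,
    # Under-sampling: keep every minority case, shrink the majority so that
    # the minority makes up `perc` percent of the result.
    ubUnder = c(minority = n_min,
                majority = min(n_maj, floor(n_min * (100 - perc) / perc))),
    # Over-sampling: k = 0 balances the classes, k > 0 gives k times
    # the original number of minority cases.
    ubOver = c(minority = if (k == 0) n_maj else k * n_min,
               majority = n_maj),
    # SMOTE: perc.over/100 synthetic cases per minority case, then
    # perc.under/100 majority cases per synthetic case.
    ubSMOTE = {
      n_new <- n_min * perc.over / 100
      c(minority = n_min + n_new,
        majority = floor(n_new * perc.under / 100))
    }
  )
}

expected_counts(100, 900, "ubUnder", perc = 40)
expected_counts(100, 900, "ubOver", k = 3)
expected_counts(100, 900, "ubSMOTE")
```

With 100 minority and 900 majority cases, perc = 40 leaves 100 minority and 150 majority cases, because 100 / (100 + 150) = 40%.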
Details
To address class imbalance, the train set is resampled using one of three methods: under-sampling, over-sampling, or SMOTE.
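The three ideas can be sketched in base R on a toy data set. This is only an illustration of the general techniques, not the code that sampling_target() runs; the data frame toy and all variable names are made up for the example.

```r
set.seed(1234)

# Toy imbalanced data (hypothetical): 90 "No" cases and 10 "Yes" cases
toy <- data.frame(
  x = rnorm(100),
  y = factor(rep(c("No", "Yes"), times = c(90, 10)))
)
idx_min <- which(toy$y == "Yes")
idx_maj <- which(toy$y == "No")

# Under-sampling: keep all minority cases and draw an equally sized
# subset of the majority class without replacement
under <- toy[c(idx_min, sample(idx_maj, length(idx_min))), ]

# Over-sampling: keep all majority cases and resample the minority class
# with replacement until both classes have the same size
over <- toy[c(idx_maj, sample(idx_min, length(idx_maj), replace = TRUE)), ]

# SMOTE-style synthetic case: interpolate between one minority case and
# a randomly chosen one of its 5 nearest minority neighbours
x_min <- toy$x[idx_min]
i <- 1
d <- abs(x_min - x_min[i])
nn <- order(d)[2:6]                 # position 1 is the case itself
j <- sample(nn, 1)
synthetic <- x_min[i] + runif(1) * (x_min[j] - x_min[i])

table(under$y)
table(over$y)
```

Under-sampling discards majority cases, over-sampling duplicates minority cases, and SMOTE creates new minority cases that lie between existing ones.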
Value
An object of class train_df.
attributes of train_df class
The attributes of the train_df class are as follows:
sample_seed : integer. random seed used for sampling
method : character. sampling methods.
perc : integer. perc argument value
k : integer. k argument value
perc.over : integer. perc.over argument value
perc.under : integer. perc.under argument value
binary : logical. whether the target variable is a binary class
target : character. target variable name
minority : character. the level of the minority class
majority : character. the level of the majority class
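The attributes listed above can presumably be read with base R's attr(). The sketch below assumes that the alookr and ISLR packages are installed and that the attribute names match this list exactly; it is skipped otherwise.

```r
# Runs only if both packages are available (an assumption of this sketch)
if (requireNamespace("alookr", quietly = TRUE) &&
    requireNamespace("ISLR", quietly = TRUE)) {
  library(alookr)

  sb <- split_by(ISLR::Default, default)
  under <- sampling_target(sb, seed = 1234L)

  # exact = TRUE avoids partial matching of attribute names
  print(attr(under, "method", exact = TRUE))
  print(attr(under, "sample_seed", exact = TRUE))
  print(attr(under, "target", exact = TRUE))
  print(attr(under, "minority", exact = TRUE))
  print(attr(under, "majority", exact = TRUE))
}
```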
Examples
library(alookr)
library(dplyr)
# Credit Card Default Data
head(ISLR::Default)
# Generate data for the example
sb <- ISLR::Default %>%
  split_by(default)

# under-sampling with random seed
under <- sb %>%
  sampling_target(seed = 1234L)

under %>%
  count(default)

# under-sampling with random seed, and minority class frequency is 40%
under40 <- sb %>%
  sampling_target(seed = 1234L, perc = 40)

under40 %>%
  count(default)

# over-sampling with random seed
over <- sb %>%
  sampling_target(method = "ubOver", seed = 1234L)

over %>%
  count(default)

# over-sampling with random seed, and k = 10
over10 <- sb %>%
  sampling_target(method = "ubOver", seed = 1234L, k = 10)

over10 %>%
  count(default)

# SMOTE with random seed
smote <- sb %>%
  sampling_target(method = "ubSMOTE", seed = 1234L)

smote %>%
  count(default)

# SMOTE with random seed, and perc.under = 250
smote250 <- sb %>%
  sampling_target(method = "ubSMOTE", seed = 1234L, perc.under = 250)

smote250 %>%
  count(default)