| balance_data {lares} | R Documentation |
Balance Binary Data by Resampling: Under-Over Sampling
Description
This function balances a given data.frame by resampling it on a binary feature at a given ratio (rate).
Usage
balance_data(df, var, rate = 1, target = "auto", seed = 0, quiet = FALSE)
Arguments
df |
Vector or Dataframe. Contains different variables in each column, separated by a specific character |
var |
Variable. Which variable should we use to re-sample the dataset? |
rate |
Numeric. How many X do we need for every Y? Default: 1. If there are more than 2 unique values, rate represents a percentage of the number of rows |
target |
Character. If binary, which value should be reduced? If kept in
|
seed |
Numeric. Seed to replicate results and obtain the same values |
quiet |
Boolean. Keep quiet? If not, messages will be printed |
Value
data.frame. Reduced sampled data.frame following the rate of
appearance of a specific variable.
See Also
Other Data Wrangling:
categ_reducer(),
cleanText(),
date_cuts(),
date_feats(),
file_name(),
formatHTML(),
holidays(),
impute(),
left(),
normalize(),
num_abbr(),
ohe_commas(),
ohse(),
quants(),
removenacols(),
replaceall(),
replacefactor(),
textFeats(),
textTokenizer(),
vector2text(),
year_month(),
zerovar()
Examples
data(dft) # Titanic dataset
df <- balance_data(dft, Survived, rate = 0.5)
df <- balance_data(dft, .data$Survived, rate = 0.1, target = "TRUE")
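
The idea of under-sampling a binary variable to a target ratio can be illustrated with a minimal base R sketch. This is not the lares implementation: `undersample_sketch()` and the `toy` data are hypothetical. It also assumes that `rate` means "majority rows kept per minority row", which may differ from how `balance_data()` interprets its `rate` argument.

```r
# Hypothetical helper, NOT part of lares: under-sample the most frequent
# class of a binary column so that majority:minority is roughly `rate`.
undersample_sketch <- function(df, col, rate = 1, seed = 0) {
  set.seed(seed)  # make the random row selection reproducible
  tab <- table(df[[col]])
  stopifnot(length(tab) == 2)  # this sketch only handles binary columns

  # Identify the majority class; the other value is the minority class
  majority <- names(tab)[which.max(tab)]
  minority <- setdiff(names(tab), majority)

  values <- as.character(df[[col]])
  maj_rows <- which(values == majority)
  min_rows <- which(values == minority)

  # Assumed meaning of `rate`: majority rows to keep per minority row,
  # capped at the number of majority rows actually available
  n_keep <- min(length(maj_rows), round(length(min_rows) * rate))
  kept_maj <- maj_rows[sample.int(length(maj_rows), n_keep)]

  # Keep every minority row plus the sampled majority rows, in original order
  df[sort(c(min_rows, kept_maj)), , drop = FALSE]
}

# Toy data: 20 TRUE rows and 80 FALSE rows
toy <- data.frame(id = 1:100, flag = rep(c(TRUE, FALSE), times = c(20, 80)))
balanced <- undersample_sketch(toy, "flag", rate = 1)
table(balanced$flag)  # 20 FALSE and 20 TRUE
```

Fixing the seed inside the helper mirrors the role of the `seed` argument described above: the same call returns the same rows on every run.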