smoothMean {superml} | R Documentation |
smoothMean Calculator
Description
Calculates target (mean) encodings for a categorical variable using a smoothing parameter and the number of observations in each category. This approach is more robust to possible leakage and helps avoid overfitting.
Usage
smoothMean(
  train_df,
  test_df,
  colname,
  target,
  min_samples_leaf = 1,
  smoothing = 1,
  noise_level = 0
)
Arguments
train_df | train dataset |
test_df | test dataset |
colname | name of categorical column |
target | name of target column |
min_samples_leaf | minimum samples to take category average into account |
smoothing | smoothing effect to balance categorical average vs prior |
noise_level | random noise to add, optional |
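To illustrate how min_samples_leaf and smoothing can balance a category average against the prior, here is a minimal base R sketch of one common smoothed target-encoding formula. It is an assumption for illustration, not taken from the superml source: the helper name smooth_encode, the sigmoid weighting, and the fallback to the prior for unseen categories are all hypothetical, and noise_level is left out.

```r
# Illustrative sketch only: one common smoothed target-encoding formula.
# NOT the superml implementation; `smooth_encode` is a hypothetical helper.
smooth_encode <- function(train_df, test_df, colname, target,
                          min_samples_leaf = 1, smoothing = 1) {
  prior <- mean(train_df[[target]])                  # global target mean (the prior)
  keys  <- as.character(train_df[[colname]])
  n     <- tapply(train_df[[target]], keys, length)  # observations per category
  avg   <- tapply(train_df[[target]], keys, mean)    # target mean per category

  # Sigmoid weight: close to 0 for rare categories, close to 1 for frequent ones
  w   <- 1 / (1 + exp(-(n - min_samples_leaf) / smoothing))
  enc <- prior * (1 - w) + avg * w

  lookup <- function(x) {
    idx <- match(as.character(x), names(enc))
    out <- as.numeric(enc)[idx]
    out[is.na(out)] <- prior  # assumption: unseen categories fall back to the prior
    out
  }
  list(train = lookup(train_df[[colname]]),
       test  = lookup(test_df[[colname]]))
}

train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))
enc <- smooth_encode(train, test, 'region', 'win')
```

Under this formula and the default settings, a category seen exactly once gets a weight of 0.5, so its encoding lies halfway between its own average and the prior. Larger smoothing values flatten the sigmoid, so category counts matter less to the weight.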
Value
a list with elements train and test, holding the train and test data tables with mean encodings of the target for the given categorical variable
Examples
train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))
# calculate encodings
all_means <- smoothMean(train_df = train,
                        test_df = test,
                        colname = 'region',
                        target = 'win')
train_mean <- all_means$train
test_mean <- all_means$test
[Package superml version 0.5.7 Index]