build_encoding {dataPreparation} | R Documentation |
Compute encoding
Description
Build a list of one-hot encodings, one for each column in cols.
Usage
build_encoding(data_set, cols = "auto", verbose = TRUE, min_frequency = 0, ...)
Arguments
data_set |
Matrix, data.frame or data.table |
cols |
Name(s) of the column(s) of data_set to transform. To transform all character columns, set it to "auto". (character, default to "auto") |
verbose |
Should the algorithm talk? (Logical, default to TRUE) |
min_frequency |
The minimum share of rows that a category must represent to get its own column (numeric, between 0 and 1, default to 0) |
... |
Other arguments such as |
Details
To avoid creating very large sparse matrices, use the min_frequency parameter to make sure that only the most representative values are used to create new columns (and not outliers or mistakes in the data).
Setting min_frequency to something greater than 0 may make the function slower (especially for a large data_set).
Value
A list in which each element is named after a column of data_set. Each element contains new_cols and values, which describe the new columns that will be built during encoding.
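To make the effect of min_frequency and the shape of the returned list more concrete, here is a minimal base R sketch. It is not the package's implementation: sketch_encoding is a hypothetical helper, the toy data is made up, and the "column.value" naming of new_cols is an assumption made for illustration only.

```r
# Hypothetical helper, NOT part of dataPreparation: it only illustrates the
# idea of keeping the values whose share of rows reaches min_frequency.
sketch_encoding <- function(data_set, min_frequency = 0) {
  # Consider only character or factor columns
  is_categorical <- vapply(
    data_set,
    function(x) is.character(x) || is.factor(x),
    logical(1)
  )
  cols <- names(data_set)[is_categorical]

  result <- lapply(cols, function(col) {
    # Share of rows represented by each distinct value
    shares <- table(data_set[[col]]) / nrow(data_set)
    values <- names(shares)[as.vector(shares) >= min_frequency]
    # Assumed naming scheme for illustration: "<column>.<value>"
    list(new_cols = paste(col, values, sep = "."), values = values)
  })
  names(result) <- cols
  result
}

# Made-up toy data: "red" covers 60% of rows, "blue" 30%, "green" 10%
toy <- data.frame(
  color = c("red", "red", "red", "blue", "blue",
            "green", "red", "blue", "red", "red"),
  size = 1:10,
  stringsAsFactors = FALSE
)

enc <- sketch_encoding(toy, min_frequency = 0.2)
print(enc$color$values)
print(enc$color$new_cols)
```

With min_frequency = 0.2, "green" (10% of rows) falls below the threshold, so no column is planned for it; with the default of 0, every distinct value would be kept.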
Examples
# Load the package and get a data set
library(dataPreparation)
data(adult)
encoding <- build_encoding(adult, cols = "auto", verbose = TRUE)
print(encoding)
# To limit the number of generated columns, use the min_frequency parameter:
build_encoding(adult, cols = "auto", verbose = TRUE, min_frequency = 0.1)
# With min_frequency = 0.1, columns are created only for values present in at least 10% of the rows.