build_encoding {dataPreparation}    R Documentation

Compute encoding

Description

Build a list of one-hot encodings, one for each column in cols.

Usage

build_encoding(data_set, cols = "auto", verbose = TRUE, min_frequency = 0, ...)

Arguments

data_set

Matrix, data.frame or data.table

cols

Name(s) of the column(s) of data_set to encode. To encode all character columns, set it to "auto". (character, default to "auto")

verbose

Should the algorithm talk? (Logical, default to TRUE)

min_frequency

The minimal share of lines that a category should represent (numeric, between 0 and 1, default to 0)

...

Other arguments such as name_separator to separate words in new columns names (character, default to ".")

Details

To avoid creating very large sparse matrices, use the min_frequency parameter so that only the most representative values get a new column (and not outliers or mistakes in the data).
Setting min_frequency to something greater than 0 may cause the function to be slower (especially for large data_set).
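
The effect of min_frequency can be illustrated with a minimal base R sketch. It does not call the package and is not its implementation: the toy vector x, the variable names, and the use of >= for the comparison are assumptions made for illustration.

```r
# Hypothetical toy column; not taken from the adult data set.
x <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "c")

# Share of rows represented by each category.
shares <- prop.table(table(x))

# Keep only the categories whose share reaches the threshold
# (assumption: the comparison is inclusive).
min_frequency <- 0.2
kept_values <- names(shares)[as.vector(shares) >= min_frequency]
print(kept_values)
```

Here "c" covers only 10% of the rows, so with a threshold of 0.2 it would not get its own column.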

Value

A list in which each element is named after a column of data_set and holds new_cols and values, describing the new columns that will be built during encoding.
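
As a rough picture of that shape, the following base R sketch builds such a list by hand. The column name "type", its values, and the way new column names are pasted together with name_separator are assumptions for illustration; the actual output of build_encoding may differ in detail.

```r
# Hypothetical column name and values, chosen for illustration only.
col_name <- "type"
values <- c("a", "b")
name_separator <- "."

# Assumed shape: one entry per encoded column, holding new_cols and values.
encoding_sketch <- list()
encoding_sketch[[col_name]] <- list(
  new_cols = paste(col_name, values, sep = name_separator),
  values = values
)
str(encoding_sketch)
```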

Examples

# Get a data set
data(adult)
encoding <- build_encoding(adult, cols = "auto", verbose = TRUE)

print(encoding)

# To limit the number of generated columns, use the min_frequency parameter:
build_encoding(adult, cols = "auto", verbose = TRUE, min_frequency = 0.1)
# Set to 0.1, it will create columns only for values that are present in at least 10% of the rows.

[Package dataPreparation version 1.1.1 Index]