encode_lowrank {categoryEncodings}    R Documentation
Encode a given factor variable using low rank encoding
Description
Transforms the original design matrix using a low rank encoding.
Usage
encode_lowrank(X, fact, keep_factor = FALSE, encoding_only = FALSE)
Arguments
X
The data.frame/data.table to transform.
fact
The factor variable to encode by: either a positive integer giving the column number, or the name of the column.
keep_factor
Whether to keep the original factor column (defaults to FALSE).
encoding_only
Whether to return only the new columns instead of the full transformed dataset. Defaults to FALSE, which returns the full dataset.
Details
Uses the low rank method from Johannemann et al. (2019), 'Sufficient Representations for Categorical Variables'.
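To give a feel for what a low rank encoding can look like, the following base R sketch computes the mean of each numeric column within each factor level, takes a singular value decomposition of that matrix of level means, and uses the leading left singular vectors as the codes for each level. This is an illustration under assumptions, not the package's implementation: the choice of k = 2 components, the lowrank_ column names, and the use of aggregate() and svd() are all hypothetical choices made for this example.

```r
# Illustrative sketch only; NOT the source of encode_lowrank().
set.seed(1)
X <- data.frame(matrix(rnorm(5 * 100), ncol = 5))
X$factor_var <- sample(sample(letters, 10), 100, replace = TRUE)

num_cols <- setdiff(names(X), "factor_var")

# One row per factor level, holding the level-wise means of the numeric columns.
level_means <- aggregate(X[num_cols], by = list(level = X$factor_var), FUN = mean)

# Singular value decomposition of the (levels x covariates) matrix of means.
M <- as.matrix(level_means[num_cols])
decomp <- svd(M)

# Assumption: keep the first k left singular vectors as the code of each level.
k <- 2
codes <- decomp$u[, seq_len(k), drop = FALSE]
colnames(codes) <- paste0("lowrank_", seq_len(k))  # hypothetical column names

# Look up each observation's level and attach its code in place of the factor.
idx <- match(X$factor_var, level_means$level)
encoded <- cbind(X[num_cols], codes[idx, , drop = FALSE])
head(encoded)
```

In this sketch every observation with the same level receives the same k-dimensional code, so the 10-level factor is replaced by two numeric columns.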
Value
A new data.table X containing the new encoding columns and, if keep_factor = TRUE, the original factor column. If encoding_only = TRUE, only the new columns are returned.
Examples
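library(categoryEncodings)
# Five numeric covariates plus one categorical column with 10 levels
design_mat <- cbind(
  data.frame(matrix(rnorm(5 * 100), ncol = 5)),
  sample(sample(letters, 10), 100, replace = TRUE)
)
colnames(design_mat)[6] <- "factor_var"
# Replace factor_var with its low rank encoding
encode_lowrank(X = design_mat, fact = "factor_var", keep_factor = FALSE)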
[Package categoryEncodings version 1.4.3 Index]