factorToBinary {Coxmos} | R Documentation |
factorToBinary
Description
Transforms factor variables within a matrix or data frame into binary dummy variables, facilitating numerical representation for subsequent statistical analyses. The function provides an option to generate either k or k-1 dummy variables for each factor, contingent on its levels.
Usage
factorToBinary(X, all = TRUE, sep = "_")
Arguments
X |
Numeric matrix or data.frame. Only qualitative variables (factor class) will be transformed into binary variables. |
all |
Logical. If all = TRUE, one dummy variable per level (k variables) is returned in the new matrix. Otherwise, k-1 variables are returned and the first level is used as the "default" state (default: TRUE). |
sep |
Character. Separator used to build the new column names. For example, if the variable name is "sex" and sep = "_", the dummy variables will be "sex_male" and "sex_female". |
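The naming rule described for sep can be illustrated with a minimal base R sketch. The objects var_name, lvls and dummy_names are hypothetical and only reproduce the "sex" example above; this is not the package's internal code.

```r
# Hypothetical inputs mirroring the "sex" example: a variable name,
# its levels, and the separator that would be passed as `sep`.
var_name <- "sex"
lvls <- c("male", "female")
sep <- "_"

# Join the variable name and each level with the separator.
dummy_names <- paste(var_name, lvls, sep = sep)
dummy_names  # "sex_male" "sex_female"
```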
Details
The factorToBinary function addresses a recurrent task in data preprocessing: converting
factor variables into a numerical format suitable for statistical and machine learning
algorithms. Factors are categorical and usually need to be transformed into binary
indicators, commonly referred to as dummy or one-hot encoding. The function iterates over
each column of the provided matrix or data frame and, for every factor variable, uses the
model.matrix function to generate the dummy variables. The user chooses the number of dummy
variables per factor: either k, the number of levels of the factor, or k-1, where the
omitted first level serves as the reference or "default" state. This choice matters in
regression contexts, where k dummy variables together with an intercept are perfectly
collinear. Each dummy variable is named by joining the original factor's name and the
corresponding level with the user-defined separator. Non-factor variables are left
unaltered, preserving the rest of the original data structure.
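The difference between k and k-1 dummy variables can be sketched with base R's model.matrix on a toy data frame. This is an illustration under stated assumptions, not the package's actual implementation: the data frame df and the objects full, reduced, rank_full and rank_reduced are hypothetical names introduced here.

```r
# Toy data (hypothetical): one factor with two levels and one numeric column.
df <- data.frame(
  sex = factor(c("male", "female", "male")),
  age = c(34, 51, 28)
)

# k dummy variables: removing the intercept keeps one column per level.
full <- model.matrix(~ sex - 1, data = df)
colnames(full) <- paste("sex", levels(df$sex), sep = "_")

# k-1 dummy variables: drop the column of the first level, which
# then acts as the reference ("default") state.
reduced <- full[, -1, drop = FALSE]

# With an intercept, the k-column version is rank deficient
# (the dummy columns sum to the intercept); the k-1 version is not.
rank_full <- qr(cbind(1, full))$rank
rank_reduced <- qr(cbind(1, reduced))$rank

colnames(full)     # "sex_female" "sex_male"
colnames(reduced)  # "sex_male"
```

Here rank_full is smaller than the number of columns of cbind(1, full), which is the multicollinearity that the k-1 encoding avoids in models with an intercept.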
Value
A matrix or data.frame in which each categorical (factor) variable has been replaced by k or k-1 dummy variables; non-factor variables are returned unchanged.
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
Examples
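# Load the X_proteomic example data set
data("X_proteomic")
X <- X_proteomic
# k-1 dummy variables per factor (first level is the reference)
X.dummy <- factorToBinary(X, all = FALSE, sep = "_")
# k dummy variables per factor (one per level)
X.pls <- factorToBinary(X, all = TRUE, sep = "_")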