factorToBinary {Coxmos}    R Documentation

factorToBinary

Description

Transforms factor variables within a matrix or data frame into binary dummy variables so that they can be used in subsequent statistical analyses. For a factor with k levels, the function can generate either k or k-1 dummy variables.

Usage

factorToBinary(X, all = TRUE, sep = "_")

Arguments

X

Numeric matrix or data.frame. Only qualitative variables (factor class) will be transformed into binary variables.

all

Logical. If all = TRUE, one variable per level is returned in the new matrix. Otherwise, k-1 variables are returned and the first level is used as the "default" state (default: TRUE).

sep

Character. Separator used to build the new column names. For example, if the variable name is "sex" and sep = "_", the dummy variables will be "sex_male" and "sex_female".

Details

factorToBinary converts factor variables into the numeric form required by many statistical and machine learning algorithms, an encoding commonly known as dummy or one-hot encoding. The function iterates over each column of the supplied matrix or data frame and, for every factor variable, uses model.matrix to generate the dummy variables. The all argument controls how many are created for a factor with k levels: either k, one per level, or k-1, where the omitted first level serves as the reference or "default" state. The k-1 encoding matters in regression settings, where a full set of k dummy variables is collinear with the intercept. Each dummy variable is named by joining the original factor's name and the corresponding level with the separator given in sep. Non-factor variables are returned unchanged.
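
The difference between the k and k-1 encodings can be illustrated with base R alone. The sketch below is not the package's internal code; it only assumes that dummy variables are built with model.matrix, as described above. The toy data frame df and the object names dummies_k, dummies_k1 and result_k are hypothetical and chosen for illustration.

```r
# Toy data (made-up values): one factor column and one numeric column.
df <- data.frame(
  sex = factor(c("male", "female", "female", "male")),
  age = c(34, 51, 28, 45)
)
sep <- "_"

# k dummy variables: removing the intercept (~ 0 + sex) keeps one column per level.
dummies_k <- model.matrix(~ 0 + sex, data = df)

# k-1 dummy variables: with an intercept, the first level ("female") becomes the
# reference; the intercept column itself is then dropped.
dummies_k1 <- model.matrix(~ sex, data = df)[, -1, drop = FALSE]

# Name the columns as <variable><sep><level>.
colnames(dummies_k)  <- paste("sex", levels(df$sex), sep = sep)
colnames(dummies_k1) <- paste("sex", levels(df$sex)[-1], sep = sep)

# Combine the dummy variables with the unchanged numeric column.
result_k <- cbind(dummies_k, age = df$age)
```

Here dummies_k has the columns sex_female and sex_male, while dummies_k1 keeps only sex_male because "female", the first level, acts as the reference.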

Value

A matrix or data.frame in which each categorical (factor) variable is represented by k-1 or k dummy variables.

Author(s)

Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es

Examples

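# Load the example data set X_proteomic
data("X_proteomic")
X <- X_proteomic
# k-1 dummy variables per factor (the first level is the reference)
X.dummy <- factorToBinary(X, all = FALSE, sep = "_")
# k dummy variables per factor (one per level)
X.pls <- factorToBinary(X, all = TRUE, sep = "_")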

[Package Coxmos version 1.0.2 Index]