deleteZeroOrNearZeroVariance {Coxmos} | R Documentation |
deleteZeroOrNearZeroVariance
Description
Removes variables with zero or near-zero variance from a dataset, so that uninformative predictors do not enter subsequent statistical analyses.
Usage
deleteZeroOrNearZeroVariance(
X,
remove_near_zero_variance = FALSE,
remove_zero_variance = TRUE,
toKeep.zv = NULL,
freqCut = 95/5
)
Arguments
X |
Numeric matrix or data.frame. Explanatory variables. Qualitative variables must be transformed into binary variables. |
remove_near_zero_variance |
Logical. If remove_near_zero_variance = TRUE, near-zero variance variables will be removed (default: FALSE). |
remove_zero_variance |
Logical. If remove_zero_variance = TRUE, zero variance variables will be removed (default: TRUE). |
toKeep.zv |
Character vector. Names of variables in X that must not be removed by the (near-)zero variance filter (default: NULL). |
freqCut |
Numeric. Cutoff for the ratio of the most common value to the second most common value (default: 95/5). |
Details
The deleteZeroOrNearZeroVariance function is intended for the preprocessing phase of statistical
modeling. In many datasets, especially high-dimensional ones, certain variables exhibit zero or
near-zero variance. Such variables carry little information and can distort the results of
statistical models, leading to issues like overfitting. The function relies on
caret::nearZeroVar() to identify these variables and then excludes them. Users can remove only
zero variance variables, only near-zero variance variables, or both. The freqCut argument sets
the threshold for near-zero variance as the ratio of the frequency of the most common value to
the frequency of the second most common value. Variables that are considered essential and
should not be removed regardless of their variance can be listed in the toKeep.zv argument.
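The frequency-ratio criterion described above can be illustrated with a minimal base R sketch. It is not the code used by Coxmos or caret: the helper names freq_ratio and percent_unique and the toy data are hypothetical, and the 10 percent limit on unique values is assumed to be the default uniqueCut of caret::nearZeroVar().

```r
# Illustrative sketch only: approximates the near-zero variance rule in base R.

# Ratio of the count of the most common value to the count of the second
# most common value; 0 when there is a single distinct value.
freq_ratio <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) < 2) return(0)
  as.numeric(counts[1]) / as.numeric(counts[2])
}

# Percentage of distinct values relative to the number of observations.
percent_unique <- function(x) 100 * length(unique(x)) / length(x)

# Hypothetical toy data with 100 observations per variable.
toy <- data.frame(
  constant = rep(1, 100),             # one distinct value: zero variance
  rare     = c(rep(0, 98), 1, 1),     # frequency ratio 98 / 2 = 49
  balanced = rep(c(0, 1), each = 50)  # frequency ratio 50 / 50 = 1
)

freq_cut   <- 95 / 5  # same value as the documented default of freqCut
unique_cut <- 10      # assumed default of uniqueCut in caret::nearZeroVar()

ratios        <- sapply(toy, freq_ratio)
zero_var      <- sapply(toy, function(x) length(unique(x)) == 1)
near_zero_var <- ratios > freq_cut & sapply(toy, percent_unique) <= unique_cut

print(data.frame(ratio = ratios, zero_var, near_zero_var))
```

In this sketch only rare exceeds the cutoff of 95/5 = 19 and is flagged as near-zero variance, constant is flagged as zero variance, and balanced passes both checks.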
Value
Returns a list of two objects:
X: The filtered data.frame X.
variablesDeleted: The variables that have been removed by the filter.
Author(s)
Pedro Salguero Garcia. Maintainer: pedsalga@upv.edu.es
Examples
data("X_proteomic")
X <- X_proteomic
filter <- deleteZeroOrNearZeroVariance(X, remove_near_zero_variance = TRUE)
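As a possible extension of the example, the sketch below inspects the two elements of the returned list and protects one variable through toKeep.zv. It assumes the Coxmos package and its X_proteomic dataset are installed; the names has_coxmos, protected and res are hypothetical, and protecting the first column is an arbitrary choice made only for illustration.

```r
# Run the example only when Coxmos is available, so the sketch never fails
# on a machine without the package.
has_coxmos <- requireNamespace("Coxmos", quietly = TRUE)

if (has_coxmos) {
  data("X_proteomic", package = "Coxmos")
  X <- X_proteomic

  # Arbitrary illustrative choice: never drop the first variable.
  protected <- colnames(X)[1]

  res <- Coxmos::deleteZeroOrNearZeroVariance(
    X,
    remove_near_zero_variance = TRUE,
    remove_zero_variance = TRUE,
    toKeep.zv = protected,
    freqCut = 95/5
  )

  print(dim(X))               # dimensions before filtering
  print(dim(res$X))           # dimensions after filtering
  str(res$variablesDeleted)   # variables removed by the filter
}
```

Comparing the dimensions before and after filtering, together with variablesDeleted, shows how many variables the chosen freqCut removes.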