identify_zero_variance_predictors {collinear}    R Documentation

Identify zero and near-zero-variance predictors

Description

Predictors with a variance of zero or near zero are highly problematic for multicollinearity analysis and for modelling in general. This function identifies these predictors with a level of sensitivity defined by the 'decimals' argument. A smaller number of decimals increases the number of variables detected as having near-zero variance. Recommended values depend on the range of the numeric variables in 'df'.
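The effect of 'decimals' can be illustrated with a minimal sketch. It assumes the test amounts to rounding each column's variance to 'decimals' places and flagging the columns where the result is zero; this is an assumption for illustration and may differ from the package's actual implementation. The helper `flag_near_zero_variance()` and the data frame `toy` are hypothetical names, and the sketch ignores non-numeric columns.

```r
# Hypothetical helper, not part of collinear: flags numeric columns whose
# variance rounds to zero at the given number of decimal places.
flag_near_zero_variance <- function(df, decimals = 4) {
  # keep numeric columns only (assumption: other types are skipped)
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  # variance of each numeric column
  variances <- vapply(df[numeric_cols], stats::var, numeric(1), na.rm = TRUE)
  # positions where the rounded variance is exactly zero (NAs are dropped)
  numeric_cols[which(round(variances, decimals) == 0)]
}

toy <- data.frame(
  a = c(1, 2, 3, 4, 5),                     # variance 2.5
  b = rep(1, 5),                            # variance 0
  c = c(0.001, 0.002, 0.003, 0.004, 0.005)  # variance 2.5e-06
)

flag_near_zero_variance(toy, decimals = 4)  # flags "b" and "c"
flag_near_zero_variance(toy, decimals = 6)  # flags "b" only
```

Under this assumption, column 'c' is flagged with 4 decimals but not with 6, which is why a suitable value depends on the scale of the variables.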

Usage

identify_zero_variance_predictors(df = NULL, predictors = NULL, decimals = 4)

Arguments

df

(required; data frame) A data frame with numeric and/or character predictors and, optionally, a response variable. Default: NULL.

predictors

(optional; character vector) A vector with predictor names in 'df'. If omitted, all columns of 'df' are used as predictors. Default: NULL.

decimals

(required; integer) Number of decimal places for the zero variance test. Default: 4.

Value

Character vector with the names of the zero and near-zero variance columns.
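Because the result is a plain character vector of names, one possible way to use it is to drop the flagged names from a predictor vector before further analysis. The sketch below uses hypothetical stand-in values ('elevation', 'slope', and the `flagged` vector) rather than real output from the function.

```r
# Hypothetical inputs: a predictor name vector and a result shaped like
# the character vector this function returns.
predictors <- c("elevation", "slope", "zv_1", "zv_2")
flagged <- c("zv_1", "zv_2")

# keep only the predictors that were not flagged
predictors_clean <- setdiff(predictors, flagged)
predictors_clean
```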

Author(s)

Blas M. Benito

Examples


data(
  vi,
  vi_predictors
)

#create zero (zv_1) and near-zero (zv_2) variance predictors
vi$zv_1 <- 1
vi$zv_2 <- runif(n = nrow(vi), min = 0, max = 0.0001)


#add to vi predictors
vi_predictors <- c(
  vi_predictors,
  "zv_1",
  "zv_2"
)

#identify zero and near-zero variance predictors
zero.variance.predictors <- identify_zero_variance_predictors(
  df = vi,
  predictors = vi_predictors
)

zero.variance.predictors


[Package collinear version 1.1.1 Index]