naref {gamlr}    R Documentation

NA reference level

Description

Set NA as the reference level for factor variables and impute missing values for numeric variables. This is useful for building model matrices for regularized regression and for handling missing values, as in Taddy (2019).

Usage

naref(x, impute=FALSE, pzero=0.5)

Arguments

x

A data frame.

impute

Logical, whether to impute missing values in numeric columns.

pzero

Used only if impute=TRUE. If more than a fraction pzero of the values in a column are zero, missing values are imputed with zero; otherwise they are imputed with the column mean.
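The pzero rule can be illustrated with a small base R sketch. The helper impute_sketch below is hypothetical and is not part of gamlr; it also assumes that the share of zeros is computed over the observed (non-missing) values, which this page does not state.

```r
# Hypothetical helper, not part of gamlr: applies the pzero rule described
# above to a single numeric vector.
impute_sketch <- function(v, pzero = 0.5) {
  miss <- is.na(v)
  # Assumption: the share of zeros is taken over the observed values only.
  share_zero <- mean(v[!miss] == 0)
  # Zero imputation if zeros dominate, mean imputation otherwise.
  fill <- if (share_zero > pzero) 0 else mean(v[!miss])
  v[miss] <- fill
  v
}

mostly_zero <- c(0, 0, 0, 5, NA)  # 3 of the 4 observed values are zero
spread_out  <- c(1, 2, 3, NA)     # no zeros among the observed values

impute_sketch(mostly_zero)  # missing entry filled with 0
impute_sketch(spread_out)   # missing entry filled with mean(c(1, 2, 3))
```

Zero imputation for mostly-zero columns keeps sparse columns sparse, which matters when the result is turned into a sparse model matrix.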

Details

For every factor or character column in x, naref sets NA as the reference level. Columns of class character are first converted to factors via factor(x). If impute=TRUE, each numeric column that contains missing values is replaced by two columns: one with suffix .x that holds the imputed values, and one with suffix .miss that is a binary indicator of whether the original value was missing. Numeric columns are returned unchanged if impute=FALSE or if they contain no missing values.
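What "NA as the reference level" means can be seen with base R alone. The following is a minimal sketch of the idea and is not taken from the gamlr source; the vector f and the factor g are illustrative names.

```r
f <- c("a", "b", NA, "a")

# Default factor(): NA is not a level, so under the default na.action a
# model frame would drop the row with the missing value.
levels(factor(f))

# Sketch: list NA first among the levels and keep it with exclude = NULL,
# so that NA becomes the first (reference) level.
g <- factor(f, levels = c(NA, "a", "b"), exclude = NULL)
levels(g)

# With NA as the reference level, each observed level gets its own dummy
# column and the row with the missing value is retained.
model.matrix(~ g)
```

Under a penalized regression, giving every observed level its own dummy avoids treating an arbitrary observed level as the baseline.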

Value

A data frame in which factor and character columns have been converted to factors with reference level NA. If impute=TRUE, missing values in numeric columns have been imputed and a missingness indicator has been added. See Details.

Author(s)

Matt Taddy mataddy@gmail.com

References

Matt Taddy, 2019. "Business Data Science", McGraw-Hill

Examples

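## a factor column and a numeric column with one missing value
( x <- data.frame(a=factor(c(1,2,3)),b=c(1,NA,3)) )
## NA becomes the reference level of a; b is imputed and flagged
naref(x, impute=TRUE)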

[Package gamlr version 1.13-8 Index]