naref {gamlr} | R Documentation |
NA reference level
Description
Set NA as the reference level for factor variables and impute missing values for numeric variables. This is useful for building model matrices for regularized regression and for handling missing values, as in Taddy (2019).
Usage
naref(x, impute=FALSE, pzero=0.5)
Arguments
x |
A data frame. |
impute |
Logical, whether to impute missing values in numeric columns. |
pzero |
If |
Details
For every factor or character column in x, naref sets NA as the reference level. Columns of class character are first converted to factors via factor(x). If impute=TRUE, each numeric column that contains missing values is replaced by two columns: one with the suffix .x that contains the imputed values, and one with the suffix .miss that is a binary indicator of whether the original value was missing. Numeric columns are returned unchanged if impute=FALSE or if they contain no missing values.
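The releveling step can be illustrated in base R. This is only a sketch of the idea of making NA the first (reference) level of a factor; it may differ from how gamlr implements the step internally, and the variable names f and f_ref are illustrative.

```r
# Base-R sketch: make NA an explicit level and place it first.
f <- factor(c("a", NA, "b"))
levels(f)       # NA is not among the levels at this point

# exclude = NULL keeps NA as a level; listing it first makes it the reference.
f_ref <- factor(f, levels = c(NA, levels(f)), exclude = NULL)
levels(f_ref)   # NA is now the first level
```

With NA as the first level, treatment-style dummy coding treats "missing" as the baseline category, so every observed level gets its own indicator column.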
Value
A data frame in which factor and character columns have been converted to factors with reference level NA and, if impute=TRUE, missing values in numeric columns have been imputed and a missingness indicator has been added. See Details.
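The shape of the imputed output for a single numeric column can be sketched as follows. The helper impute_sketch is hypothetical and not part of gamlr, and filling with the mean of the observed values is an assumption made only for illustration; the rule naref actually applies, including the role of pzero, is not reproduced here.

```r
# Hypothetical helper (not in gamlr): split a numeric vector into an imputed
# column and a 0/1 missingness flag, using the .x and .miss suffixes.
impute_sketch <- function(v, name) {
  miss <- is.na(v)
  filled <- v
  # Assumption: fill with the mean of the observed values.
  filled[miss] <- mean(v, na.rm = TRUE)
  out <- data.frame(filled, as.integer(miss))
  names(out) <- paste0(name, c(".x", ".miss"))
  out
}

b <- c(1, NA, 3)
res <- impute_sketch(b, "b")
res
```

Keeping the flag next to the imputed values lets a regularized regression estimate a separate effect for missingness itself.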
Author(s)
Matt Taddy mataddy@gmail.com
References
Matt Taddy, 2019. "Business Data Science", McGraw-Hill
Examples
library(gamlr)
# a is a factor with no missing values; b is numeric with one NA
( x <- data.frame(a=factor(c(1,2,3)),b=c(1,NA,3)) )
naref(x, impute=TRUE)