hotdeck {VIM} | R Documentation
Hot-Deck Imputation
Description
Implementation of the popular sequential, random (within a domain) hot-deck algorithm for imputation.
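The following is a rough base-R sketch of the sequential idea, for illustration only. It is not VIM's implementation, and the function name seq_hotdeck_sketch is hypothetical. Rows are sorted by the domain and ordering variables, and each missing value takes the value of the closest preceding observed record in the same domain. The random fallback described in the Note is left out, so records with no earlier donor stay NA.

```r
# Hypothetical sketch of a sequential hot deck within domains (base R, not VIM code).
seq_hotdeck_sketch <- function(df, variable, ord_var = NULL, domain_var = NULL) {
  sort_cols <- c(domain_var, ord_var)
  # Row order: by domain first, then by the ordering variables
  ord <- if (length(sort_cols) > 0) {
    do.call(order, unname(as.list(df[sort_cols])))
  } else {
    seq_len(nrow(df))
  }
  out <- df[ord, , drop = FALSE]
  # One group per domain (a single group when no domain is given)
  grp <- if (is.null(domain_var)) {
    rep(1L, nrow(out))
  } else {
    interaction(out[domain_var], drop = TRUE)
  }
  for (v in variable) {
    for (rows in split(seq_len(nrow(out)), grp)) {
      last <- NA  # no donor seen yet in this domain
      for (i in rows) {
        if (is.na(out[[v]][i])) {
          out[[v]][i] <- last  # copy the previous observed value (stays NA if none)
        } else {
          last <- out[[v]][i]  # this record becomes the donor for following rows
        }
      }
    }
  }
  out[order(ord), , drop = FALSE]  # restore the original row order
}

toy <- data.frame(
  d = c("a", "a", "a", "b", "b", "b"),
  o = c(1, 2, 3, 1, 2, 3),
  y = c(10, NA, 30, NA, 50, NA)
)
res <- seq_hotdeck_sketch(toy, "y", ord_var = "o", domain_var = "d")
res$y  # 10 10 30 NA 50 50
```

The fourth record stays NA because it is the first record of domain "b" and has no preceding donor. Sorting by the domain first keeps every domain contiguous, so a single forward pass per domain is enough.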
Usage
hotdeck(
data,
variable = NULL,
ord_var = NULL,
domain_var = NULL,
makeNA = NULL,
NAcond = NULL,
impNA = TRUE,
donorcond = NULL,
imp_var = TRUE,
imp_suffix = "imp"
)
Arguments
data | data.frame or matrix
variable | variables whose missing values should be imputed (must not overlap with ord_var)
ord_var | variables used to sort the data set before imputation (must not overlap with variable)
domain_var | variables used to build domains; imputation is performed within each domain
makeNA | list with one element per variable, each holding the values that should be converted to NA for that variable
NAcond | list with one element per variable, each holding a condition for imputing a NA
impNA | TRUE/FALSE: whether NA values should be imputed
donorcond | list with one element per variable, each holding a donor condition as a character string, e.g. ">5" or c(">5", "<10"). If the list element for a variable is NULL, no condition is applied to that variable.
imp_var | TRUE/FALSE: whether a TRUE/FALSE variable showing the imputation status should be created for each imputed variable
imp_suffix | suffix for the TRUE/FALSE variables showing the imputation status
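To make the per-variable list arguments more concrete, here is a base-R sketch of how makeNA, donorcond and imp_var could behave. The helper names (apply_make_na, eligible_donor, add_imp_flags) are hypothetical, the conditions are evaluated with a plain eval/parse, and the <variable>_<imp_suffix> column name is an assumption; none of this is taken from VIM's source. A condition such as "%between%c(8,13)" would additionally need the data.table package attached for %between%.

```r
# Hypothetical helpers illustrating the makeNA, donorcond and imp_var arguments
# (base R only; not taken from VIM's source).
apply_make_na <- function(df, variable, makeNA) {
  # makeNA[[i]] holds the values of variable[i] that should become NA
  for (i in seq_along(variable)) {
    v <- variable[i]
    df[[v]][df[[v]] %in% makeNA[[i]]] <- NA
  }
  df
}

eligible_donor <- function(value, cond) {
  # cond is NULL or a character vector such as c(">5", "<10"); every part must hold
  ok <- !is.na(value)
  for (cc in cond) {
    ok <- ok & eval(parse(text = paste0("value", cc)))
  }
  ok
}

add_imp_flags <- function(df, variable, imp_suffix = "imp") {
  # Assumed naming scheme: <variable>_<imp_suffix>, TRUE where a value is missing
  for (v in variable) {
    df[[paste(v, imp_suffix, sep = "_")]] <- is.na(df[[v]])
  }
  df
}

toy <- data.frame(age = c(25, -9, 40, 999), income = c(1200, 800, NA, 3000))
toy <- apply_make_na(toy, c("age", "income"), list(c(-9, 999), NULL))
toy$age                                          # 25 NA 40 NA
eligible_donor(c(3, 6, 12, NA), c(">5", "<10"))  # FALSE TRUE FALSE FALSE
flagged <- add_imp_flags(toy, c("age", "income"))
names(flagged)                                   # "age" "income" "age_imp" "income_imp"
```

A NULL element in makeNA or donorcond simply leaves that variable unchanged or unrestricted, which matches the documented behaviour for donorcond.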
Value
the imputed data set.
Note
If the sequential hot-deck does not lead to a suitable donor, a random donor from the group is used.
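The fallback can be sketched for a single, already sorted domain. This is an illustration under stated assumptions, not VIM's code: the function name fill_domain is hypothetical, the group is taken to be the domain, and imputed values are not reused as donors. A missing value takes the previous observed value; when no earlier donor exists, it takes a randomly drawn observed value from the same domain.

```r
# Hypothetical sketch: sequential hot deck for one sorted domain with a random fallback.
fill_domain <- function(y) {
  donors <- y[!is.na(y)]  # observed values available in this domain
  last <- NA
  for (i in seq_along(y)) {
    if (!is.na(y[i])) {
      last <- y[i]  # observed record: becomes the sequential donor
    } else if (!is.na(last)) {
      y[i] <- last  # sequential donor: previous observed record
    } else if (length(donors) > 0) {
      y[i] <- donors[sample.int(length(donors), 1)]  # random donor from the domain
    }
    # if the domain has no observed values at all, the NA is left in place
  }
  y
}

set.seed(1)
filled <- fill_domain(c(NA, NA, 4, NA, 7))
filled  # first two values are drawn at random from {4, 7}; the rest are 4 4 7
```

sample.int is used instead of sample(donors, 1) so that a domain with exactly one numeric donor is not misread as sample(1:n).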
Author(s)
Alexander Kowarik
References
A. Kowarik, M. Templ (2016) Imputation with R package VIM. Journal of Statistical Software, 74(7), 1-16.
See Also
Other imputation methods:
impPCA(), irmi(), kNN(), matchImpute(), medianSamp(), rangerImpute(), regressionImp(), sampleCat()
Examples
library(VIM)
data(sleep, package = "VIM")
sleepI <- hotdeck(sleep)
sleepI2 <- hotdeck(sleep, ord_var = "BodyWgt", domain_var = "Pred")
# Usage of donorcond in a simple example
sleepI3 <- hotdeck(
sleep,
variable = c("NonD", "Dream", "Sleep", "Span", "Gest"),
ord_var = "BodyWgt", domain_var = "Pred",
donorcond = list(">4", "<17", ">1.5", "%between%c(8,13)", ">5")
)
set.seed(132)
nRows <- 1e3
# Generate a data set with nRows rows and several variables
x <- data.frame(
x = rnorm(nRows), y = rnorm(nRows),
z = sample(LETTERS, nRows, replace = TRUE),
d1 = sample(LETTERS[1:3], nRows, replace = TRUE),
d2 = sample(LETTERS[1:2], nRows, replace = TRUE),
o1 = rnorm(nRows), o2 = rnorm(nRows), o3 = rnorm(nRows)
)
origX <- x
# Set 10% of the values in each of the first four columns to NA
x[sample(1:nRows, nRows / 10), 1] <- NA
x[sample(1:nRows, nRows / 10), 2] <- NA
x[sample(1:nRows, nRows / 10), 3] <- NA
x[sample(1:nRows, nRows / 10), 4] <- NA
xImp <- hotdeck(x, ord_var = c("o1", "o2", "o3"), domain_var = "d2")