R: Implement hot deck multiple imputation with ordinal...

hd.ord {hot.deck}

R Documentation

Implement hot deck multiple imputation with ordinal variables.

Description

This function adapts the “hot.deck” function to impute missing data while explicitly accounting for ordinal variables. Each ordinal variable is regressed on the specified explanatory variables using an ordered probit model fitted with polr. The model assumes an underlying latent continuous variable and estimates the distances between the categories of the ordinal variable. Ordinal levels are then replaced with the midpoints between the estimated intercepts (cutpoints). Categories that are not supported by the data are dropped. The resulting categories are used in multiple hot deck imputation with either the “best cell” method (the default) or the “probabilistic draw” method. Any number of ordinal variables can be specified, but none of them may contain missing values.
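The cutpoint replacement described above can be illustrated with a minimal sketch on simulated data. It calls `polr` from MASS directly and is not the internal code of `hd.ord`. The data frame `toy`, the variable names, and in particular the rule used for the two end categories (which have only one bounding cutpoint) are assumptions made for illustration.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 500
x <- rnorm(n)
latent <- 0.8 * x + rnorm(n)   # simulated latent continuous variable

# Observed ordinal variable with four ordered categories
y <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf),
         labels = c("low", "medium", "high", "top"),
         ordered_result = TRUE)
toy <- data.frame(y = y, x = x)

# Ordered probit fit; fit$zeta holds the K - 1 estimated intercepts (cutpoints)
fit  <- polr(y ~ x, data = toy, method = "probit")
cuts <- unname(fit$zeta)

# Interior categories: midpoint between the two cutpoints that bound them
mids <- (head(cuts, -1) + tail(cuts, -1)) / 2

# End categories have only one bounding cutpoint. ASSUMPTION for this sketch:
# step outward by half the width of the neighbouring interval.
k     <- length(cuts)
first <- cuts[1] - (cuts[2] - cuts[1]) / 2
last  <- cuts[k] + (cuts[k] - cuts[k - 1]) / 2
scores <- c(first, mids, last)

# Replace each ordinal level with its numeric score
toy$y_num <- scores[as.integer(toy$y)]
```

The spacing of `scores` reflects the estimated distances between categories, which is the information an equally spaced integer coding would discard.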

Usage

hd.ord(data, ord, evs, m = 5, method = c("best.cell", "p.draw"),
       cutoff = 10, sdCutoff = 1, optimizeSD = FALSE, optimStep = 0.1,
       optimStop = 5, weightedAffinity = FALSE,
       impContinuous = c("HD", "mice"), IDvars = NULL, ...)

Arguments

`data`	A data frame with missing values to be imputed using multiple hot deck imputation.
`ord`	A vector of ordinal variables to be used on the LHS of the ordered probit regression. Variables must not contain missing values.
`evs`	A vector of explanatory variables to be used on the RHS of the ordered probit regression. Variables may contain missing values.
`m`	Number of imputed datasets required.
`method`	Method used to draw donors based on affinity: either “best.cell” (the default) or “p.draw” for probabilistic draw.
`cutoff`	A numeric scalar such that any variable with fewer than `cutoff` unique non-missing values will be considered discrete and necessarily imputed with hot deck imputation.
`sdCutoff`	Number of standard deviations between observations: observations fewer than `sdCutoff` standard deviations away from each other are considered close enough to be a match; otherwise they are considered too far apart to be a match.
`optimizeSD`	Logical indicating whether the `sdCutoff` parameter should be optimized such that the smallest possible value is chosen that produces no thin cells from which to draw donors. Thin cells are those where the number of donors is less than `m`.
`optimStep`	The size of the steps in the optimization if `optimizeSD` is `TRUE`.
`optimStop`	The value at which optimization should stop if it has not already found a value that produces no thin cells. If this value is reached and thin cells still exist, a warning will be returned, though the routine will continue using `optimStop` as `sdCutoff`.
`weightedAffinity`	Logical indicating whether a correlation-weighted affinity score should be used.
`impContinuous`	Character string indicating how continuous missing data should be imputed. Valid options are “HD” (the default) in which case hot-deck imputation will be used, or “mice” in which case multiple imputation by chained equations will be used.
`IDvars`	A character vector of variable names not to be used in the imputation, but to be included in the final imputed datasets.
`...`	Optional additional arguments to be passed down to the `mice` routine.
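Two of the rules in the argument list can be sketched in base R. This is an illustration of the documented behaviour, not the package's implementation: `is_discrete`, `find_sd_cutoff`, and `toy_counts` are hypothetical helpers, and starting the search at `optimStep` is an assumption, since the documentation does not state the starting value.

```r
# `cutoff` rule: a variable is treated as discrete when it has fewer than
# `cutoff` unique non-missing values.
is_discrete <- function(x, cutoff = 10) {
  length(unique(x[!is.na(x)])) < cutoff
}

# `optimizeSD` rule: return the smallest candidate sdCutoff that leaves no
# thin cell (a cell with fewer than m donors). `count_donors` is a
# hypothetical function returning the donor count per cell for a given value.
# ASSUMPTION: the search starts at optimStep.
find_sd_cutoff <- function(count_donors, m = 5, optimStep = 0.1, optimStop = 5) {
  for (s in seq(optimStep, optimStop, by = optimStep)) {
    if (all(count_donors(s) >= m)) return(s)
  }
  warning("thin cells remain at optimStop; using optimStop as sdCutoff")
  optimStop
}

# Toy donor counts that grow as the cutoff is relaxed (made up for the demo)
toy_counts <- function(s) floor(c(12, 20, 40) * s)

disc_small <- is_discrete(c(1, 2, 2, NA, 3))
disc_large <- is_discrete(seq(0, 1, length.out = 50))
best_sd    <- find_sd_cutoff(toy_counts, m = 5)
```

A smaller `optimStep` gives a finer search at the cost of more iterations; `optimStop` bounds the search so that it ends even when thin cells cannot be eliminated.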

Value

The output is a list with the following elements:

`data`	An object of class `mi` which contains `m` imputed datasets.
`affinity`	A matrix of affinity scores; see `affinity`.
`donors`	A list of donors for each missing observation based on the affinity score.
`draws`	The `m` observations drawn from donors that were used for the multiple imputations.
`max.emp.aff`	Normalization constant for each row of affinity scores; the maximum possible value of the affinity scores if correlation-weighting is used.
`max.the.aff`	Normalization constant for each row of affinity scores; the number of columns in the original data.
`data.orig`	Original data fed into the function.
`data.orig.na.omit`	Original data without missing values.
`data.cut`	Data after cutpoint replacements.
`plr.out`	Results of `polr`.
`plr.df`	Results of `polr` as a data frame.
`int.dfs`	A list of intercepts as data frames.
`ord.new.lev`	New ordinal variable levels.
`ord.new.lev.num`	Numeric version of the new ordinal levels.

Examples

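library(hot.deck)
data(ampData)
# Impute with the defaults (method = "best.cell", m = 5) and keep the result
out <- hd.ord(data = ampData,
              ord = c("Educ", "Interest"),
              evs = c("Dem", "Black", "Empl", "Male", "Inc", "Age"))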

[Package hot.deck version 1.2 Index]