imp.rfnode.cond {RfEmpImp} | R Documentation |
Perform multiple imputation based on the conditional distribution formed by prediction nodes of random forests
Description
The RfNode.Cond multiple imputation method handles mixed types of variables.
It draws imputed values from the conditional distributions formed by the
predicting nodes of random forests (out-of-bag observations are excluded).
Usage
imp.rfnode.cond(
data,
num.imp = 5,
max.iter = 5,
num.trees = 10,
pre.boot = TRUE,
print.flag = FALSE,
...
)
Arguments
data |
A data frame or a matrix containing the incomplete data. Missing
values should be coded as NA. |
num.imp |
Number of multiple imputations. The default is 5. |
max.iter |
Number of iterations. The default is 5. |
num.trees |
Number of trees to build. The default is 10. |
pre.boot |
If |
print.flag |
If |
... |
Other arguments to pass down. |
Details
During imputation with imp.rfnode.cond, the candidate non-missing
observations for each missing observation are found through the predicting
nodes of the random trees in the random forest model. Only the in-bag
observations of each random tree are used for imputation.
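The donor selection described above can be sketched with the ranger package. This is a minimal illustration under stated assumptions, not the RfEmpImp source code. The toy split of iris into "observed" and "missing" rows, the helper impute.one, and the choice to pool donors over all trees with equal weight are all assumptions made for the example.

```r
# Illustrative sketch only; names and data split are hypothetical.
library(ranger)

set.seed(1)
obs <- iris[1:140, ]    # rows where the target (Sepal.Length) is observed
mis <- iris[141:150, ]  # rows treated as if the target were missing

# keep.inbag = TRUE stores the in-bag counts of every tree
fit <- ranger(Sepal.Length ~ ., data = obs, num.trees = 10, keep.inbag = TRUE)

# Terminal node IDs: one row per observation, one column per tree
nodes.obs <- predict(fit, data = obs, type = "terminalNodes")$predictions
nodes.mis <- predict(fit, data = mis, type = "terminalNodes")$predictions

# In-bag indicator matrix (rows: observations, columns: trees)
inbag <- sapply(fit$inbag.counts, function(cnt) cnt > 0)

impute.one <- function(i) {
  # Candidate donors: in-bag observed rows that share row i's predicting node,
  # pooled over all trees (out-of-bag rows are excluded)
  donors <- unlist(lapply(seq_len(fit$num.trees), function(k) {
    which(nodes.obs[, k] == nodes.mis[i, k] & inbag[, k])
  }))
  # Draw one value from the pooled empirical conditional distribution
  obs$Sepal.Length[donors[sample.int(length(donors), 1)]]
}

imputed <- vapply(seq_len(nrow(mis)), impute.one, numeric(1))
```

Every imputed value in this sketch is an observed value of an in-bag donor, so imputations always stay within the range of the observed data.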
Value
An object of S3 class mids.
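Because the return value is a mids object, the usual accessors from the mice package should apply. The snippet below is a small sketch of how the result might be inspected. The argument values and the variable name first.completed are chosen for illustration only.

```r
# Illustrative sketch; argument values are arbitrary choices.
library(RfEmpImp)

# Same data preparation as in the Examples section
nhanes.fix <- mice::nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes.fix[, c("age", "hyp")], as.factor)

# Override two of the defaults shown in Usage
imp <- imp.rfnode.cond(nhanes.fix, num.imp = 10, num.trees = 20)

class(imp)  # S3 class of the returned object
imp$m       # number of imputed data sets stored in the mids object

# Extract the first completed data set
first.completed <- mice::complete(imp, 1)
head(first.completed)
```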
Author(s)
Shangzhi Hong
References
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American Journal of Epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of Information in Medicine 51.01 (2012): 74-81.
Examples
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfnode.cond
imp <- imp.rfnode.cond(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results (stored under a new name so that pool() is not masked)
pooled <- pool(anl)
# Get pooled estimates
reg.ests(pooled)