imp.rfemp {RfEmpImp} | R Documentation |
Perform multiple imputation using the empirical error distributions and predicted probabilities of random forests
Description
The RfEmp multiple imputation method handles mixed types of variables and calls the corresponding imputation function for each variable type. Categorical variables should be of a categorical type such as factor or logical. RfPred.Emp is used for continuous variables, and RfPred.Cate is used for categorical variables.
Usage
imp.rfemp(
  data,
  num.imp = 5,
  max.iter = 5,
  num.trees = 10,
  alpha.emp = 0,
  sym.dist = TRUE,
  pre.boot = TRUE,
  num.trees.cont = NULL,
  num.trees.cate = NULL,
  num.threads = NULL,
  print.flag = FALSE,
  ...
)
Arguments
data |
A data frame or a matrix containing the incomplete data. Missing values should be coded as NA. |
num.imp |
Number of multiple imputations. The default is num.imp = 5. |
max.iter |
Number of iterations. The default is max.iter = 5. |
num.trees |
Number of trees to build. The default is num.trees = 10. |
alpha.emp |
The "significance level" for the empirical distribution of out-of-bag prediction errors. It can be used to guard against outliers, which is helpful for highly skewed variables. For example, set alpha.emp = 0.05 to use a 95% confidence level. The default is alpha.emp = 0. |
sym.dist |
If |
pre.boot |
If |
num.trees.cont |
Number of trees to build for continuous variables. The default is num.trees.cont = NULL. |
num.trees.cate |
Number of trees to build for categorical variables. The default is num.trees.cate = NULL. |
num.threads |
Number of threads for parallel computing. The default is num.threads = NULL. |
print.flag |
If |
... |
Other arguments to pass down. |
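To make the roles of alpha.emp and sym.dist more concrete, the following base R sketch trims an empirical error distribution before sampling from it. It is an illustration only and not the package's internal code: the helper name trim.errors, the simulated errors, and the exact quantile rule are all assumptions.

```r
# Hypothetical helper (not part of RfEmpImp): trim an empirical error
# distribution at "significance level" alpha before sampling from it.
trim.errors <- function(err, alpha = 0, symmetric = TRUE) {
  err <- err[!is.na(err)]
  if (symmetric) {
    # Treat the distribution as symmetric: bound the absolute errors,
    # then mirror the retained magnitudes around zero.
    hi <- quantile(abs(err), probs = 1 - alpha, names = FALSE)
    kept <- abs(err)[abs(err) <= hi]
    c(-kept, kept)
  } else {
    # Keep the shape of the distribution: cut alpha / 2 from each tail.
    bounds <- quantile(err, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
    err[err >= bounds[1] & err <= bounds[2]]
  }
}

set.seed(1)
oob.err <- rexp(200) - 1                       # simulated, right-skewed errors
trimmed <- trim.errors(oob.err, alpha = 0.05)  # drops the most extreme errors
draws <- sample(trimmed, size = 10, replace = TRUE)  # draws to add to predictions
```

With alpha = 0 nothing is trimmed, which matches the idea of using the full empirical distribution by default.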
Details
For continuous variables, mice.impute.rfpred.emp is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
For categorical variables, mice.impute.rfpred.cate is called, performing imputation based on predicted probabilities.
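The two ideas described above can be sketched in a single pass with the ranger package. This is a simplified illustration under stated assumptions, not the code of mice.impute.rfpred.emp or mice.impute.rfpred.cate: the data are simulated, the variable names are made up, and the bootstrapping and chained-equation iterations of the real methods are left out.

```r
library(ranger)

set.seed(2020)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)                  # continuous target
dat$g <- factor(ifelse(dat$x1 + rnorm(n) > 0, "yes", "no"))  # categorical target
mis <- sample(n, 40)              # rows whose targets are treated as missing
obs <- setdiff(seq_len(n), mis)   # rows used to fit the forests

# Continuous target: prediction plus a draw from the empirical
# distribution of out-of-bag (OOB) prediction errors.
fit.cont <- ranger(y ~ x1 + x2, data = dat[obs, ], num.trees = 100)
oob.err <- dat$y[obs] - fit.cont$predictions  # $predictions holds OOB predictions
oob.err <- oob.err[!is.na(oob.err)]           # drop rows that were never out-of-bag
pred <- predict(fit.cont, data = dat[mis, ])$predictions
imp.cont <- pred + sample(oob.err, length(pred), replace = TRUE)

# Categorical target: draw one class per row from the predicted probabilities.
fit.cate <- ranger(g ~ x1 + x2, data = dat[obs, ], num.trees = 100,
                   probability = TRUE)
prob <- predict(fit.cate, data = dat[mis, ])$predictions  # one column per class
imp.cate <- apply(prob, 1, function(p) sample(colnames(prob), 1, prob = p))
imp.cate <- factor(imp.cate, levels = levels(dat$g))
```

Sampling from the errors and from the class probabilities, rather than imputing the point prediction or the most likely class, is what keeps variability in the imputed values.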
Value
An object of S3 class mids.
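As a brief sketch of how a returned mids object might be inspected, the snippet below uses the nhanes data and the complete() function from the mice package. The argument values num.imp = 3 and max.iter = 2 are arbitrary choices made to keep the run short, not recommendations.

```r
library(mice)       # provides nhanes and complete()
library(RfEmpImp)

# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)

# Small run for illustration only
imp <- imp.rfemp(nhanes.fix, num.imp = 3, max.iter = 2)

class(imp)                       # class vector of the returned object
imp$m                            # number of imputations stored in the object
first <- mice::complete(imp, 1)  # first completed data set as a data frame
anyNA(first)                     # check whether any missing values remain
```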
Author(s)
Shangzhi Hong
References
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American Journal of Epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of Information in Medicine 51.01 (2012): 74-81.
Examples
# Load the package (the nhanes data set comes from the mice package)
library(RfEmpImp)
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfemp
imp <- imp.rfemp(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results (named "pooled" to avoid masking the pool() function)
pooled <- pool(anl)
# Get pooled estimates
reg.ests(pooled)