miceRanger {miceRanger} | R Documentation |
miceRanger: Fast Imputation with Random Forests
Description
Performs multiple imputation by chained random forests. Returns a miceDefs object, which contains information about the imputation process.
Usage
miceRanger(
data,
m = 5,
maxiter = 5,
vars,
valueSelector = c("meanMatch", "value"),
meanMatchCandidates = pmax(round(nrow(data) * 0.01), 5),
returnModels = FALSE,
parallel = FALSE,
verbose = TRUE,
...
)
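As a quick illustration of the default for meanMatchCandidates shown in the usage above, the expression can be evaluated in base R for a few row counts. The helper name default_candidates is hypothetical and exists only for this sketch.

```r
# Hypothetical helper mirroring the default expression in the Usage block:
# 1% of the rows, rounded, but never fewer than 5 candidates.
default_candidates <- function(n_rows) {
  pmax(round(n_rows * 0.01), 5)
}

data(iris)
iris_default  <- default_candidates(nrow(iris))  # iris has 150 rows
large_default <- default_candidates(2000)
```

For small datasets such as iris the floor of 5 applies; the 1% rule only takes over once the data has more than a few hundred rows.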
Arguments
data |
A data.frame or data.table to be imputed. |
m |
The number of datasets to produce. |
maxiter |
The number of iterations to run for each dataset. |
vars |
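Specifies which variables should be imputed, and how. Can be specified in 3 different ways; the Examples section shows two of them: omitting vars entirely (Simple Example), and passing a named list in which each name is a variable to impute and each element is a character vector of its predictors (Complex Imputation Schema). |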
valueSelector |
How to select the value to be imputed from the model predictions.
Can be "meanMatch", "value", or a named vector containing a mixture of those values.
If a named vector is passed, the names must equal the variables to be imputed specified in vars. |
meanMatchCandidates |
Specifies the number of candidate values to select from in the
mean matching algorithm. Can be specified either as an integer or as a named integer vector giving different
values by variable. If a named integer vector is passed, its names must include, at a
minimum, the numeric variables imputed using valueSelector = "meanMatch". |
returnModels |
Logical. Should the final model for each variable be returned? Set to TRUE to include the models in the result as finalModels; the default is FALSE. |
parallel |
Logical. Should the process run in parallel? Usually not necessary. When TRUE, the process
takes advantage of a cluster that has already been registered, as in the "Run in parallel" example. |
verbose |
Logical. Should progress be printed? |
... |
Other parameters passed to ranger(), such as num.trees or num.threads. |
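To make the role of meanMatchCandidates more concrete, here is a minimal base R sketch of predictive mean matching as it is commonly described. It is an illustration under that assumption, not miceRanger's internal code; the function mean_match_one and all of its argument names are hypothetical.

```r
# Generic predictive mean matching sketch (NOT miceRanger's internal code).
# pred_obs:  model predictions for rows where the variable is observed
# y_obs:     the observed values for those same rows
# pred_miss: model prediction for one row where the variable is missing
# k:         number of candidates (the role played by meanMatchCandidates)
mean_match_one <- function(pred_obs, y_obs, pred_miss, k) {
  # Rank observed rows by how close their prediction is to pred_miss
  nearest <- order(abs(pred_obs - pred_miss))[seq_len(min(k, length(y_obs)))]
  # Draw one candidate at random and return its real, observed value
  y_obs[nearest[sample.int(length(nearest), 1)]]
}

set.seed(1)
y_obs    <- c(4.9, 5.1, 5.8, 6.3, 7.0)
pred_obs <- c(5.0, 5.2, 5.7, 6.4, 6.9)
imputed  <- mean_match_one(pred_obs, y_obs, pred_miss = 5.15, k = 3)
```

In this sketch the imputed value is always one that was actually observed, which is why a larger candidate count gives more varied imputations, while a smaller one stays closer to the model prediction.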
Value
A miceDefs object, containing the following:
callParams |
The parameters of the object. |
data |
The original data provided by the user, cast to a data.table. |
naWhere |
Logical index of missing data, having the same dimensions as data. |
missingCounts |
The number of missing values for each variable. |
rawClasses |
The original classes provided in data. |
newClasses |
The new classes of the returned data. |
allImps |
The imputations of all variables at each iteration, for each dataset. |
allImport |
The variable importance metrics at each iteration, for each dataset. |
allError |
The OOB model error for all variables at each iteration, for each dataset. |
finalImps |
The final imputations for each dataset. |
finalImport |
The final variable importance metrics for each dataset. |
finalError |
The final model error for each variable in every dataset. |
finalModels |
Only returned if returnModels = TRUE. The final model for each variable. |
imputationTime |
The total time in seconds taken to create the imputations for the specified datasets and iterations. Does not include any setup time. |
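The shape of naWhere and missingCounts can be pictured with a small base R sketch. This only illustrates what the two fields describe; it is not how miceRanger builds them, and the data frame df and the variable names are made up for the example.

```r
# Base R sketch of two of the fields described above; it only illustrates
# their shape and is not how miceRanger builds them internally.
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  stringsAsFactors = FALSE
)

na_where <- is.na(df)                # logical matrix, same dimensions as df
missing_counts <- colSums(na_where)  # number of missing values per variable
```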
Vignettes
It is highly recommended to visit the GitHub README for a thorough walkthrough of miceRanger's capabilities, as well as performance benchmarks.
Several vignettes are also available on miceRanger's listing on the CRAN website.
Examples
#################
## Simple Example
data(iris)
ampIris <- amputeData(iris)
miceObj <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , verbose = FALSE
  , num.threads = 1
  , num.trees = 5
)
##################
## Run in parallel
data(iris)
ampIris <- amputeData(iris)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
# Perform mice
miceObjPar <- miceRanger(
    ampIris
  , m = 2
  , maxiter = 2
  , parallel = TRUE
  , verbose = FALSE
)
stopCluster(cl)
registerDoSEQ()
############################
## Complex Imputation Schema
data(iris)
ampIris <- amputeData(iris)
# Define variables to impute, as well as their predictors
v <- list(
    Sepal.Width = c("Sepal.Length", "Petal.Width", "Species")
  , Sepal.Length = c("Sepal.Width", "Petal.Width")
  , Species = c("Sepal.Width")
)
# Specify mean matching for certain variables.
vs <- c(
    Sepal.Width = "meanMatch"
  , Sepal.Length = "value"
  , Species = "meanMatch"
)
# Different mean matching candidates per variable.
mmc <- c(
    Sepal.Width = 4
  , Species = 10
)
miceObjCustom <- miceRanger(
    ampIris
  , m = 1
  , maxiter = 1
  , vars = v
  , valueSelector = vs
  , meanMatchCandidates = mmc
  , verbose = FALSE
)
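The naming rules stated under valueSelector and meanMatchCandidates can be checked by hand before calling miceRanger. The following base R sketch repeats the specifications from the example above and tests them against those rules; the variable names selector_ok, needs_count and candidates_ok are hypothetical, and the checks are written from the argument descriptions rather than taken from the package itself.

```r
data(iris)

# Same specifications as in the Complex Imputation Schema example
v <- list(
    Sepal.Width = c("Sepal.Length", "Petal.Width", "Species")
  , Sepal.Length = c("Sepal.Width", "Petal.Width")
  , Species = c("Sepal.Width")
)
vs  <- c(Sepal.Width = "meanMatch", Sepal.Length = "value", Species = "meanMatch")
mmc <- c(Sepal.Width = 4, Species = 10)

# valueSelector names must equal the variables being imputed
selector_ok <- setequal(names(vs), names(v))

# Numeric variables imputed with mean matching need a candidate count
is_num <- vapply(iris[names(v)], is.numeric, logical(1))
needs_count <- names(v)[is_num & vs[names(v)] == "meanMatch"]
candidates_ok <- all(needs_count %in% names(mmc))
```

Here only Sepal.Width is both numeric and mean matched, so it is the one name that mmc is required to contain; the entry for Species is allowed but not required by that rule.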