imput_cov {OTrecod}    R Documentation
imput_cov()
Description
This function imputes incomplete covariates, whatever their types, using functions from the package mice (Multivariate Imputation by Chained Equations, van Buuren) or functions from the package missMDA (single imputation with multivariate data analysis).
Usage
imput_cov(
dat1,
indcol = 1:ncol(dat1),
R_mice = 5,
meth = rep("pmm", ncol(dat1)),
missMDA = FALSE,
NB_COMP = 3,
seed_choice = sample(1:1e+06, 1)
)
Arguments
dat1
a data.frame containing the variables to be imputed and those involved in the imputations
indcol
a vector of integers: the column indexes (or numbers) of the variables to be imputed and of those involved in the imputations
R_mice
an integer: the number of imputed databases generated with the MICE method (5 by default)
meth
a vector of characters specifying the imputation method to be used for each column involved in the imputation (one entry per column; "pmm" for every column of dat1 by default)
missMDA
a boolean. If TRUE, missing values are imputed by factor analysis for mixed data (FAMD) using the imputeFAMD function of the missMDA package; if FALSE (default), they are imputed by MICE
NB_COMP
an integer corresponding to the number of components used in FAMD to predict the missing entries (3 by default) when the missMDA argument is TRUE
seed_choice
an integer passed to set.seed() to initialize the random number generator (a random integer by default)
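Since indcol expects integer column indexes, it can be convenient to derive them from column names. The following base R sketch uses a small hypothetical data.frame standing in for dat1; the column names are invented for illustration only.

```r
# Hypothetical toy data.frame standing in for dat1
dat1 <- data.frame(
  id     = 1:4,
  age    = c(25, NA, 40, 33),
  sex    = factor(c("F", "M", NA, "F")),
  smoker = factor(c("yes", NA, "no", "no"))
)

# Variables to be imputed or used as predictors, given by name
vars <- c("age", "sex", "smoker")

# Convert the names into the integer column indexes expected by indcol
indcol <- match(vars, names(dat1))
stopifnot(!anyNA(indcol))  # fail early on a misspelled column name

indcol  # 2 3 4
```

The resulting vector could then be passed as indcol, keeping in mind (as in the Examples) that meth is given one entry per selected column.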
Details
By default, the function imput_cov handles missing information using multivariate imputation by chained equations (MICE; see (1) for more details about the method), by integrating the function mice into its syntax. All the default values of mice are kept, except the required number of multiple imputations, which can be set with the argument R_mice, and the imputation method chosen for each variable (meth argument), which corresponds to the argument defaultMethod of the function mice.
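The meth vector can be built from the type of each column. The sketch below is a hypothetical helper (choose_meth is not part of OTrecod) that follows the usual mice conventions, which also match the Examples: "pmm" (predictive mean matching) for numeric variables, "logreg" for binary factors, "polyreg" for unordered factors and "polr" for ordered factors.

```r
# Hypothetical helper: pick a mice method name from each column's type
choose_meth <- function(df) {
  vapply(df, function(x) {
    if (is.numeric(x)) {
      "pmm"                                # continuous covariate
    } else if (is.factor(x) && nlevels(x) == 2) {
      "logreg"                             # binary covariate
    } else if (is.ordered(x)) {
      "polr"                               # ordinal covariate (> 2 levels)
    } else if (is.factor(x)) {
      "polyreg"                            # nominal covariate (> 2 levels)
    } else {
      stop("unsupported column type: convert it to numeric or factor")
    }
  }, character(1))
}

# Toy data with one column of each type
dat <- data.frame(
  bmi    = c(22.1, NA, 27.5, 30.2),
  sex    = factor(c("F", "M", NA, "F")),
  region = factor(c("north", "south", "east", NA)),
  grade  = factor(c("low", NA, "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE)
)
meth <- choose_meth(dat)
meth
```

This assumes that categorical covariates are stored as factors; unname(meth) gives a plain character vector in the same order as the columns.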
When multiple imputations are required (MICE only), each missing value is replaced by a consensus value: the average of the candidate values for numerical variables, and the most frequent class for categorical variables (ordinal or not).
The output MICE_IMPS stores the imputed databases so that users can build their own consensus values and/or assess the variability of the proposed imputed values if necessary.
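The consensus rule described above can be sketched in base R as follows. consensus_impute is a hypothetical helper, not the code used internally by imput_cov; it assumes a list of completed data.frames with identical columns (such as MICE_IMPS) and categorical variables stored as factors. Ties between classes are resolved here by taking the first class in the sorted table, which may differ from OTrecod's own tie rule.

```r
# Hypothetical helper: merge R completed data.frames into one consensus data.frame
consensus_impute <- function(imps) {
  out <- imps[[1]]
  for (v in names(out)) {
    # Candidate values of column v, one vector per imputed data.frame
    candidates <- lapply(imps, `[[`, v)
    if (is.numeric(out[[v]])) {
      # Numeric: cell-wise average of the candidate values
      out[[v]] <- rowMeans(do.call(cbind, candidates))
    } else {
      # Categorical: cell-wise most frequent class across imputations
      m <- do.call(cbind, lapply(candidates, as.character))
      modal <- apply(m, 1, function(r) names(which.max(table(r))))
      out[[v]] <- factor(modal, levels = levels(out[[v]]),
                         ordered = is.ordered(out[[v]]))
    }
  }
  out  # observed cells are identical across imputations, so they are unchanged
}

# Three toy imputed databases that disagree on some cells
imp1 <- data.frame(x = c(1, 2, 3), g = factor(c("a", "b", "a"), levels = c("a", "b")))
imp2 <- data.frame(x = c(1, 4, 3), g = factor(c("a", "b", "b"), levels = c("a", "b")))
imp3 <- data.frame(x = c(1, 6, 3), g = factor(c("a", "a", "b"), levels = c("a", "b")))
res <- consensus_impute(list(imp1, imp2, imp3))
res
```

The same cell-wise layout can be reused to gauge variability, for instance by applying sd() to each row of do.call(cbind, candidates) for a numeric variable.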
For this method, the seed of the random number generator is set through the argument seed_choice: it can be fixed by users or, by default, sampled at random.
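The role of seed_choice can be illustrated with base R alone. The sketch below draws a seed with the same expression as the default value of seed_choice, then shows that resetting that seed replays exactly the same random draws, which is why recording the seed is enough to reproduce a MICE run.

```r
# Draw a seed the same way as the default value of seed_choice
seed_choice <- sample(1:1e+06, 1)
print(seed_choice)  # keep a record of it to reproduce the run later

set.seed(seed_choice)
draw1 <- runif(3)

set.seed(seed_choice)  # resetting the same seed ...
draw2 <- runif(3)      # ... replays exactly the same random draws

identical(draw1, draw2)  # TRUE
```

To reproduce a previous call to imput_cov, the recorded value would be passed explicitly, e.g. seed_choice = 123456 (an arbitrary illustrative value).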
When the argument missMDA is set to TRUE, incomplete values are replaced (single imputation) using a dimensionality reduction method called factor analysis for mixed data (FAMD), through the imputeFAMD function of the missMDA package (2).
With this approach, the function imput_cov keeps all the default values of imputeFAMD, except the number of dimensions used for FAMD, which can be set by users through the argument NB_COMP (3 by default).
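To give an intuition of imputation by dimensionality reduction, here is a simplified sketch of unregularized iterative PCA imputation for numeric data only. This is a swapped-in, simpler technique: to our understanding, imputeFAMD relies on a regularized iterative version of this idea extended to mixed (numeric and categorical) data, so the function below is illustrative and is not the missMDA implementation. Its ncp argument plays the role of NB_COMP.

```r
# Simplified iterative PCA imputation (numeric columns only, no regularization)
iterative_pca_impute <- function(X, ncp = 2, maxiter = 1000, tol = 1e-10) {
  X <- as.matrix(X)
  miss <- is.na(X)
  # Step 0: fill each missing cell with the observed mean of its column
  Xhat <- X
  Xhat[miss] <- colMeans(X, na.rm = TRUE)[col(X)[miss]]
  for (iter in seq_len(maxiter)) {
    mu <- colMeans(Xhat)
    Xc <- sweep(Xhat, 2, mu)            # center the current completed data
    s <- svd(Xc, nu = ncp, nv = ncp)    # keep ncp dimensions (ncp <= min(dim(X)))
    fit <- s$u %*% diag(s$d[seq_len(ncp)], nrow = ncp) %*% t(s$v)
    fit <- sweep(fit, 2, mu, "+")       # back to the original scale
    change <- sum((Xhat[miss] - fit[miss])^2)
    Xhat[miss] <- fit[miss]             # only the missing cells are updated
    if (change < tol) break
  }
  Xhat
}

# Nearly rank-one toy data with two missing cells
set.seed(1)
X <- outer(1:8, c(1, 2, 3)) + matrix(rnorm(24, sd = 0.01), nrow = 8)
X[2, 3] <- NA
X[5, 1] <- NA
X_imp <- iterative_pca_impute(X, ncp = 1)
X_imp[c(2, 5), ]
```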
Value
A list of 3 or 4 objects, depending on the missMDA argument: the first three objects below if missMDA = TRUE, all four otherwise:
RAW
a data.frame corresponding to the raw database
IMPUTE
a character indicating the type of imputation selected
DATA_IMPUTE
a data.frame corresponding to the completed database (the consensus database if multiple imputations were performed)
MICE_IMPS
only if missMDA = FALSE: a list containing the R_mice imputed databases generated by MICE
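Because the length of the returned list depends on missMDA, downstream code may check for MICE_IMPS before using it. The sketch below is a hypothetical helper (describe_imput_output is not part of OTrecod), tested on a mock list shaped like the output described above; the value "MICE" stored in IMPUTE is an invented placeholder.

```r
# Hypothetical helper: summarise a list shaped like the output of imput_cov
describe_imput_output <- function(res) {
  stopifnot(all(c("RAW", "IMPUTE", "DATA_IMPUTE") %in% names(res)))
  imps <- res[["MICE_IMPS"]]  # NULL when missMDA = TRUE
  list(
    method         = res[["IMPUTE"]],
    missing_before = sum(is.na(res[["RAW"]])),
    missing_after  = sum(is.na(res[["DATA_IMPUTE"]])),
    n_imputed_sets = if (is.null(imps)) 1L else length(imps)
  )
}

# Mock output (hypothetical values) mimicking a MICE run with R_mice = 3
raw  <- data.frame(x = c(1, NA, 3), g = factor(c("a", NA, "b")))
done <- data.frame(x = c(1, 2, 3), g = factor(c("a", "a", "b")))
mock_mice <- list(RAW = raw, IMPUTE = "MICE", DATA_IMPUTE = done,
                  MICE_IMPS = list(done, done, done))
info <- describe_imput_output(mock_mice)
str(info)
```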
Author(s)
Gregory Guernec
References
(1) van Buuren S, Groothuis-Oudshoorn K (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://www.jstatsoft.org/v45/i03/
(2) Josse J, Husson F (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1–31. doi: 10.18637/jss.v070.i01
Examples
# Imputation of all incomplete covariates in the table simu_data:
data(simu_data)
# Here we keep the complete variable "Gender" in the imputation model.
# Using MICE (R_mice = 3):
imput_mice <- imput_cov(simu_data,
  indcol = 4:8, R_mice = 3,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm")
)
summary(imput_mice)
# Using FAMD (NB_COMP = 3):
imput_famd <- imput_cov(simu_data,
  indcol = 4:8,
  meth = c("logreg", "polyreg", "polr", "logreg", "pmm"),
  missMDA = TRUE
)
summary(imput_famd)