impute_data {missCompare}    R Documentation
Missing data imputation with various methods
Description
impute_data imputes a dataframe that contains missing values using the selected algorithm(s).
Usage
impute_data(X, scale = TRUE, n.iter = 10, sel_method = c(1:16))
Arguments
X
Dataframe containing the original data with missing values.
scale
Boolean, default TRUE. If TRUE, all numeric variables are centered and scaled to mean = 0 and standard deviation = 1; factor variables are not scaled. Scaling is strongly recommended for all PCA-based methods and, for the sake of comparability (e.g. when all methods are run), for the other methods too. Note, however, that some methods (e.g. pcaMethods NLPCA, missForest) are equipped to handle non-linear data; for these, scaling is left to the user.
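The effect of scale = TRUE can be pictured with the base-R sketch below. It is an illustration under the assumption that "scale and center" means the usual z-score per numeric column; it is not the code missCompare runs internally, and scale_numeric is a hypothetical helper. Note that scale() ignores NAs when computing the mean and standard deviation, so missing values stay missing and are left for the imputation step.

```r
# Hypothetical helper (not part of missCompare): z-score every numeric column,
# leave factor and other non-numeric columns untouched.
scale_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  # scale() centers to mean 0 and divides by the SD, skipping NAs in both steps;
  # as.vector() drops the matrix attributes scale() adds
  df[is_num] <- lapply(df[is_num], function(col) as.vector(scale(col)))
  df
}

df  <- data.frame(a = c(1, 2, NA, 5), g = factor(c("x", "y", "x", "y")))
out <- scale_numeric(df)
```

Keeping factors out of the transformation mirrors the statement above that factor variables are not scaled.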
n.iter
Number of iterations to perform, default 10. This only affects the probabilistic methods that support a multiple imputation framework. All other selected methods generate a single imputed dataframe.
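Why n.iter only matters for probabilistic methods can be seen in the sketch below. draw_impute is a hypothetical stochastic imputer that fills each NA with a randomly drawn observed value of the same column; it is not a missCompare function, and whether any missCompare method works exactly this way is an assumption. Repeating a stochastic method n.iter times yields n.iter different completed datasets, whereas a deterministic method such as mean imputation returns the same result on every run, so one copy suffices.

```r
# Hypothetical stochastic imputer (not a missCompare function): each NA in a
# numeric column is replaced by a value drawn at random from that column's
# observed values.
draw_impute <- function(df) {
  for (nm in names(df)) {
    col  <- df[[nm]]
    miss <- is.na(col)
    if (is.numeric(col) && any(miss) && any(!miss)) {
      obs <- col[!miss]
      # sample.int() avoids sample()'s special behaviour when obs has length 1
      col[miss] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
      df[[nm]] <- col
    }
  }
  df
}

df <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, 2, 4))

# Probabilistic method: n.iter independent runs give n.iter completed datasets
n.iter <- 10
set.seed(1)
imputations <- lapply(seq_len(n.iter), function(i) draw_impute(df))

# Deterministic method (column means): every run gives the same answer,
# so a single imputed dataframe is enough
mean_filled <- df
mean_filled[] <- lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
```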
sel_method
Numeric vector specifying which methods to run. The default runs all methods (1-16), but any combination, including a single method, is allowed.
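The sketch below shows one way to read sel_method numbers as method names. It assumes that the numbering follows the order of the elements listed under Value; this is consistent with the Examples, where sel_method = c(14) is used to run missForest, but the mapping is an inference rather than something stated here. method_names is a hypothetical lookup vector, not an object exported by missCompare.

```r
# Assumed mapping: position i corresponds to sel_method = i, in the order of
# the Value list (consistent with 14 selecting missForest in the Examples).
method_names <- c(
  "random_replacement", "mean_imputation", "median_imputation",
  "missMDA_reg_imputation", "missMDA_EM_imputation",
  "pcaMethods_PPCA_imputation", "pcaMethods_svdImpute_imputation",
  "pcaMethods_BPCA_imputation", "pcaMethods_Nipals_imputation",
  "pcaMethods_NLPCA_imputation", "mice_mixed_imputation",
  "mi_Bayesian_imputation", "ameliaII_imputation",
  "missForest_imputation", "Hmisc_aregImpute_imputation",
  "VIM_kNN_imputation"
)

# Which methods would sel_method = c(2:3, 5, 7) run under this assumption?
chosen <- method_names[c(2:3, 5, 7)]
```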
Details
This function assumes that the user has already run simulations with the impute_simulated function and reached some conclusions about which methods perform best on their datasets. It offers a convenient way to impute datasets with a curated list of methods. Some of these methods support a multiple imputation framework (they are based on probabilistic models, so the imputed values carry uncertainty), and for these the function can generate multiple imputed datasets. The user can impute their dataframe with a single selected method or with several methods.
Value
A nested list of imputed datasets. If only a subset of methods is selected, the list elements for the non-selected methods will be empty.
random_replacement
Imputed dataset using random replacement
mean_imputation
Imputed dataset using mean imputation
median_imputation
Imputed dataset using median imputation
missMDA_reg_imputation
Imputed dataset using the missMDA regularized imputation algorithm
missMDA_EM_imputation
Imputed dataset using the missMDA EM imputation algorithm
pcaMethods_PPCA_imputation
Imputed dataset using the pcaMethods PPCA imputation algorithm
pcaMethods_svdImpute_imputation
Imputed dataset using the pcaMethods svdImpute imputation algorithm
pcaMethods_BPCA_imputation
Imputed dataset using the pcaMethods BPCA imputation algorithm
pcaMethods_Nipals_imputation
Imputed dataset using the pcaMethods NIPALS imputation algorithm
pcaMethods_NLPCA_imputation
Imputed dataset using the pcaMethods NLPCA imputation algorithm
mice_mixed_imputation
Imputed dataset using the mice mixed imputation algorithm
mi_Bayesian_imputation
Imputed dataset using the mi Bayesian imputation algorithm
ameliaII_imputation
Imputed dataset using the Amelia II imputation algorithm
missForest_imputation
Imputed dataset using the missForest imputation algorithm
Hmisc_aregImpute_imputation
Imputed dataset using the Hmisc aregImpute imputation algorithm
VIM_kNN_imputation
Imputed dataset using the VIM kNN imputation algorithm
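The two simplest entries above, mean and median imputation, can be sketched in a few lines of base R. This is an illustration of the general technique, not missCompare's implementation; impute_center is a hypothetical helper, and it leaves non-numeric columns untouched, which may differ from how missCompare treats factors.

```r
# Hypothetical helper: fill NAs in each numeric column with a summary
# (mean by default, or median) computed from that column's observed values.
impute_center <- function(df, fun = mean) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(col) {
    col[is.na(col)] <- fun(col, na.rm = TRUE)
    col
  })
  df
}

df <- data.frame(a = c(1, NA, 3, 10), b = c(2, 4, NA, 4))
mean_imp   <- impute_center(df, mean)
median_imp <- impute_center(df, median)
```

With the skewed column a, the mean fill (14/3) and the median fill (3) differ noticeably, which is one reason to compare methods in simulations before choosing one.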
Examples
## running 10 iterations of all algorithms (that allow for multiple imputation) and
## one copy of those that do not allow for multiple imputations
# impute_data(df, scale = TRUE, n.iter = 10,
# sel_method = c(1:16))
## running 20 iterations of missForest (e.g. this was the best performing algorithm
## in simulations) on a non-scaled dataframe
# impute_data(df, scale = FALSE, n.iter = 20,
# sel_method = c(14))
## running 1 iteration of four selected non-probabilistic algorithms on a scaled dataframe
# impute_data(df, scale = TRUE, n.iter = 1,
# sel_method = c(2:3, 5, 7))
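Because the elements for non-selected methods are empty, it can be convenient to keep only the populated ones. The sketch below uses a small mock list in place of a real impute_data() result and assumes that "empty" means zero length (for example list() or NULL); the exact nesting of the real return value, particularly for multiply imputed methods, is not assumed here.

```r
# With missCompare installed, a real result would come from e.g.
# res <- impute_data(df, scale = TRUE, n.iter = 1, sel_method = c(2:3))
# Mock stand-in for illustration: non-selected methods hold empty elements.
res <- list(
  random_replacement = list(),
  mean_imputation    = data.frame(a = c(1, 2, 3)),
  median_imputation  = data.frame(a = c(1, 2, 2))
)

# Keep only the methods that actually produced output
selected <- Filter(function(x) length(x) > 0, res)
names(selected)
```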