R: Missing data imputation with various methods

impute_data {missCompare}

R Documentation

Missing data imputation with various methods

Description

impute_data imputes a dataframe containing missing values using the selected algorithm(s).

Usage

impute_data(X, scale = TRUE, n.iter = 10, sel_method = c(1:16))

Arguments

X

Dataframe - the original data that contains missing values.

scale

Boolean with default TRUE. If TRUE, all numeric variables are centered and scaled to mean = 0 and standard deviation = 1. This is strongly recommended for all PCA-based methods and, for the sake of comparison (and when all methods are run), for the other methods too. Note, however, that some methods (e.g. pcaMethods NLPCA, missForest) are equipped to handle non-linear data. In these cases scaling is up to the user. Factor variables are not scaled.
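To illustrate what centering and scaling numeric variables means in practice, here is a minimal base R sketch. It is not the package's internal code; the dataframe `df` and its column names are hypothetical.

```r
# Hypothetical toy dataframe with missing values (not from the package)
df <- data.frame(
  height = c(170, NA, 182, 165, 178),
  weight = c(65, 80, NA, 54, 72),
  group  = factor(c("a", "b", "a", "b", "a"))
)

# Center and scale numeric columns only; factor columns are left untouched.
# scale() ignores NAs when computing the mean and standard deviation.
num_cols <- vapply(df, is.numeric, logical(1))
df_scaled <- df
df_scaled[num_cols] <- lapply(df[num_cols], function(v) as.numeric(scale(v)))

# After scaling, column means are (numerically) 0 and standard deviations 1
colMeans(df_scaled[num_cols], na.rm = TRUE)
vapply(df_scaled[num_cols], sd, numeric(1), na.rm = TRUE)
```

Missing values stay missing after scaling; only the observed values are transformed.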

n.iter

Number of iterations to perform with default 10. This will only affect the probabilistic methods that allow for a multiple imputation framework. The rest of the methods (if specified to run) will only generate 1 imputed dataframe.

sel_method

Numeric vector that specifies which methods to run. Default is all methods (1-16), but any combination, including a single method, is allowed.

1	random replacement
2	median imputation
3	mean imputation
4	missMDA Regularized
5	missMDA EM
6	pcaMethods PPCA
7	pcaMethods svdImpute
8	pcaMethods BPCA
9	pcaMethods NIPALS
10	pcaMethods NLPCA
11	mice mixed
12	mi Bayesian
13	Amelia II
14	missForest
15	Hmisc aregImpute
16	VIM kNN
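Because sel_method takes numeric indices, it can be convenient to look them up by name. The sketch below copies the labels from the table above; `method_names` and `pick_methods` are hypothetical helpers and are not part of missCompare.

```r
# Method labels in the order of the table above (index = sel_method value)
method_names <- c(
  "random replacement", "median imputation", "mean imputation",
  "missMDA Regularized", "missMDA EM", "pcaMethods PPCA",
  "pcaMethods svdImpute", "pcaMethods BPCA", "pcaMethods NIPALS",
  "pcaMethods NLPCA", "mice mixed", "mi Bayesian", "Amelia II",
  "missForest", "Hmisc aregImpute", "VIM kNN"
)

# Hypothetical helper: translate method labels into sel_method indices
pick_methods <- function(wanted) {
  idx <- match(wanted, method_names)
  if (anyNA(idx)) {
    stop("Unknown method(s): ", paste(wanted[is.na(idx)], collapse = ", "))
  }
  sort(idx)
}

sel <- pick_methods(c("missForest", "mice mixed"))
sel  # 11 14
```

The resulting vector could then be passed as the sel_method argument.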

Details

This function assumes that the user has performed simulations using the impute_simulated function and arrived at some conclusions regarding which methods perform best on their datasets. This function offers a convenient way to impute datasets with a curated list of methods. Some of the methods allow for a multiple imputation framework (they operate with probabilistic models, hence there is uncertainty in the imputed values), so this function allows the user to generate multiple imputed datasets. The user can decide to impute their dataframe with a single selected method or with multiple methods.

Value

A nested list of imputed datasets. If only a subset of methods was selected, the list elements for the non-selected methods will be empty.

`random_replacement`	Imputed dataset using random replacement
`mean_imputation`	Imputed dataset using mean imputation
`median_imputation`	Imputed dataset using median imputation
`missMDA_reg_imputation`	Imputed dataset using the missMDA regularized imputation algorithm
`missMDA_EM_imputation`	Imputed dataset using the missMDA EM imputation algorithm
`pcaMethods_PPCA_imputation`	Imputed dataset using the pcaMethods PPCA imputation algorithm
`pcaMethods_svdImpute_imputation`	Imputed dataset using the pcaMethods svdImpute imputation algorithm
`pcaMethods_BPCA_imputation`	Imputed dataset using the pcaMethods BPCA imputation algorithm
`pcaMethods_Nipals_imputation`	Imputed dataset using the pcaMethods NIPALS imputation algorithm
`pcaMethods_NLPCA_imputation`	Imputed dataset using the pcaMethods NLPCA imputation algorithm
`mice_mixed_imputation`	Imputed dataset using the mice mixed imputation algorithm
`mi_Bayesian_imputation`	Imputed dataset using the mi Bayesian imputation algorithm
`ameliaII_imputation`	Imputed dataset using the Amelia II imputation algorithm
`missForest_imputation`	Imputed dataset using the missForest imputation algorithm
`Hmisc_aregImpute_imputation`	Imputed dataset using the Hmisc aregImpute imputation algorithm
`VIM_kNN_imputation`	Imputed dataset using the VIM kNN imputation algorithm
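Since non-selected methods yield empty list elements, it may help to drop them before further processing. The sketch below uses a hand-built stand-in for the returned list: the element names are taken from the table above, but the contents, and the assumption that each non-empty element is a list of dataframes, are illustrative only.

```r
# Hypothetical stand-in for the list returned by impute_data(); the element
# names come from the Value table, the contents are made up for illustration.
toy <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
res <- list(
  random_replacement    = list(),
  mean_imputation       = list(toy),
  median_imputation     = list(toy),
  missForest_imputation = list(toy, toy)
)

# Keep only the elements that actually hold imputed data
filled <- Filter(function(el) length(el) > 0, res)
names(filled)

# Number of imputed datasets stored per method (under the assumed structure)
lengths(filled)
```

Individual results can then be accessed by name, for example `filled$missForest_imputation`.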

Examples

## running 10 iterations of all algorithms (that allow for multiple imputation) and
## one copy of those that do not allow for multiple imputations
# impute_data(df, scale = TRUE, n.iter = 10,
#            sel_method = c(1:16))
## running 20 iterations of missForest (e.g. this was the best performing algorithm
## in simulations) on a non-scaled dataframe
# impute_data(df, scale = FALSE, n.iter = 20,
#            sel_method = c(14))
## running 1 iteration of four selected non-probabilistic algorithms on a scaled dataframe
# impute_data(df, scale = TRUE, n.iter = 1,
#            sel_method = c(2:3, 5, 7))

[Package missCompare version 1.0.3 Index]