performance_nan_imputation {Indicator}R Documentation

Function to evaluate the performance of a missing-value imputation method

Description

This function evaluates the performance of missing-value imputation methods on a data frame of quantitative variables. It is designed to examine and compare five imputation methods using standard performance measures (R^2, RMSE and MAE).

Usage

performance_nan_imputation(data, to_impute, regressors, method = 1)

Arguments

data

A dataframe containing the observations (rows) and quantitative variables (columns) to be analyzed. This dataframe includes variables with missing values to be imputed

to_impute

A string specifying the name of the variable in the dataframe that contains the missing values to be imputed

regressors

A vector of strings indicating the names of the variables to be used as regressors for imputation in the case of methods 1 (lm_imputation) and 4 (hot deck imputation)

method

An integer between 1 and 5 that specifies the imputation method to be used. The supported methods are:

1: lm_imputation (imputation by linear model)

2: median imputation (imputation by median)

3: mean imputation (imputation by mean)

4: hot deck imputation (imputation via hot deck)

5: EM imputation (imputation via Expectation-Maximization)
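
As a rough illustration of what methods 1 to 3 do conceptually, the following base-R sketch imputes Ozone in airquality by median, by mean, and by a linear model on Wind and Temp. It is not the package's implementation; the object names (target, missing, imputed_median, imputed_mean, imputed_lm) are chosen here for illustration only.

```r
# Base-R sketch of three of the listed strategies; NOT the package's own code.
data("airquality")
target <- "Ozone"
regressors <- c("Wind", "Temp")          # columns 3 and 4 of airquality
missing <- is.na(airquality[[target]])   # rows whose target value is missing

# Methods 2 and 3 (conceptually): replace every missing value with one constant.
imputed_median <- airquality[[target]]
imputed_median[missing] <- median(airquality[[target]], na.rm = TRUE)

imputed_mean <- airquality[[target]]
imputed_mean[missing] <- mean(airquality[[target]], na.rm = TRUE)

# Method 1 (conceptually): fit a linear model on the complete rows,
# then predict the target for the rows where it is missing.
fit <- lm(reformulate(regressors, response = target), data = airquality)
imputed_lm <- airquality[[target]]
imputed_lm[missing] <- predict(fit, newdata = airquality[missing, , drop = FALSE])

# Count remaining missing values per strategy; Wind and Temp are fully
# observed, so every missing Ozone value receives a prediction.
colSums(is.na(cbind(imputed_median, imputed_mean, imputed_lm)))
```

Constant imputation ignores the other variables, whereas the linear model uses the regressors, which is why the regressors argument matters only for model-based methods.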

Details

This function is useful for comparing the effectiveness of different methods of imputing missing values, allowing the most appropriate method to be chosen based on measured performance
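
One common way to compare imputation methods is to hide some values that are actually observed, impute them, and score the imputations against the hidden truth. The sketch below illustrates that idea in base R for median and mean imputation. Whether performance_nan_imputation uses this exact procedure is an assumption; the hold-out size of 20, the seed, and the helper rmse are hypothetical choices made for this example.

```r
# Hold-out comparison sketch; an assumed procedure, not the package's code.
data("airquality")
set.seed(1)                                 # arbitrary seed, for reproducibility only

observed <- which(!is.na(airquality$Ozone))
held_out <- sample(observed, size = 20)     # hypothetical hold-out size

masked <- airquality$Ozone
truth <- masked[held_out]                   # keep the true values for scoring
masked[held_out] <- NA                      # hide them from the imputation step

# Hypothetical helper: root mean squared error of predictions against the truth.
rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

# Each constant is estimated only from the values that remain visible.
scores <- data.frame(
  method = c("median", "mean"),
  RMSE = c(rmse(median(masked, na.rm = TRUE), truth),
           rmse(mean(masked, na.rm = TRUE), truth))
)
scores
```

The method with the lower error on the hidden values would be preferred under this criterion; repeating the masking with different seeds gives a more stable comparison.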

Value

The function returns a dataframe that contains a row for each imputation method and columns with performance measures. The performance measures included are:

R^2: Coefficient of Determination, which measures how well the imputed values fit the observed values

RMSE: Root Mean Squared Error, which is the square root of the mean squared deviation between imputed and observed values

MAE: Mean Absolute Error, which represents the mean absolute deviation between the imputed and observed values
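
The three measures can be computed by hand as in the following sketch, which uses two small made-up vectors. The R^2 shown is the usual "1 minus residual sum of squares over total sum of squares" form; the documentation does not state which definition the package uses, so treat that formula as an assumption.

```r
# Toy vectors invented for illustration; not taken from any dataset.
observed <- c(10, 20, 30, 40)
imputed  <- c(12, 18, 33, 37)

errors <- imputed - observed               # 2, -2, 3, -3

rmse <- sqrt(mean(errors^2))               # sqrt(26 / 4) = sqrt(6.5)
mae  <- mean(abs(errors))                  # 10 / 4 = 2.5

# Assumed definition: 1 - SS_res / SS_tot (SS_res = 26, SS_tot = 500).
r2 <- 1 - sum(errors^2) / sum((observed - mean(observed))^2)

c(R2 = r2, RMSE = rmse, MAE = mae)
```

RMSE penalizes large individual errors more heavily than MAE, so a method with a few large misses can rank worse on RMSE while looking acceptable on MAE.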

References

OECD/European Union/EC-JRC (2008), Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing, Paris, <https://doi.org/10.1787/9789264043466-en>

Examples


data("airquality")
# Use columns 3 and 4 of airquality (Wind and Temp) as regressors
regressors <- colnames(airquality[, c(3, 4)])

#---Method 1 = Imputation by linear model
performance_nan_imputation(data = airquality, "Ozone", regressors = regressors, method = 1)

#---Method 2 = Imputation by median
suppressWarnings(performance_nan_imputation(data = airquality, "Ozone", method = 2))

#---Method 3 = Imputation by mean
suppressWarnings(performance_nan_imputation(data = airquality, "Ozone", method = 3))

#---Method 4 = Hot deck imputation
performance_nan_imputation(data = airquality, "Ozone", regressors = regressors, method = 4)

#---Method 5 = Expectation-Maximization imputation
performance_nan_imputation(data = airquality, "Ozone", regressors = regressors, method = 5)


[Package Indicator version 0.1.2 Index]