get_all_performance {Indicator}    R Documentation
Function to evaluate different missing-value (NA) imputation methods
Description
The get_all_performance function evaluates different methods of imputing missing values in a dataset.
Usage
get_all_performance(data, to_impute, regressors)
Arguments
data
A data frame with rows = observations and columns = quantitative variables.
to_impute
A string: the name of the variable containing the NAs to impute.
regressors
A character vector with the names of the variables used as predictors by the 1st (linear regression) and 4th (hot deck) imputation methods.
Details
The function calculates performance metrics for each imputation method, including R^2 (defined in the Note).
Supported Imputation Methods:
1. Linear Regression Imputation (lm_imputation): it uses a linear regression model to predict and impute missing values
2. Median Imputation (median_imputation): it replaces missing values with the median of observed values
3. Mean Imputation (mean_imputation): it replaces missing values with the mean of observed values
4. Hot Deck Imputation (hot_deck_imputation): it replaces each missing value with an observed value taken from a similar record (a "donor"), with similarity judged on the regressors
5. Expectation-Maximization Imputation (EM_imputation): it uses the Expectation-Maximization algorithm to estimate and impute missing values
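Methods 1 to 4 can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the Indicator package's own code: the helper names impute_mean, impute_median, impute_lm and impute_hot_deck are hypothetical, and the hot-deck variant assumes a nearest-neighbour donor chosen by Euclidean distance on standardised regressors, which is only one of several hot-deck schemes. EM imputation is omitted because a faithful sketch needs considerably more code.

```r
# Hypothetical helpers illustrating methods 1-4; NOT the Indicator package's code.

# Mean imputation: every NA in `x` receives the mean of the observed values.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Median imputation: every NA in `x` receives the median of the observed values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Linear-regression imputation: fit `target ~ regressors` on the complete rows
# (lm() drops incomplete rows by default), then predict the missing targets.
impute_lm <- function(data, target, regressors) {
  fit <- lm(reformulate(regressors, response = target), data = data)
  miss <- is.na(data[[target]])
  data[[target]][miss] <- predict(fit, newdata = data[miss, , drop = FALSE])
  data[[target]]
}

# Nearest-neighbour hot deck (assumed scheme): each missing target copies the
# value of the observed "donor" row closest in standardised regressor space.
impute_hot_deck <- function(data, target, regressors) {
  X <- scale(as.matrix(data[, regressors, drop = FALSE]))
  y <- data[[target]]
  ok <- stats::complete.cases(X)
  donors <- which(!is.na(y) & ok)
  for (i in which(is.na(y) & ok)) {
    # Squared Euclidean distance from row i to every donor row
    d <- colSums((t(X[donors, , drop = FALSE]) - X[i, ])^2)
    y[i] <- y[donors[which.min(d)]]
  }
  y
}

# Usage on the same data as the Examples section
data("airquality")
oz_mean <- impute_mean(airquality$Ozone)
oz_lm   <- impute_lm(airquality, "Ozone", c("Wind", "Temp"))
oz_hd   <- impute_hot_deck(airquality, "Ozone", c("Wind", "Temp"))
```

Mean and median imputation assign one constant to every gap, whereas the regression and hot-deck variants let the imputed values vary with the regressors; this difference is what the Note below turns on.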
It evaluates different methods of imputing missing values and calculates performance metrics for each method.
Value
A data frame of performance measures, with one row per imputation method and one column per performance metric.
Note
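For the Median Imputation and Mean Imputation methods, R^2 cannot be calculated. Both methods replace every missing value with the same constant, so the standard deviation of the imputed data, $\sigma_P$, is zero and the following R^2 formula divides by zero:
$$R^2 = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{(P_i - \bar{P})(O_i - \bar{O})}{\sigma_P \, \sigma_O} \right]^2$$
where:
- $N$ is the number of imputations,
- $O_i$ is the $i$-th observed data point,
- $P_i$ is the $i$-th imputed data point,
- $\bar{O}$ is the average of the observed data,
- $\bar{P}$ is the average of the imputed data,
- $\sigma_P$ is the standard deviation of the imputed data,
- $\sigma_O$ is the standard deviation of the observed data.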
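The formula can be checked numerically with a short R sketch. The function name r2_imputation and all the values below are hypothetical, made up for illustration. The standard deviations use the 1/N convention implied by the formula (R's sd() divides by N - 1); with that convention this R^2 equals the squared Pearson correlation between observed and imputed values.

```r
# Hypothetical helper: R^2 between observed (O) and imputed (P) values,
# following the formula above with population (1/N) standard deviations.
r2_imputation <- function(observed, imputed) {
  N <- length(observed)
  pop_sd <- function(v) sqrt(sum((v - mean(v))^2) / N)
  cross <- sum((imputed - mean(imputed)) * (observed - mean(observed))) / N
  (cross / (pop_sd(imputed) * pop_sd(observed)))^2
}

observed     <- c(41, 36, 12, 18, 28)           # made-up "true" values
imputed_lm   <- c(38, 33, 15, 20, 30)           # made-up regression-style imputations
imputed_mean <- rep(mean(c(41, 36, 12, 31)), 5) # one constant (30), as mean imputation gives

r2_imputation(observed, imputed_lm)
r2_imputation(observed, imputed_mean)  # NaN: sigma_P is 0, so the ratio is 0/0
```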
References
OECD/European Union/EC-JRC (2008), Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing, Paris, <https://doi.org/10.1787/9789264043466-en>
Examples
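library(Indicator)
data("airquality")
# Use Wind (column 3) and Temp (column 4) as regressors for methods 1 and 4
regressors <- colnames(airquality[, c(3, 4)])
suppressWarnings(get_all_performance(data = airquality, to_impute = "Ozone", regressors = regressors))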