imputate_outlier {dlookr} | R Documentation |
Impute Outliers
Description
Outliers are imputed with some representative values and statistical methods.
Usage
imputate_outlier(.data, xvar, method, no_attrs, cap_ntiles)
Arguments
.data |
a data.frame or a |
xvar |
variable name whose outliers are to be replaced.
method |
method of outlier imputation. One of "mean", "median", "mode", or "capping".
no_attrs |
logical. If TRUE, return only the imputed numerical variable. If FALSE, return an object of imputation class.
cap_ntiles |
numeric. Only used when method is "capping". Specifies the lower and upper percentiles whose values replace the lower outliers and the upper outliers, respectively. The default is c(0.05, 0.95).
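To make the role of cap_ntiles concrete, the following base R sketch caps outliers at two chosen percentiles. It is an illustration under stated assumptions, not dlookr's implementation: the helper name cap_outliers_sketch() is hypothetical, and it assumes outliers are the points flagged by the boxplot rule of boxplot.stats().

```r
# Hypothetical helper, not part of dlookr: cap outliers at chosen percentiles.
cap_outliers_sketch <- function(x, cap_ntiles = c(0.05, 0.95)) {
  # Assumption: outliers are the values flagged by the boxplot rule
  # (more than 1.5 times the hinge spread beyond the hinges).
  out <- boxplot.stats(x)$out

  # Percentile values used as replacements for lower and upper outliers.
  caps <- quantile(x, probs = cap_ntiles, na.rm = TRUE, names = FALSE)

  # Split flagged values into lower and upper outliers around the median.
  center <- median(x, na.rm = TRUE)
  lower <- x %in% out & x < center
  upper <- x %in% out & x > center

  x[lower] <- caps[1]
  x[upper] <- caps[2]
  x
}

x <- c(1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 120)
capped <- cap_outliers_sketch(x, cap_ntiles = c(0.1, 0.9))
capped
```

In this sketch only the two flagged values (1 and 120) change; every other observation is left as it was.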
Details
imputate_outlier() creates an object of 'imputation' class. The 'imputation' class records the positions of the outliers, the original outlier values, the method of outlier imputation, and so on. It lets you compare the imputed values with the original values to help decide whether the imputed values should be used in the analysis.
See vignette("transformation") for an introduction to these concepts.
Value
An object of imputation class, or a numerical vector. If no_attrs is FALSE, an object of imputation class is returned; if no_attrs is TRUE, a numerical vector is returned. The attributes of the imputation class are as follows.
method : method of outlier imputation.
When the predictor is a numerical variable:
"mean" : arithmetic mean
"median" : median
"mode" : mode
"capping" : impute the upper outliers with the 95th percentile and the lower outliers with the 5th percentile.
You can change these percentiles with the cap_ntiles argument.
outlier_pos : position of outliers in predictor.
outliers : the original outlier values found at outlier_pos.
type : type of imputation, always "outliers".
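The attributes listed above are ordinary R attributes, so they can be read with attr(). The sketch below mimics that layout in base R on a small vector. It is not dlookr's implementation: impute_outliers_sketch() is a hypothetical helper, it assumes outliers are identified with boxplot.stats(), and it does not set the imputation class itself.

```r
# Hypothetical helper, not part of dlookr: replace outliers with a central
# value and record what was changed in attributes.
impute_outliers_sketch <- function(x, method = c("median", "mean")) {
  method <- match.arg(method)

  # Assumption: outliers are the values flagged by the boxplot rule.
  out_values <- boxplot.stats(x)$out
  pos <- which(x %in% out_values)

  # Central value computed from the full variable, outliers included.
  replacement <- switch(method,
    median = median(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE)
  )

  imputed <- x
  imputed[pos] <- replacement

  # Attach metadata under the attribute names described above.
  attr(imputed, "method") <- method
  attr(imputed, "outlier_pos") <- pos
  attr(imputed, "outliers") <- x[pos]
  attr(imputed, "type") <- "outliers"
  imputed
}

x <- c(1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 120)
imp <- impute_outliers_sketch(x, method = "median")

attr(imp, "outlier_pos")  # positions of the replaced values
attr(imp, "outliers")     # original values at those positions
```

Keeping the original outlier values next to their positions is what makes a before-and-after comparison possible without holding on to a second copy of the data.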
See Also
Examples
# Replace the outliers of the sodium variable with median.
imputate_outlier(heartfailure, sodium, method = "median")
# Replace the outliers of the sodium variable with capping.
imputate_outlier(heartfailure, sodium, method = "capping")
imputate_outlier(heartfailure, sodium, method = "capping",
                 cap_ntiles = c(0.1, 0.9))
## using dplyr -------------------------------------
library(dplyr)
# The mean before and after the imputation of the sodium variable
heartfailure %>%
  mutate(sodium_imp = imputate_outlier(heartfailure, sodium,
                                       method = "capping", no_attrs = TRUE)) %>%
  group_by(death_event) %>%
  summarise(orig = mean(sodium, na.rm = TRUE),
            imputation = mean(sodium_imp, na.rm = TRUE))
# If the variable of interest is a numerical variable
sodium <- imputate_outlier(heartfailure, sodium)
sodium
summary(sodium)
plot(sodium)