imp.outliers {PDtoolkit} | R Documentation
Imputation methods for outliers
Description
imp.outliers replaces a predefined share of the smallest and largest values with less
extreme values. This procedure applies only to numeric risk factors.
Usage
imp.outliers(
db,
sc = c(NA, NaN, Inf, -Inf),
method = "iqr",
range = 1.5,
upper.pct = 0.95,
lower.pct = 0.05
)
Arguments
db | Data frame of risk factors supplied for imputation.
sc | Vector of all special case elements. Default values are c(NA, NaN, Inf, -Inf).
method | Imputation method. Available options are: "iqr" and "percentile". Default value is "iqr".
range | Determines how far the plot whiskers extend out from the box. If range is positive,
the whiskers extend to the most extreme data point which is no more than range times the
interquartile range from the box. A value of zero causes the whiskers to extend to
the data extremes. Default value is 1.5.
upper.pct | Upper limit for the percentile method. All values above this limit will be replaced by the value
identified at this percentile. Default value is set to 0.95.
lower.pct | Lower limit for the percentile method. All values below this limit will be replaced by the value
identified at this percentile. Default value is set to 0.05.
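The two methods described by the arguments above can be illustrated with a minimal base R sketch. This is only an approximation under stated assumptions, not the package's implementation: cap_iqr and cap_pct are hypothetical helper names, the IQR variant assumes that outliers are replaced by the whisker ends reported by boxplot.stats, and special case values other than NA are not handled.

```r
# Hypothetical helpers for illustration only; they are not part of PDtoolkit.

# IQR-style capping (assumption: values beyond the whiskers are replaced
# by the whisker ends, as computed by boxplot.stats with coef = range).
cap_iqr <- function(x, range = 1.5) {
  # stats[1] and stats[5] are the lower and upper whisker ends
  w <- boxplot.stats(x, coef = range)$stats[c(1, 5)]
  pmin(pmax(x, w[1]), w[2])  # NA values stay NA
}

# Percentile-style capping: values below the lower.pct quantile or above
# the upper.pct quantile are replaced by those quantiles.
cap_pct <- function(x, lower.pct = 0.05, upper.pct = 0.95) {
  q <- quantile(x, probs = c(lower.pct, upper.pct),
                na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])  # NA values stay NA
}

# Toy vector with one obvious upper outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
capped_iqr <- cap_iqr(x)
capped_pct <- cap_pct(x)
```

In this toy vector only the value 100 lies beyond the upper whisker, so the IQR sketch pulls it back to the largest non-outlying value, while the percentile sketch trims both tails to the chosen quantiles.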
Value
This function returns a list of two data frames. The first data frame contains the analyzed risk factors with
imputed values for outliers, while the second data frame presents the imputation report. Using the imputation report,
the user can inspect, for each risk factor, the imputation info (info), the imputation method (imputation.method),
the imputed values (imputation.val.upper and imputation.val.lower), and the
number of imputed observations (imputation.num.upper and imputation.num.lower).
Examples
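suppressMessages(library(PDtoolkit))
data(gcd)
#set the first 20 values of age to NA (a special case value)
gcd$age[1:20] <- NA
#add a binned version of age (second element of the ndr.bin output)
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual,
                       sc.method = "separately", y.type = "bina")[[2]]
#add a risk factor that contains only NA
gcd$dummy1 <- NA
#iqr method
imput.res.1 <- imp.outliers(db = gcd[, -1],
                            method = "iqr",
                            range = 1.5)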
#analyzed risk factors with imputed values
head(imput.res.1[[1]])
#imputation report
imput.res.1[[2]]
#percentile method
imput.res.2 <- imp.outliers(db = gcd[, -1],
                            method = "percentile",
                            upper.pct = 0.95,
                            lower.pct = 0.05)
#analyzed risk factors with imputed values
head(imput.res.2[[1]])
#imputation report
imput.res.2[[2]]