imp.outliers {PDtoolkit}    R Documentation

Imputation methods for outliers

Description

imp.outliers replaces a predefined share of the smallest and largest values with less extreme values. This procedure applies only to numeric risk factors.

Usage

imp.outliers(
  db,
  sc = c(NA, NaN, Inf, -Inf),
  method = "iqr",
  range = 1.5,
  upper.pct = 0.95,
  lower.pct = 0.05
)

Arguments

db

Data frame of risk factors supplied for imputation.

sc

Vector of all special case elements. Default values are c(NA, NaN, Inf, -Inf). These values are excluded from the calculation of the imputed values and from replacement.

method

Imputation method. Available options are "iqr" and "percentile". The "iqr" method identifies outliers using the rule applied in a boxplot (five-number summary), while for the "percentile" method the user defines the lower and upper limits for replacement. Default value is "iqr".

range

Multiplier of the interquartile range used by the "iqr" method; it corresponds to how far the whiskers of a boxplot extend from the box. If range is positive, the whiskers extend to the most extreme data point that is no more than range times the interquartile range from the box. A value of zero causes the whiskers to extend to the data extremes. Default value is 1.5.

upper.pct

Upper limit for the "percentile" method. All values above this limit are replaced by the value identified at this percentile. Default value is the 95th percentile (0.95). This parameter is used only if the selected method is "percentile".

lower.pct

Lower limit for the "percentile" method. All values below this limit are replaced by the value identified at this percentile. Default value is the 5th percentile (0.05). This parameter is used only if the selected method is "percentile".
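The two replacement rules described by the arguments above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper cap() is hypothetical, the "iqr" limits are assumed to be the boxplot whisker ends returned by boxplot.stats(), and the "percentile" limits are assumed to come from quantile() with its default type, which may differ from what imp.outliers uses internally.

```r
# Toy data: one clear outlier on each side plus two special-case values.
x  <- c(-50, 1:20, 100, NA, Inf)
sc <- c(NA, NaN, Inf, -Inf)

# Hypothetical helper (not part of PDtoolkit): cap values outside
# [lower, upper] while leaving special-case values untouched.
cap <- function(x, lower, upper, sc) {
  regular <- !x %in% sc
  x[regular] <- pmin(pmax(x[regular], lower), upper)
  x
}

# Special cases are excluded before any limit is computed.
regular <- x[!x %in% sc]

# "iqr" (assumption): limits are the boxplot whisker ends for coef = range.
whiskers <- boxplot.stats(regular, coef = 1.5)$stats[c(1, 5)]
x_iqr <- cap(x, whiskers[1], whiskers[2], sc)

# "percentile" (assumption): limits are the lower.pct and upper.pct quantiles.
limits <- quantile(regular, probs = c(0.05, 0.95), names = FALSE)
x_pct <- cap(x, limits[1], limits[2], sc)
```

In this sketch the special-case values NA and Inf pass through unchanged, and only the regular values outside the limits are moved to the nearest limit.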

Value

This function returns a list of two data frames. The first data frame contains the analyzed risk factors with imputed values for outliers, while the second data frame presents the imputation report. Using the imputation report, the user can inspect, for each risk factor, the imputation info (info), the imputation method (imputation.method), the imputed values (imputation.val.upper and imputation.val.lower), and the number of imputed observations (imputation.num.upper and imputation.num.lower).
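To make the report fields more concrete, the sketch below derives comparable quantities for a single toy vector with the percentile rule. It is a hypothetical reconstruction in base R: the column names are taken from the description above, but the layout and exact contents of the real report may differ.

```r
# Toy data with special-case values that are excluded from all calculations.
x  <- c(-50, 1:20, 100, NA, Inf)
sc <- c(NA, NaN, Inf, -Inf)
regular <- !x %in% sc

# Assumed percentile limits (default quantile type).
limits <- quantile(x[regular], probs = c(0.05, 0.95), names = FALSE)

# Hypothetical one-row summary mirroring the report fields named above.
report <- data.frame(
  imputation.method    = "percentile",
  imputation.val.upper = limits[2],
  imputation.val.lower = limits[1],
  imputation.num.upper = sum(x[regular] > limits[2]),
  imputation.num.lower = sum(x[regular] < limits[1])
)
report
```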

Examples

suppressMessages(library(PDtoolkit))
data(gcd)
gcd$age[1:20] <- NA
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, sc.method = "separately", y.type = "bina")[[2]]
gcd$dummy1 <- NA
imput.res.1 <- imp.outliers(db = gcd[, -1],
                            method = "iqr",
                            range = 1.5)
#analyzed risk factors with imputed values
head(imput.res.1[[1]])
#imputation report
imput.res.1[[2]]
#percentile method
imput.res.2 <- imp.outliers(db = gcd[, -1],
                            method = "percentile",
                            upper.pct = 0.95,
                            lower.pct = 0.05)
#analyzed risk factors with imputed values
head(imput.res.2[[1]])
#imputation report
imput.res.2[[2]]

[Package PDtoolkit version 1.2.0 Index]