KRDetect.outliers.controlchart {envoutliers} | R Documentation |
Identification of outliers using control charts
Description
Identification of outliers in environmental data using a two-step method based on kernel smoothing and control charts (Campulova et al., 2017). The outliers are identified as observations corresponding to segments of smoothing residuals that exceed control chart limits.
Usage
KRDetect.outliers.controlchart(x, perform.smoothing = TRUE,
bandwidth.type = "local", bandwidth.value = NULL, kernel.order = 2,
method = "range", group.size.x = 3, group.size.R = 3,
group.size.s = 3, L.x = 3, L.R = 3, L.s = 3)
Arguments
x |
data values. Supported data types
|
perform.smoothing |
a logical value specifying if data smoothing is performed. If |
bandwidth.type |
a character string specifying the type of bandwidth. Possible options are
|
bandwidth.value |
a local bandwidth array (for |
kernel.order |
a nonnegative integer giving the order of the optimal kernel (Gasser et al., 1985) used for smoothing. Possible options are
|
method |
a character string specifying the preferred estimate of standard deviation parameter. Possible options are
|
group.size.x |
a positive integer giving the number of observations in individual segments used for computation of x chart control limits.
If the data cannot be divided evenly into segments of this size, the first extra values are excluded from the analysis. Default is 3. |
group.size.R |
a positive integer giving the number of observations in individual segments used for computation of R chart control limits.
If the data cannot be divided evenly into segments of this size, the first extra values are excluded from the analysis. Default is 3. |
group.size.s |
a positive integer giving the number of observations in individual segments used for computation of s chart control limits.
If the data cannot be divided evenly into segments of this size, the first extra values are excluded from the analysis. Default is 3. |
L.x |
a positive numeric value giving parameter |
L.R |
a positive numeric value giving parameter |
L.s |
a positive numeric value giving parameter |
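The segment handling described for group.size.x, group.size.R and group.size.s can be sketched in base R. This is only an illustration under stated assumptions: the helper split_into_segments is hypothetical and not part of envoutliers, and the package's internal implementation may differ.

```r
# Hypothetical helper (not part of envoutliers): split a numeric vector into
# consecutive segments of equal size, dropping the first extra values when
# length(x) is not a multiple of the segment size.
split_into_segments <- function(x, size) {
  n_extra <- length(x) %% size
  if (n_extra > 0) {
    x <- x[-seq_len(n_extra)]  # exclude the first extra values
  }
  # One row per segment, observations kept in their original order
  matrix(x, ncol = size, byrow = TRUE)
}

x <- c(10, 11, 12, 13, 14, 15, 16, 17)  # 8 values, segment size 3
segments <- split_into_segments(x, 3)
print(segments)
```

With eight values and a segment size of three, the first two values are dropped and two segments of three observations remain.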
Details
This function identifies outliers in environmental data using a two-step procedure (Campulova et al., 2017).
The procedure consists of kernel smoothing followed by identification of observations corresponding to segments of smoothing residuals that exceed control chart limits.
The method therefore does not identify individual outliers but segments of observations in which outliers occur.
The output consists of three logical vectors specifying the outliers identified by each of the three control charts.
In addition, a logical vector specifying the outliers identified by at least one type of control limits is returned.
The choice of the parameters L.x, L.R and L.s, which specify the width of the control limits, is crucial for the method.
Different values of these parameters correspond to different criteria for outlier detection. For more information, see Campulova et al. (2017).
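The role of the limit width can be illustrated with a textbook Shewhart x chart applied to a vector of residuals. The sketch below is an assumption-laden illustration, not the package's implementation: the function flag_segments_xchart is hypothetical, the residuals are made up, the standard deviation is estimated from the mean segment range divided by the standard constant d2 = 1.693 for segments of three observations, and the limits are taken as the center line plus or minus L times the estimated standard deviation of a segment mean.

```r
# Hypothetical sketch (base R only, not the envoutliers implementation):
# flag whole segments of residuals whose mean falls outside x chart limits.
flag_segments_xchart <- function(res, size = 3, L = 3) {
  stopifnot(size == 3)  # d2 below is the standard constant for subgroups of 3
  d2 <- 1.693
  n_extra <- length(res) %% size
  if (n_extra > 0) res <- res[-seq_len(n_extra)]  # drop the first extra values
  seg <- matrix(res, ncol = size, byrow = TRUE)   # one row per segment
  seg_mean <- rowMeans(seg)
  seg_range <- apply(seg, 1, function(v) max(v) - min(v))
  sigma_hat <- mean(seg_range) / d2               # range-based sigma estimate
  center <- mean(seg_mean)
  half_width <- L * sigma_hat / sqrt(size)
  out_seg <- seg_mean < center - half_width | seg_mean > center + half_width
  list(
    LCL = center - half_width,
    UCL = center + half_width,
    # every observation in a flagged segment is marked; dropped values are FALSE
    outlier = c(rep(FALSE, n_extra), rep(out_seg, each = size))
  )
}

# Made-up residuals: ten segments of three values, one segment shifted upwards
res <- rep(c(-0.1, 0, 0.1), 10)
res[13:15] <- c(0.9, 1.0, 1.1)

narrow <- flag_segments_xchart(res, L = 3)   # narrower limits
wide   <- flag_segments_xchart(res, L = 15)  # much wider limits
which(narrow$outlier)
sum(wide$outlier)
```

With L = 3 the shifted segment (observations 13 to 15) is flagged as a whole, while with L = 15 the limits are wide enough that nothing is flagged. This mirrors the point that the method marks segments rather than individual observations and that the L parameters set the detection criterion.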
Value
A "KRDetect" object which contains a list with elements:
method.type |
a character string giving the type of method used for outlier identification |
x |
a numeric vector of observations |
index |
a numeric vector of index design points assigned to individual observations |
smoothed |
a numeric vector of estimates of the kernel regression function (smoothed data) |
outlier.x |
a logical vector specifying the outliers identified based on the limits of control chart x |
outlier.R |
a logical vector specifying the outliers identified based on the limits of control chart R |
outlier.s |
a logical vector specifying the outliers identified based on the limits of control chart s |
outlier |
a logical vector specifying the outliers identified based on at least one type of control limits |
LCL.x |
a numeric value giving lower control limit of control chart x |
UCL.x |
a numeric value giving upper control limit of control chart x |
LCL.s |
a numeric value giving lower control limit of control chart s |
UCL.s |
a numeric value giving upper control limit of control chart s |
LCL.R |
a numeric value giving lower control limit of control chart R |
UCL.R |
a numeric value giving upper control limit of control chart R |
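The relationship between the four logical vectors can be illustrated on a plain list that mimics the element names above. The list below is a hypothetical stand-in with made-up values, not a real "KRDetect" object; it only shows that an observation flagged by at least one chart corresponds to an element-wise OR of the three per-chart vectors.

```r
# Hypothetical stand-in for the returned list; the values are made up
res <- list(
  outlier.x = c(FALSE, TRUE,  FALSE, FALSE),
  outlier.R = c(FALSE, FALSE, TRUE,  FALSE),
  outlier.s = c(FALSE, FALSE, TRUE,  FALSE)
)

# Flagged by at least one chart: element-wise OR of the three vectors
combined <- res$outlier.x | res$outlier.R | res$outlier.s
which(combined)
```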
References
Campulova M, Veselik P, Michalek J (2017). Control chart and Six sigma based algorithms for identification of outliers in experimental data, with an application to particulate matter PM10. Atmospheric Pollution Research. doi:10.1016/j.apr.2017.01.004.
Shewhart W (1931). Quality control chart. Bell System Technical Journal, 5, 593–603.
SAS/QC User's Guide, Version 8, 1999. SAS Institute, Cary, N.C.
Wild C, Seber G (2000). Chance encounters: A first course in data analysis and inference. New York: John Wiley.
Joglekar AM. Statistical methods for six sigma: in R&D and manufacturing. Hoboken, NJ: Wiley-Interscience. ISBN 0-471-20342-4.
Gasser T, Kneip A, Kohler W (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643–652.
Herrmann E (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6(1), 35–54.
Eva Herrmann; Packaged for R and enhanced by Martin Maechler (2016). lokern: Kernel Regression Smoothing with Local or Global Plug-in Bandwidth. R package version 1.1-8. https://CRAN.R-project.org/package=lokern
Examples
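# Load the package providing KRDetect.outliers.controlchart()
library(envoutliers)
# Ozone concentrations from the openair example data, December 2002
data("mydata", package = "openair")
x = mydata$o3[format(mydata$date, "%m %Y") == "12 2002"]
# Identify outliers with the default settings
result = KRDetect.outliers.controlchart(x)
summary(result)
# Plot the overall result, then each control chart separately
plot(result)
plot(result, plot.type = "x")
plot(result, plot.type = "R")
plot(result, plot.type = "s")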