KRDetect.outliers.controlchart {envoutliers}  R Documentation 
Identification of outliers in environmental data using a two-step method based on kernel smoothing and control charts (Campulova et al., 2017). The outliers are identified as observations corresponding to segments of smoothing residuals that exceed the control chart limits.
KRDetect.outliers.controlchart(x, perform.smoothing = TRUE, bandwidth.type = "local", bandwidth.value = NULL, kernel.order = 2, method = "range", group.size.x = 3, group.size.R = 3, group.size.s = 3, L.x = 3, L.R = 3, L.s = 3)
x 
data values. Supported data types

perform.smoothing 
a logical value specifying if data smoothing is performed. If 
bandwidth.type 
a character string specifying the type of bandwidth. Possible options are

bandwidth.value 
a local bandwidth array (for 
kernel.order 
a nonnegative integer giving the order of the optimal kernel (Gasser et al., 1985) used for smoothing. Possible options are

method 
a character string specifying the preferred estimate of standard deviation parameter. Possible options are

group.size.x 
a positive integer giving the number of observations in individual segments used for computation of x chart control limits.
If the data cannot be divided into segments of equal size, the first extra values are excluded from the analysis. Default is 3.
group.size.R 
a positive integer giving the number of observations in individual segments used for computation of R chart control limits.
If the data cannot be divided into segments of equal size, the first extra values are excluded from the analysis. Default is 3.
group.size.s 
a positive integer giving the number of observations in individual segments used for computation of s chart control limits.
If the data cannot be divided into segments of equal size, the first extra values are excluded from the analysis. Default is 3.
L.x 
a positive numeric value giving parameter 
L.R 
a positive numeric value giving parameter 
L.s 
a positive numeric value giving parameter 
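The segmenting rule described for group.size.x, group.size.R and group.size.s can be illustrated with a minimal base R sketch. The helper split_into_segments is hypothetical and is not part of envoutliers; it only assumes that segments are consecutive, of equal size, and that the first extra values are dropped, as stated above.

```r
# Hypothetical helper (not part of envoutliers): split a vector into
# consecutive segments of equal size, dropping the first extra values.
split_into_segments <- function(x, group.size = 3) {
  n.extra <- length(x) %% group.size            # values that do not fit
  kept <- if (n.extra > 0) x[-seq_len(n.extra)] else x
  # one row per segment, filled in observation order
  matrix(kept, ncol = group.size, byrow = TRUE)
}

x <- c(5, 7, 6, 8, 9, 7, 6, 5)   # 8 toy values, group size 3
segments <- split_into_segments(x)
print(segments)
```

With 8 values and a group size of 3, the first two values are left out and the remaining six form two segments.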
This function identifies outliers in environmental data using a two-step procedure (Campulova et al., 2017).
The procedure consists of kernel smoothing followed by identification of observations corresponding to segments of smoothing residuals that exceed the control chart limits.
Consequently, the method does not identify individual outliers but segments of observations in which outliers occur.
The output of the method consists of three logical vectors specifying the outliers identified by each of the three control charts.
In addition, a logical vector specifying the outliers identified by at least one type of control limits is returned.
Crucial for the method is the choice of the parameters L.x, L.R and L.s, which specify the width of the control limits.
Different values of these parameters determine different criteria for outlier detection. For more information see Campulova et al. (2017).
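To show how a width parameter such as L.x affects which segments are flagged, here is a rough base R sketch of a textbook Shewhart chart for segment means. It is not the envoutliers implementation. The function flag_segments_xchart, the toy residuals, and the choice to estimate the standard deviation as the mean segment range divided by the tabulated constant d2 are all assumptions made for illustration.

```r
# Hypothetical sketch (not the envoutliers implementation): flag segments of
# residuals whose mean falls outside center +/- L * sigma / sqrt(group.size).
flag_segments_xchart <- function(res, group.size = 3, L = 3) {
  # standard d2 constants for estimating sigma from the mean range
  d2.table <- c("2" = 1.128, "3" = 1.693, "4" = 2.059, "5" = 2.326)
  stopifnot(as.character(group.size) %in% names(d2.table))
  d2 <- unname(d2.table[as.character(group.size)])

  # drop the first extra values so that all segments have equal size
  n.extra <- length(res) %% group.size
  idx <- if (n.extra > 0) seq_along(res)[-seq_len(n.extra)] else seq_along(res)
  seg <- matrix(res[idx], ncol = group.size, byrow = TRUE)

  seg.mean <- rowMeans(seg)
  seg.range <- apply(seg, 1, function(r) diff(range(r)))
  sigma <- mean(seg.range) / d2          # assumed range-based estimate
  center <- mean(seg.mean)
  half.width <- L * sigma / sqrt(group.size)
  LCL <- center - half.width
  UCL <- center + half.width

  # every observation in a flagged segment is marked as an outlier
  out.seg <- seg.mean < LCL | seg.mean > UCL
  outlier <- rep(FALSE, length(res))
  outlier[idx] <- rep(out.seg, each = group.size)
  list(LCL = LCL, UCL = UCL, outlier = outlier)
}

# Toy residuals: six segments of three values, the fourth one shifted upwards
res <- c(0.1, -0.1, 0.0,  0.2, -0.2, 0.0,  -0.1, 0.1, 0.0,
         1.0,  1.1, 0.9,  0.1,  0.0, -0.1,  0.0, 0.2, -0.2)
chart <- flag_segments_xchart(res, group.size = 3, L = 3)
which(chart$outlier)
```

A larger L widens the limits and flags fewer segments, while a smaller L narrows them. Because whole segments are flagged, all observations in the shifted segment are marked together.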
A "KRDetect" object, which contains a list with elements:
method.type 
a character string giving the type of method used for outlier identification 
x 
a numeric vector of observations 
index 
a numeric vector of index design points assigned to individual observations 
smoothed 
a numeric vector of estimates of the kernel regression function (smoothed data) 
outlier.x 
a logical vector specifying the outliers identified based on the limits of control chart x 
outlier.R 
a logical vector specifying the outliers identified based on the limits of control chart R 
outlier.s 
a logical vector specifying the outliers identified based on the limits of control chart s 
outlier 
a logical vector specifying the outliers identified based on at least one type of control limits 
LCL.x 
a numeric value giving lower control limit of control chart x 
UCL.x 
a numeric value giving upper control limit of control chart x 
LCL.s 
a numeric value giving lower control limit of control chart s 
UCL.s 
a numeric value giving upper control limit of control chart s 
LCL.R 
a numeric value giving lower control limit of control chart R 
UCL.R 
a numeric value giving upper control limit of control chart R 
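The relationship between the logical vectors listed above can be sketched with a toy list that only mimics the documented element names. The values are invented for illustration and are not output of KRDetect.outliers.controlchart. The element outlier is assumed to be the element-wise OR of the three chart-specific vectors, following the "at least one type of control limits" description.

```r
# Toy stand-in for the documented return value; all values are invented.
result <- list(
  x         = c(12, 14, 13, 40, 41, 39),
  outlier.x = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  outlier.R = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  outlier.s = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# flagged by at least one of the three control charts
result$outlier <- result$outlier.x | result$outlier.R | result$outlier.s

flagged <- which(result$outlier)              # positions of flagged observations
x.clean <- replace(result$x, flagged, NA)     # mask flagged values, keep length
```

Masking with NA instead of deleting keeps the series aligned with its original index, which is usually convenient for time-ordered environmental data.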
Campulova M, Veselik P, Michalek J (2017). Control chart and Six sigma based algorithms for identification of outliers in experimental data, with an application to particulate matter PM10. Atmospheric Pollution Research. doi:10.1016/j.apr.2017.01.004.
Shewhart W (1931). Quality control chart. Bell System Technical Journal, 5, 593–603.
SAS/QC User's Guide, Version 8, 1999. SAS Institute, Cary, N.C.
Wild C, Seber G (2000). Chance encounters: A first course in data analysis and inference. New York: John Wiley.
Joglekar, Anand M. Statistical methods for six sigma: in R&D and manufacturing. Hoboken, NJ: Wiley-Interscience. ISBN 0471203424.
Gasser T, Kneip A, Kohler W (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643–652.
Herrmann E (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6(1), 35–54.
Eva Herrmann; Packaged for R and enhanced by Martin Maechler (2016). lokern: Kernel Regression Smoothing with Local or Global Plug-in Bandwidth. R package version 1.18. https://CRAN.R-project.org/package=lokern
data("mydata", package = "openair")
x = mydata$o3[format(mydata$date, "%m %Y") == "12 2002"]
result = KRDetect.outliers.controlchart(x)
summary(result)
plot(result)
plot(result, plot.type = "x")
plot(result, plot.type = "R")
plot(result, plot.type = "s")