KRDetect.outliers.changepoint {envoutliers}    R Documentation
Identification of outliers using changepoint analysis
Description
Identification of outliers in environmental data using a method based on kernel smoothing, changepoint analysis of the smoothing residuals, and subsequent analysis of the residuals on homogeneous segments (Campulova et al., 2018).
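A minimal sketch of the smoothing step, assuming a Gasser–Müller kernel estimator with the Epanechnikov kernel (the optimal kernel of order 2) and a fixed bandwidth `h` chosen by the user. It is not the function's implementation: there is no data-driven bandwidth selection and no proper boundary kernel. The helper names `epan_cdf` and `gasser_mueller` are hypothetical.

```r
# Integral of the Epanechnikov kernel K(u) = 0.75 * (1 - u^2) from -1 to v,
# with v clamped to the kernel support [-1, 1]
epan_cdf <- function(v) {
  v <- pmin(pmax(v, -1), 1)
  0.5 + 0.75 * v - 0.25 * v^3
}

# Gasser-Mueller estimate of the regression function at the points eval_at,
# using a fixed (global) bandwidth h
gasser_mueller <- function(t, y, h, eval_at = t) {
  n <- length(t)
  # Interval boundaries s_0, ..., s_n: midpoints between design points,
  # with the outer boundaries placed at the first and last design point
  s <- c(t[1], (t[-1] + t[-n]) / 2, t[n])
  vapply(eval_at, function(x0) {
    # w_i = integral over [s_{i-1}, s_i] of K((x0 - u) / h) / h du
    w <- epan_cdf((x0 - s[-(n + 1)]) / h) - epan_cdf((x0 - s[-1]) / h)
    # Renormalise so the weights sum to one; a crude adjustment for the
    # kernel mass cut off near the ends of the data
    sum(w * y) / sum(w)
  }, numeric(1))
}
```

The smoothing residuals, `y - gasser_mueller(t, y, h)`, are what the subsequent changepoint analysis splits into homogeneous segments.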
Usage
KRDetect.outliers.changepoint(x, perform.smoothing = TRUE,
perform.cp.analysis = TRUE, bandwidth.type = "local",
bandwidth.value = NULL, kernel.order = 2,
cp.analysis.type = "parametric", pen.value = "5*log(n)",
alpha.edivisive = 0.3, min.segment.length = 30,
segment.length.for.merge = 15, method = "auto",
prefer.grubbs = TRUE, alpha.default = NULL, L.default = NULL)
Arguments
x
data values. Supported data types
perform.smoothing
a logical value specifying if data smoothing is performed. If
perform.cp.analysis
a logical value specifying if changepoint analysis is performed. If
bandwidth.type
a character string specifying the type of bandwidth. Possible options are
bandwidth.value
a local bandwidth array (for
kernel.order
a nonnegative integer giving the order of the optimal kernel (Gasser et al., 1985) used for smoothing. Possible options are
cp.analysis.type
a character string specifying the type of changepoint analysis. Possible options are
pen.value
a character string giving the formula for the manual penalty used in the PELT algorithm.
Only required for
alpha.edivisive
a numeric value giving the moment index used for determining the distance between and within segments in the nonparametric changepoint model. Default is 0.3.
min.segment.length
a numeric value giving the minimum number of observations required on each segment resulting from the changepoint analysis.
If a segment contains fewer than
segment.length.for.merge
a numeric value giving the minimum number of observations a segment must contain for the homogeneity test within changepoint split control to be performed.
A segment with fewer observations than
method
a character string specifying the method for identification of outlier residuals. Possible options are
prefer.grubbs
a logical value specifying whether the Grubbs test is preferred to quantiles of the normal distribution for identification of outlier residuals.
alpha.default
a numeric value from the interval (0, 1) giving the alpha parameter, which determines the criterion for (residual) outlier detection:
the limits for outlier residuals on individual segments are set as
L.default
a numeric value giving the L parameter, which determines the criterion for (residual) outlier detection:
the limits for outlier residuals on individual segments are set as
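The role of the `pen.value` formula can be illustrated with a compact base-R version of the PELT algorithm (Killick et al., 2012). This is a sketch under stated assumptions, not the changepoint package's code: it detects changes in mean only, using a squared-error cost; it evaluates the penalty string as an R expression in the sample size `n` (so `"5*log(n)"` becomes 5 log n); and, unlike `min.segment.length`, it imposes no minimum segment length. The function name `pelt_mean` is hypothetical.

```r
# PELT for changes in mean with a squared-error cost and a penalty formula in n.
# Returns the last index of every segment except the final one.
pelt_mean <- function(y, penalty = "5*log(n)") {
  n <- length(y)
  beta <- eval(parse(text = penalty), list(n = n))  # penalty per changepoint
  cs1 <- c(0, cumsum(y))
  cs2 <- c(0, cumsum(y^2))
  # Squared-error cost of the segment y[(s + 1):t]; vectorised over s
  seg_cost <- function(s, t) {
    (cs2[t + 1] - cs2[s + 1]) - (cs1[t + 1] - cs1[s + 1])^2 / (t - s)
  }
  Fv <- c(-beta, rep(NA_real_, n))  # Fv[t + 1] = optimal penalised cost of y[1:t]
  last <- integer(n)                # last[t] = last changepoint before t
  cands <- 0L                       # candidate last changepoints
  for (t in seq_len(n)) {
    vals <- Fv[cands + 1] + seg_cost(cands, t)
    best <- which.min(vals)
    Fv[t + 1] <- vals[best] + beta
    last[t] <- cands[best]
    # Pruning (K = 0 for this cost): keep only candidates that can still be optimal
    cands <- c(cands[vals <= Fv[t + 1]], t)
  }
  # Backtrack from the end of the series to recover the changepoints
  cps <- integer(0)
  t <- last[n]
  while (t > 0) {
    cps <- c(t, cps)
    t <- last[t]
  }
  cps
}
```

On noiseless data with one level shift, `pelt_mean(c(rep(0, 50), rep(5, 50)))` returns 50, the last index of the first segment. A larger penalty can only reduce the number of changepoints, which is why the penalty formula controls how finely the residuals are segmented.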
Details
This function identifies outliers in time series using a procedure based on kernel smoothing, changepoint analysis of the smoothing residuals, and subsequent analysis of the residuals on homogeneous segments (Campulova et al., 2018). Outlier residuals can be detected with one of three approaches (Grubbs test, quantiles of the normal distribution, Chebyshev inequality), which is either selected automatically based on the data structure or specified by the user. The method depends crucially on the choice of the parameters alpha and L, which define the criterion for outlier detection in the normal-quantile and Chebyshev-inequality approaches, respectively. These values can be specified by the user or estimated automatically using data-driven algorithms (Campulova et al., 2018).
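The three criteria can be sketched in base R. The Grubbs test below is the standard iterative two-sided version (Grubbs, 1950). The normal-quantile and Chebyshev limits use the textbook forms mean +/- qnorm(1 - alpha/2) * sd and mean +/- L * sd; these forms are assumptions, since this page does not reproduce the exact limits the function applies. All function names are hypothetical.

```r
# Iterative two-sided Grubbs test (Grubbs, 1950): repeatedly flags the value
# farthest from the mean while its statistic exceeds the critical value.
grubbs_outliers <- function(r, alpha = 0.05) {
  out <- rep(FALSE, length(r))
  repeat {
    idx <- which(!out)        # values not yet flagged
    n <- length(idx)
    if (n < 3) break          # the test needs at least 3 observations
    v <- r[idx]
    dev <- abs(v - mean(v))
    G <- max(dev) / sd(v)     # Grubbs statistic
    tq <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
    G_crit <- (n - 1) / sqrt(n) * sqrt(tq^2 / (n - 2 + tq^2))
    if (!is.finite(G) || G <= G_crit) break
    out[idx[which.max(dev)]] <- TRUE
  }
  out
}

# Assumed normal-quantile limits: mean +/- z_{1 - alpha/2} * sd
normal_limits <- function(r, alpha = 0.05) {
  mean(r) + c(-1, 1) * qnorm(1 - alpha / 2) * sd(r)
}

# Assumed Chebyshev limits: mean +/- L * sd. By Chebyshev's inequality, at most
# 1 / L^2 of any finite-variance distribution lies L or more sd from the mean.
chebyshev_limits <- function(r, L = 3) {
  mean(r) + c(-1, 1) * L * sd(r)
}

# Flag values outside a pair of limits c(lower, upper)
outside <- function(r, lim) r < lim[1] | r > lim[2]
```

Unlike the other two criteria, the Chebyshev limits do not assume normally distributed residuals; the price is wider limits for a comparable nominal error rate.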
Value
A "KRDetect" object containing a list with the following elements:
method.type
a character string giving the type of method used for outlier identification
x
a numeric vector of observations
index
a numeric vector of index design points assigned to individual observations
smoothed
a numeric vector of estimates of the kernel regression function (smoothed data)
changepoints
an integer membership vector for individual segments
normality.results
a data.frame with the results of normality testing of residuals on individual segments
detection.method
a character string giving the type of method used for identification of outlier residuals
alpha
a numeric vector of alpha parameters used for outlier identification on individual segments
L
a numeric vector of L parameters used for outlier identification on individual segments
outlier
a logical vector specifying the identified outliers
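The `changepoints` element is an integer membership vector. As a sketch, assuming changepoint locations are given as the last index of every segment except the final one, the hypothetical helper `cps_to_membership` converts them into such a vector, and the hypothetical `segment_outliers` applies segment-wise limits with `ave()`. The mean +/- L * sd rule is an assumption for illustration, not necessarily the rule the function uses.

```r
# Convert changepoint locations (end index of each segment except the last)
# into an integer membership vector of length n
cps_to_membership <- function(cps, n) {
  seg_len <- diff(c(0L, cps, n))
  rep(seq_along(seg_len), seg_len)
}

# Flag residuals outside mean +/- L * sd, with mean and sd computed
# separately within each segment of the membership vector seg
segment_outliers <- function(r, seg, L = 3) {
  m <- ave(r, seg, FUN = mean)
  s <- ave(r, seg, FUN = sd)
  abs(r - m) > L * s
}
```

Because the limits are computed within each segment, a residual of 10 in a high-variance segment need not be flagged, while a residual of 20 in a low-variance segment can be.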
References
Campulova M, Michalek J, Mikuska P, Bokal D (2018). Nonparametric algorithm for identification of outliers in environmental data. Journal of Chemometrics, 32, 453–463.
Gasser T, Kneip A, Kohler W (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643–652.
Herrmann E (1997). Local bandwidth choice in kernel regression estimation. Journal of Computational and Graphical Statistics, 6(1), 35–54.
Herrmann E; packaged for R and enhanced by Maechler M (2016). lokern: Kernel Regression Smoothing with Local or Global Plug-in Bandwidth. R package version 1.1-8. https://CRAN.R-project.org/package=lokern.
Killick R, Fearnhead P, Eckley IA (2012). Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500), 1590–1598.
Killick R, Haynes K, Eckley IA (2016). changepoint: An R package for changepoint analysis. R package version 2.2.2. https://CRAN.R-project.org/package=changepoint.
Matteson D, James N (2014). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data. Journal of the American Statistical Association, 109(505), 334–345.
James NA, Matteson DS (2014). ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data. Journal of Statistical Software, 62(7), 1–25. http://www.jstatsoft.org/v62/i07/.
Brys G, Hubert M, Struyf A (2008). Goodness-of-fit tests based on a robust measure of skewness. Computational Statistics, 23(3), 429–442.
Todorov V, Filzmoser P (2009). An Object-Oriented Framework for Robust Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47. http://www.jstatsoft.org/v32/i03/.
Box G, Cox D (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B, 26, 211–234.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S, fourth edition. Springer, New York. ISBN 0-387-95457-0. http://www.stats.ox.ac.uk/pub/MASS4.
Grubbs F (1950). Sample criteria for testing outlying observations. The Annals of Mathematical Statistics, 21(1), 27–58.
Fox J (2016). Applied regression analysis and generalized linear models, third edition. Los Angeles: SAGE. ISBN 9781452205663.
Examples
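library(envoutliers)
# Hourly ozone (o3) concentrations for December 2002 from the openair package
data("mydata", package = "openair")
x = mydata$o3[format(mydata$date, "%m %Y") == "12 2002"]
result = KRDetect.outliers.changepoint(x)
summary(result)
# Plot the results, with and without the homogeneous segments displayed
plot(result)
plot(result, show.segments = FALSE)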