wBACON {wbacon} | R Documentation |
Weighted BACON Algorithm for Multivariate Outlier Detection
Description
wBACON is an iterative method for the computation of multivariate location and scatter (under the assumption of a Gaussian distribution).
Usage
wBACON(x, weights = NULL, alpha = 0.05, collect = 4, version = c("V2", "V1"),
na.rm = FALSE, maxiter = 50, verbose = FALSE, n_threads = 2)
distance(x)
## S3 method for class 'wbaconmv'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'wbaconmv'
summary(object, ...)
center(object)
## S3 method for class 'wbaconmv'
vcov(object, ...)
Arguments
x |
|
weights |
|
alpha |
|
collect |
determines the size |
version |
|
na.rm |
|
maxiter |
|
verbose |
|
n_threads |
|
digits |
|
... |
additional arguments passed to the method. |
object |
object of class wbaconmv. |
Details
The algorithm is initialized from a set of uncontaminated data. The subset is then iteratively refined: additional observations are included in the subset if their Mahalanobis distance is below some threshold, and observations are removed from the subset if their distance is larger than the threshold. This process iterates until the set of good data remains stable. Observations not among the good data are outliers; see Billor et al. (2000). The weighted BACON algorithm is due to Béguin and Hulliger (2008).
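The refinement loop described above can be sketched in base R. This is a simplified, unweighted illustration and not the package's implementation: the function name bacon_sketch is hypothetical, the initial subset is assumed to be the collect * p observations closest to the coordinate-wise median, the cutoff is assumed to be a plain chi-square quantile, and the correction factors of Billor et al. (2000) are omitted.

```r
# Simplified, unweighted sketch of the BACON iteration (illustrative only).
bacon_sketch <- function(x, alpha = 0.05, collect = 4, maxiter = 50) {
    x <- as.matrix(x)
    n <- nrow(x)
    p <- ncol(x)
    # Assumed initialization: the collect * p observations closest to the
    # coordinate-wise median (Euclidean distance)
    med <- apply(x, 2, median)
    d0 <- sqrt(rowSums(sweep(x, 2, med)^2))
    good <- rank(d0, ties.method = "first") <= min(collect * p, n)
    # Assumed cutoff for the squared Mahalanobis distances
    cutoff <- qchisq(1 - alpha, df = p)
    converged <- FALSE
    for (iter in seq_len(maxiter)) {
        # Location and scatter are estimated from the current good subset
        ctr <- colMeans(x[good, , drop = FALSE])
        scat <- cov(x[good, , drop = FALSE])
        d2 <- mahalanobis(x, ctr, scat)
        new_good <- d2 < cutoff
        # Guard: stop if too few observations remain to estimate scatter
        if (sum(new_good) <= p) break
        # Stop as soon as the subset no longer changes
        if (all(new_good == good)) {
            converged <- TRUE
            break
        }
        good <- new_good
    }
    list(center = ctr, cov = scat, dist = sqrt(d2), subset = good,
         outlier = !good, iter = iter, converged = converged)
}

dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
    "Infant.Mortality")]
res <- bacon_sketch(dt)
which(res$outlier)
```

Because the sketch leaves out the sampling weights and the small-sample corrections, the observations it flags need not agree with those nominated by wBACON.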
The threshold for the (squared) Mahalanobis distances is defined as the standardized chi-square 1 - \alpha quantile. All observations whose squared Mahalanobis distance is larger than the threshold are regarded as outliers.
If the sampling weights (argument weights) are not explicitly specified (i.e., weights = NULL), they are taken to be 1.0.
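As an illustration of what unit weights imply, the following base R sketch checks that a weighted mean and covariance with all weights equal to 1 coincide with the ordinary estimates. It uses cov.wt from the stats package purely for illustration; it is an assumption that this mirrors, and it does not reproduce, the estimator used inside wBACON.

```r
# Unit weights reduce weighted location/scatter to the ordinary estimates
x <- as.matrix(swiss[, c("Fertility", "Education")])
w <- rep(1, nrow(x))                  # what weights = NULL stands for

# cov.wt() works with weights that sum to one, so normalize them
fit_w <- cov.wt(x, wt = w / sum(w))

same_center <- isTRUE(all.equal(fit_w$center, colMeans(x)))
same_cov <- isTRUE(all.equal(fit_w$cov, cov(x)))
c(same_center, same_cov)
```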
Incomplete/missing data
The wBACON function cannot deal with missing values. In contrast, the function BEM in package modi implements the BACON-EEM algorithm of Béguin and Hulliger (2008), which is tailored to work with outlying and missing values.
If the argument na.rm is set to TRUE, the method behaves like na.omit.
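The effect of na.omit can be seen on a small example in base R. The missing values below are injected artificially for illustration; the object names dt_na and dt_complete are hypothetical.

```r
dt <- swiss[, c("Fertility", "Agriculture", "Examination")]

# Inject two missing values (illustration only)
dt_na <- dt
dt_na[c(3, 10), "Agriculture"] <- NA

# na.omit() drops every row that contains at least one NA
dt_complete <- na.omit(dt_na)
n_dropped <- nrow(dt_na) - nrow(dt_complete)
n_dropped
```

Rows with at least one missing value are discarded entirely, so the remaining variables of those observations do not enter the estimates either.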
Assumptions
The BACON algorithm assumes that the non-outlying data have (roughly) an elliptically contoured distribution (this includes the Gaussian distribution as a special case). "Although the algorithms will often do something reasonable even when these assumptions are violated, it is hard to say what the results mean." (Billor et al., 2000, p. 289)
In line with Billor et al. (2000, p. 290), we use the term outlier "nomination" rather than "detection" to highlight that algorithms should not go beyond nominating observations as potential outliers; see also Béguin and Hulliger (2008). It is left to the analyst to finally label outlying observations as such.
Utility functions and tools
Diagnostic plots are available via the plot method.
The methods center and vcov return, respectively, the estimated center/location and covariance matrix.
The distance method returns the robust Mahalanobis distances.
The function is_outlier returns a logical vector that flags the nominated outliers.
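A short session using these tools might look as follows. It assumes that the wbacon package is installed and uses only the functions named above with their default arguments; the printed output is not shown because it depends on the fitted model.

```r
library(wbacon)

data(swiss)
dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
    "Infant.Mortality")]
m <- wBACON(dt)

center(m)                  # estimated center/location
vcov(m)                    # estimated covariance matrix
head(distance(m))          # robust Mahalanobis distances
flagged <- is_outlier(m)   # logical vector, TRUE for nominated outliers
sum(flagged)               # number of nominated outliers
plot(m)                    # diagnostic plots
```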
Value
An object of class wbaconmv with slots
x |
see function arguments |
weights |
see function arguments |
center |
estimated center of the data |
dist |
Mahalanobis distances |
n |
number of observations |
p |
number of variables |
alpha |
see function arguments |
subset |
final subset of outlier-free data |
cutoff |
see function arguments |
maxiter |
number of iterations until convergence |
version |
see function arguments |
collect |
see function arguments |
cov |
covariance matrix |
converged |
logical that indicates whether the algorithm converged |
call |
the matched call |
References
Billor N., Hadi A.S. and Velleman P.F. (2000). BACON: Blocked Adaptive Computationally efficient Outlier Nominators. Computational Statistics and Data Analysis 34, pp. 279–298. doi:10.1016/S0167-9473(99)00101-2
Béguin C. and Hulliger B. (2008). The BACON-EEM Algorithm for Multivariate Outlier Detection in Incomplete Survey Data. Survey Methodology 34, pp. 91–103. https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X200800110616
Schoch T. (2021). wbacon: Weighted BACON algorithms for multivariate outlier nomination (detection) and robust linear regression. Journal of Open Source Software 6 (62), 3238. doi:10.21105/joss.03238
See Also
plot and is_outlier
Examples
data(swiss)
dt <- swiss[, c("Fertility", "Agriculture", "Examination", "Education",
"Infant.Mortality")]
m <- wBACON(dt)
m
which(is_outlier(m))