BACON {robustX} | R Documentation |
BACON for Regression or Multivariate Covariance Estimation
Description
BACON, short for ‘Blocked Adaptive Computationally-Efficient Outlier Nominators’, is a somewhat robust algorithm (set), with an implementation for regression or multivariate covariance estimation.
BACON() applies the multivariate (covariance estimation) algorithm, using mvBACON(x) in any case, and when y is not NULL adds a regression iteration phase, using the auxiliary .lmBACON() function.
Usage
BACON(x, y = NULL, intercept = TRUE,
m = min(collect * p, n * 0.5),
init.sel = c("Mahalanobis", "dUniMedian", "random", "manual", "V2"),
man.sel, init.fraction = 0, collect = 4,
alpha = 0.05, alphaLM = alpha, maxsteps = 100, verbose = TRUE)
## *Auxiliary* function:
.lmBACON(x, y, intercept = TRUE,
init.dis, init.fraction = 0, collect = 4,
alpha = 0.05, maxsteps = 100, verbose = TRUE)
Arguments
x |
a multivariate matrix of dimension [n x p] considered as containing no missing values. |
y |
the response (n vector) in the case of regression, or NULL (the default) when only the multivariate algorithm is to be applied. |
intercept |
logical indicating if an intercept has to be used for the regression. |
m |
integer in |
init.sel |
character string, specifying the initial selection
mode; see |
man.sel |
only when |
init.dis |
the distances of the x matrix used for the initial
subset determined by |
init.fraction |
if this parameter is > 0 then the tedious steps of selecting the initial subset are skipped and an initial subset of size n * init.fraction is chosen (with smallest dis) |
collect |
numeric factor chosen by the user to define the size of the initial subset (p * collect) |
alpha |
number in |
alphaLM |
number in |
maxsteps |
the maximal number of iteration steps (to prevent infinite loops) |
verbose |
logical indicating if messages are printed which trace progress of the algorithm. |
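To make the size arguments concrete, here is a small base R sketch. The values of n and p are illustrative assumptions, not taken from any data set; only the expression for m (from the Usage section) and the n * init.fraction rule (from the init.fraction entry) come from this page.

```r
## Illustrative dimensions (assumed for this sketch only)
n <- 100   # number of observations
p <- 3     # number of columns of x
collect <- 4

## Default 'm', exactly as written in the Usage section
m <- min(collect * p, n * 0.5)
m

## With init.fraction > 0, the initial subset has size n * init.fraction
init.fraction <- 0.2
init.size <- n * init.fraction
init.size
```

With these assumed values, collect * p is smaller than n * 0.5, so the first term determines m.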
Details
For the initial selection mode, init.sel, see its description in the arguments list of mvBACON.
The choice of alpha and alphaLM:
Multivariate outlier nomination: see the Details section of mvBACON.
Regression: Let t_r(\alpha) denote the 1 - \alpha quantile of the Student t-distribution with r degrees of freedom, where r is the number of elements in the current subset; e.g., t_r(0.05) is the 0.95 quantile. Following Billor et al. (2000), the cutoff value for the discrepancies is defined as t_r(\alpha/(2r + 2)), and they use \alpha = 0.05. Note that this is argument alphaLM (defaulting to alpha) for BACON().
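As a rough illustration of the regression cutoff described above, the following base R sketch evaluates t_r(\alpha/(2r + 2)) with qt(). The helper name bacon_lm_cutoff is hypothetical (it is not part of robustX), and the sketch takes the description literally, using r degrees of freedom; the package's internal code may differ in detail.

```r
## Hypothetical helper, not part of robustX:
## cutoff t_r(alpha / (2r + 2)), i.e. the 1 - alpha/(2r + 2) quantile
## of the Student t-distribution with r degrees of freedom.
bacon_lm_cutoff <- function(r, alpha = 0.05) {
    stopifnot(r >= 1, alpha > 0, alpha < 1)
    qt(alpha / (2 * r + 2), df = r, lower.tail = FALSE)
}

## t_r(0.05) is the 0.95 quantile, as stated in the text:
all.equal(qt(0.05, df = 20, lower.tail = FALSE), qt(0.95, df = 20))

## Cutoffs for a few (assumed) subset sizes r
sapply(c(10, 20, 40), bacon_lm_cutoff)
```

Using lower.tail = FALSE avoids forming 1 - alpha/(2r + 2) explicitly, which is slightly more accurate when the tail probability is very small.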
Value
BACON(x,y,..) (for regression) returns a list with components
subset |
the observation indices (in |
tis |
the |
mv.dis |
the (final) discrepancies or distances of
|
mv.subset |
the “good” subset from |
Note
“BACON” was also chosen in honor of Francis Bacon:
Whoever knows the ways of Nature will more easily notice her deviations;
and, on the other hand, whoever knows her deviations will more accurately
describe her ways.
Francis Bacon (1620), Novum Organum II 29.
Author(s)
Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1; 25.05.2001; modified six times till 17.6.2001.
Port to R, testing etc, by Martin Maechler.
Daniel Weeks (at pitt.edu) proposed a fix to a long-standing buglet in GiveTis() computing the t_i, which was further improved by Maechler, for robustX version 1.2-3 (Feb. 2019).
Correction of alpha default, from 0.95 to 0.05, by Tobias Schoch, see mvBACON.
References
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators; Computational Statistics and Data Analysis 34, 279–298. doi:10.1016/S0167-9473(99)00101-2
See Also
mvBACON, the multivariate version of the BACON algorithm.
Examples
data(starsCYG, package = "robustbase")
## Plot simple data and fitted lines
plot(starsCYG)
lmST <- lm(log.light ~ log.Te, data = starsCYG)
abline(lmST, col = "gray") # least squares line
str(B.ST <- with(starsCYG, BACON(x = log.Te, y = log.light)))
## 'subset': A good set of points (to determine regression):
colB <- adjustcolor(2, 1/2)
points(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset,
pch = 19, cex = 1.5, col = colB)
## A BACON-derived line:
lmB <- lm(log.light ~ log.Te, data = starsCYG, subset = B.ST$subset)
abline(lmB, col = colB, lwd = 2)
require(robustbase)
(RlmST <- lmrob(log.light ~ log.Te, data = starsCYG))
abline(RlmST, col = "blue")