mvBACON {robustX} | R Documentation |
BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators
Description
This function applies an outlier identification algorithm to the data in the x matrix [n x p], following the BACON outlier procedure described by Hadi et al.
Usage
mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.05,
init.sel = c("Mahalanobis", "dUniMedian", "random", "manual", "V2"),
man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)
Arguments
x |
numeric matrix (of dimension [n x p]). |
collect |
a multiplication factor |
m |
integer in |
alpha |
determines the cutoff value for the Mahalanobis distances (see details). |
init.sel |
character string specifying the initial selection mode; implemented modes are "Mahalanobis", "dUniMedian", "random", "manual", and "V2". |
man.sel |
only when init.sel = "manual": the indices of the observations to start from. |
maxsteps |
maximal number of iteration steps. |
allowSingular |
logical indicating whether a solution should be sought also when no matrix of rank p is found. |
verbose |
logical indicating if messages are printed which trace progress of the algorithm. |
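The default for m shown in the usage line can be worked out by hand. The sketch below evaluates the expression min(collect * p, n * 0.5) for the two data sets used in the examples. The helper name default_m is hypothetical and not part of robustX; the values of n and p are taken from the examples (47 x 2 for starsCYG, and an assumed 20 x 6 for coleman.x).

```r
# Hypothetical helper mirroring the default expression for 'm' in the
# usage line; it is not exported by robustX.
default_m <- function(n, p, collect = 4) min(collect * p, n * 0.5)

default_m(n = 47, p = 2)   # starsCYG:  min(8, 23.5) -> 8
default_m(n = 20, p = 6)   # coleman.x: min(24, 10)  -> 10
```

For the small coleman.x data the n * 0.5 cap is the binding term, so the default m is half the sample size rather than collect * p.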
Details
Remarks on the tuning parameter alpha: Let \chi^2_p be a chi-square distributed random variable with p degrees of freedom (p is the number of variables; n is the number of observations). Denote the (1-\alpha) quantile by \chi^2_p(\alpha), e.g., \chi^2_p(0.05) is the 0.95 quantile.
Following Billor et al. (2000), the cutoff value for the Mahalanobis distances is defined as \chi_p(\alpha/n) (the square root of \chi^2_p(\alpha/n)) times a correction factor c(n,p) that depends on n and p; they use \alpha=0.05.
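To make the chi-square part of the cutoff concrete, the following base-R sketch computes \chi_p(\alpha/n) with qchisq(). It deliberately leaves out the correction factor c(n,p), whose formula is not given on this page, so the result is only the uncorrected part of the threshold and not the exact value mvBACON uses. The helper name chi_cutoff is hypothetical.

```r
# Hypothetical helper (not part of robustX): the uncorrected chi part of
# the cutoff, i.e. the square root of the (1 - alpha/n) quantile of a
# chi-square distribution with p degrees of freedom.
# The correction factor c(n, p) is NOT included here.
chi_cutoff <- function(n, p, alpha = 0.05) {
  sqrt(qchisq(1 - alpha / n, df = p))
}

n <- 47; p <- 2                   # dimensions of the starsCYG example data
chi_cutoff(n, p)                  # uses the (1 - alpha/n) quantile
sqrt(qchisq(1 - 0.05, df = p))    # for comparison: plain 0.95 quantile
```

Because alpha is divided by n, the quantile level moves closer to 1 as n grows, so the cutoff is larger than the one obtained from the plain (1-\alpha) quantile.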
Value
a list with components
subset |
logical vector of length n; FALSE marks the observations nominated as outliers. |
dis |
numeric vector of length n. |
cov |
|
Author(s)
Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1.
Port to R, testing etc, by Martin Maechler;
Init selection "V2" and correction of default alpha from 0.95 to 0.05, by Tobias Schoch, FHNW Olten, Switzerland.
References
Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279–298. doi:10.1016/S0167-9473(99)00101-2
See Also
covMcd for a high-breakdown (but more computer intensive) method; BACON for a “generalization”, notably to regression.
Examples
require(robustbase) # for example data and covMcd():
## simple 2D example :
plot(starsCYG, main = "starsCYG data (n=47)")
B.st <- mvBACON(starsCYG)
points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
stopifnot(identical(which(!B.st$subset), c(7L,11L,20L,30L,34L)))
## finds the 4 clear outliers (and 1 "borderline");
## it does not find obs. 14 which is an outlier according to covMcd(.)
iniS <- setNames(, eval(formals(mvBACON)$init.sel)) # all initialization methods, incl "random"
set.seed(123)
Bs.st <- lapply(iniS[iniS != "manual"], function(s)
    mvBACON(as.matrix(starsCYG), init.sel = s, verbose = FALSE))
ii <- - match("steps", names(Bs.st[[1]]))
Bs.s1 <- lapply(Bs.st, `[`, ii)
stopifnot(exprs = {
    length(Bs.s1) >= 4
    length(unique(Bs.s1)) == 1 # all 4 methods give the same result
})
## Example where "dUniMedian" and "V2" differ :
data(pulpfiber, package="robustbase")
dU.plp <- mvBACON(as.matrix(pulpfiber), init.sel = "dUniMedian")
V2.plp <- mvBACON(as.matrix(pulpfiber), init.sel = "V2")
(oU <- which(! dU.plp$subset))
(o2 <- which(! V2.plp$subset))
stopifnot(setdiff(o2, oU) %in% c(57L,58L,59L,62L))
## and 57, 58, 59, and 62 *are* outliers according to covMcd(.)
## 'coleman' from pkg 'robustbase'
coleman.x <- data.matrix(coleman[, 1:6])
Cc <- covMcd (coleman.x) # truly robust
summary(Cc) # -> 6 outliers (1,3,10,12,17,18)
Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
stopifnot(all.equal(Cb1, Cb2))
## try 20 different random starts:
Cb.r <- lapply(1:20, function(i) {
    set.seed(i)
    mvBACON(coleman.x, init.sel = "random", verbose = FALSE)
})
nm <- names(Cb.r[[1]]); nm <- nm[nm != "steps"]
all(eqC <- sapply(Cb.r[-1], function(CC) all.equal(CC[nm], Cb.r[[1]][nm]))) # TRUE
## --> BACON always breaks down, i.e., does not see the outliers here
## breaks down even when manually starting with all the non-outliers:
Cb.man <- mvBACON(coleman.x, init.sel = "manual",
                  man.sel = setdiff(1:20, c(1,3,10,12,17,18)))
which( ! Cb.man$subset) # the outliers according to mvBACON : _none_