mvBACON {robustX}    R Documentation

BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators


Description

This function applies an outlier identification algorithm to the data in the $n \times p$ matrix x, following the lines described by Billor, Hadi and Velleman (2000) for their BACON outlier procedure.


Usage

mvBACON(x, collect = 4, m = min(collect * p, n * 0.5), alpha = 0.05,
        init.sel = c("Mahalanobis", "dUniMedian", "random", "manual", "V2"),
        man.sel, maxsteps = 100, allowSingular = FALSE, verbose = TRUE)



Arguments

x: numeric matrix (of dimension $n \times p$), not supposed to contain missing values.


collect: a multiplication factor $c$, used when init.sel is not "manual", to define $m$, the size of the initial basic subset, as $m := c \cdot p$; in practice, m <- min(p * collect, n/2).


m: integer in 1:n specifying the size of the initial basic subset; used only when init.sel is not "manual".


alpha: determines the cutoff value for the Mahalanobis distances (see Details).


init.sel: character string specifying the initial selection mode; implemented modes are:


"Mahalanobis": based on Mahalanobis distances (default); the version $V_1$ of the reference; affine invariant but not robust.


"dUniMedian": based on the distances from the univariate medians; similar to the version $V_2$ of the reference; robust but not affine invariant.


"random": based on a random selection, i.e., reproducible only via set.seed().


"manual": based on manual selection; in this case, a vector man.sel containing the indices of the selected observations must be specified.


"V2": based on the Euclidean norm from the univariate medians; this is the version $V_2$ of the reference; robust but not affine invariant.

"Mahalanobis" and "V2" were proposed by the authors of the reference as versions $V_1$ and $V_2$, as was "manual", while "random" is provided in order to study the behaviour of BACON. Option "dUniMedian" is similar to "V2" and is due to U. Oetliker.


man.sel: only when init.sel == "manual", the indices of observations determining the initial basic subset (and m <- length(man.sel)).


maxsteps: maximal number of iteration steps.


allowSingular: logical indicating whether a solution should be sought also when no matrix of rank $p$ is found.


verbose: logical indicating whether messages tracing the progress of the algorithm are printed.
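The base-R sketch below illustrates how the default size m and the "Mahalanobis" ($V_1$) and "V2" initial selections described above could be computed: rank all observations by a distance and keep the m smallest. It is a simplified illustration under stated assumptions, not robustX's internal code: the helper init_subset() is hypothetical, "dUniMedian" is not covered, and rounding m down to an integer is our own choice.

```r
## Hypothetical helper (not part of robustX): indices of the initial basic
## subset for the "Mahalanobis" (V1) and "V2" selection modes.
init_subset <- function(x, collect = 4, method = c("Mahalanobis", "V2")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  ## default m from the usage line; rounding down is an assumption of this sketch
  m <- floor(min(collect * p, n * 0.5))
  d <- switch(method,
    ## V1: classical (non-robust) Mahalanobis distances from mean and covariance
    Mahalanobis = sqrt(mahalanobis(x, colMeans(x), cov(x))),
    ## V2: Euclidean distances from the coordinatewise (univariate) medians
    V2 = sqrt(rowSums(sweep(x, 2, apply(x, 2, median))^2)))
  sort(order(d)[seq_len(m)])  # the m observations with the smallest distances
}

set.seed(1)
x <- cbind(rnorm(20), rnorm(20))
x[1, ] <- c(10, 10)             # one gross outlier
init_subset(x)                  # 8 indices, since m = min(4 * 2, 20 / 2)
init_subset(x, method = "V2")
```

The "V2" distances use only coordinatewise medians, which is why that mode is robust but not affine invariant, whereas the classical Mahalanobis distances are affine invariant but can be distorted by the outliers themselves.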


Details

Remarks on the tuning parameter alpha: Let $\chi^2_p$ be a chi-square distributed random variable with $p$ degrees of freedom ($p$ is the number of variables; $n$ is the number of observations). Denote the $(1-\alpha)$ quantile by $\chi^2_p(\alpha)$; e.g., $\chi^2_p(0.05)$ is the 0.95 quantile. Following Billor et al. (2000), the cutoff value for the Mahalanobis distances is defined as $\chi_p(\alpha/n)$ (the square root of $\chi^2_p(\alpha/n)$) times a correction factor $c(n,p)$ that depends on $n$ and $p$; they use $\alpha = 0.05$.
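As a hedged illustration of these remarks, the sketch below computes the cutoff with qchisq() and uses it in a simplified version of the BACON iteration: distances are computed from the mean and covariance of the current basic subset, the new subset consists of all observations whose distance falls below the cutoff, and the loop stops once the subset no longer changes. The names bacon_cutoff(), bacon_sketch() and the argument cnp (a stand-in for $c(n,p)$, default 1, i.e., no correction) are hypothetical. The sketch omits the actual correction factor of Billor et al. (2000) and any handling of singular covariance matrices, so it is not a substitute for mvBACON().

```r
## Cutoff chi_p(alpha/n) * c(n, p); `cnp` is a hypothetical stand-in for c(n, p).
bacon_cutoff <- function(n, p, alpha = 0.05, cnp = 1) {
  ## qchisq(q, p, lower.tail = FALSE) is the (1 - q) quantile, i.e. chi^2_p(q)
  cnp * sqrt(qchisq(alpha / n, df = p, lower.tail = FALSE))
}

## Simplified BACON loop (hypothetical helper, not robustX code):
## grow the basic subset until it stops changing.
bacon_sketch <- function(x, m, alpha = 0.05, maxsteps = 100) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  ## initial subset: the m smallest classical Mahalanobis distances (V1)
  d0 <- mahalanobis(x, colMeans(x), cov(x))
  subset <- seq_len(n) %in% order(d0)[seq_len(m)]
  cutoff <- bacon_cutoff(n, p, alpha)
  for (step in seq_len(maxsteps)) {
    xs <- x[subset, , drop = FALSE]
    dis <- sqrt(mahalanobis(x, colMeans(xs), cov(xs)))
    keep <- dis < cutoff                  # candidate new basic subset
    if (identical(keep, subset)) break    # converged: subset unchanged
    subset <- keep
  }
  list(subset = subset, dis = dis, steps = step)
}

set.seed(3)
x <- matrix(rnorm(200), ncol = 2)
x[1:3, ] <- x[1:3, ] + 8                  # plant three outliers
r <- bacon_sketch(x, m = 8)
which(!r$subset)                          # observations outside the final subset
```

Dividing alpha by n makes the cutoff grow with the sample size, which keeps the chance of flagging any clean observation roughly at alpha rather than at alpha per observation.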


Value

a list with components


subset: logical vector of length n where the i-th entry is TRUE iff the i-th observation is part of the final selection.


numeric vector of length n with the (Mahalanobis) distances.


$p \times p$ matrix, the corresponding robust estimate of covariance.
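For illustration, the snippet below (assuming robustX is installed; the simulated data are our own) plants three outliers and reads them off the subset component: observations with subset == FALSE lie outside the final selection.

```r
library(robustX)
set.seed(2)
x <- matrix(rnorm(200), ncol = 2)
x[1:3, ] <- x[1:3, ] + 8          # plant three outliers
res <- mvBACON(x, verbose = FALSE)
out <- which(!res$subset)         # observations outside the final selection
x[out, , drop = FALSE]
```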


Author(s)

Ueli Oetliker, Swiss Federal Statistical Office, for S-plus 5.1. Port to R, testing, etc., by Martin Maechler; initial selection "V2" and correction of the default alpha from 0.95 to 0.05 by Tobias Schoch, FHNW Olten, Switzerland.


References

Billor, N., Hadi, A. S., and Velleman, P. F. (2000). BACON: Blocked Adaptive Computationally-Efficient Outlier Nominators. Computational Statistics and Data Analysis 34, 279–298. doi:10.1016/S0167-9473(99)00101-2

See Also

covMcd for a high-breakdown (but more computer-intensive) method; BACON for a “generalization”, notably to regression.


Examples

 library(robustX)    # for mvBACON()
 require(robustbase) # for example data and covMcd():
 ## simple 2D example :
 plot(starsCYG, main = "starsCYG  data  (n=47)")
 B.st <- mvBACON(starsCYG)
 points(starsCYG[ ! B.st$subset,], pch = 4, col = 2, cex = 1.5)
 stopifnot(identical(which(!B.st$subset), c(7L,11L,20L,30L,34L)))
 ## finds the 4 clear outliers (and 1 "borderline");
 ## it does not find obs. 14 which is an outlier according to covMcd(.)

 iniS <- setNames(, eval(formals(mvBACON)$init.sel)) # all initialization methods, incl "random"
 set.seed(123)
 Bs.st <- lapply(iniS[iniS != "manual"], function(s)
                 mvBACON(as.matrix(starsCYG), init.sel = s, verbose=FALSE))
 ii <- - match("steps", names(Bs.st[[1]])) # drop the 'steps' component
 Bs.s1 <- lapply(Bs.st, `[`, ii)
 stopifnot(exprs = {
    length(Bs.s1) >= 4
    length(unique(Bs.s1)) == 1 # all 4 methods give the same
 })

 ## Example where "dUniMedian" and "V2" differ :
 data(pulpfiber, package="robustbase")
 dU.plp <- mvBACON(as.matrix(pulpfiber), init.sel = "dUniMedian")
 V2.plp <- mvBACON(as.matrix(pulpfiber), init.sel = "V2")
 (oU <- which(! dU.plp$subset))
 (o2 <- which(! V2.plp$subset))
 stopifnot(setdiff(o2, oU) %in% c(57L,58L,59L,62L))
 ## and 57, 58, 59, and 62 *are* outliers according to covMcd(.)

 ## 'coleman' from pkg 'robustbase'
 coleman.x <- data.matrix(coleman[, 1:6])
 Cc <- covMcd (coleman.x) # truly robust
 summary(Cc) # -> 6 outliers (1,3,10,12,17,18)
 Cb1 <- mvBACON(coleman.x) ##-> subset is all TRUE hmm??
 Cb2 <- mvBACON(coleman.x, init.sel = "dUniMedian")
 stopifnot(all.equal(Cb1, Cb2))
 ## try 20 different random starts:
 Cb.r <- lapply(1:20, function(i) { set.seed(i)
                     mvBACON(coleman.x, init.sel="random", verbose=FALSE) })
 nm <- names(Cb.r[[1]]); nm <- nm[nm != "steps"]
 all(eqC <- sapply(Cb.r[-1], function(CC) all.equal(CC[nm], Cb.r[[1]][nm]))) # TRUE
 ## --> BACON always  breaks down, i.e., does not see the outliers here
 ## breaks down even when manually starting with all the non-outliers:
 Cb.man <- mvBACON(coleman.x, init.sel = "manual",
                   man.sel = setdiff(1:20, c(1,3,10,12,17,18)))
 which( ! Cb.man$subset) # the outliers according to mvBACON : _none_

[Package robustX version 1.2-7 Index]