xtrData {robustbase} | R Documentation |
Extreme Data examples
Description
x30o50
, called ‘'XX'’ in the thesis, has been a running
case for which mc()
had failed to converge.
A numeric vector of 50 values, 30 of which are very close to zero,
specifically, their absolute values are less than 1.5e-15
.
The remaining 20 values (11 negative, 9 positive) have absolute values between 0.0022 and 1.66.
Usage
data(x30o50, package="robustbase")
Format
A summary is
Min. 1st Qu. Median Mean 3rd Qu. Max. -1.66006 0.00000 0.00000 -0.04155 0.00000 1.29768
notably the 1st to 3rd quartiles are all very close to zero.
Details
a good robust method will treat the 60% “almost zero” values as “good” data and all other as outliers.
This is somewhat counter intuitive to typical human perception where the 30 almost-zero numbers would be considered as inliers and the remaining 20 as “good” data.
The original mc()
algorithm and also the amendments up to 2022
(robustbase versions before 0.95) would fail to converge unless (in
newer versions) eps1
was increased, e.g., only by a factor of 10,
to eps1 = 1e-13
.
References
Lukas Graz (2021); unpublished BSc thesis, see mc
.
Examples
data(x30o50)
## have 4 duplicated values :
table(dX <- duplicated(x30o50))
x30o50[dX] # 0 2.77e-17 4.16e-17 2.08e-16
sort(x30o50[dX]) * 2^56 # 0 2 3 15
## and they are c(0,2,3,15)*2^-56
table(sml <- abs(x30o50) < 1e-11)# 20 30
summary(x30o50[ sml]) # -1.082e-15 ... 1.499e-15 ; mean = 9.2e-19 ~~ 0
summary(x30o50[!sml])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.6601 -0.4689 -0.0550 -0.1039 0.3986 1.2977
op <- par(mfrow=c(3,1), mgp=c(1.5, .6, 0), mar = .3+c(2,3:1))
(Fn. <- ecdf(x30o50)) # <- only 46 knots (as have 4 duplications)
plot(Fn.) ## and zoom in (*drastically*) to around x=0 :
for(f in c(1e-13, 1.5e-15)) {
plot(Fn., xval=f*seq(-1,1, length.out = 1001), ylim=c(0,1), main="[zoomed in]")
if(f == 1e-13) rect(-1e-15,0, +1e-15, 1, col="thistle", border=1)
plot(Fn., add=TRUE)
}
par(op)
mcOld <- function(x, ..., doScale=TRUE) mc(x, doScale=doScale, c.huberize=Inf, ...)
try( mcOld(x30o50) ) # Error: .. not 'converged' in 100 iteration
mcOld(x30o50, eps1 = 1e-12) # -0.152
(mcX <- mc(x30o50)) # -7.10849e-13
stopifnot(exprs = {
all.equal(-7.10848988e-13, mcX, tol = 1e-9)
all.equal(mcX, mc(1e30*x30o50), tol = 4e-4) # not so close
})
table(sml <- abs(x30o50) < 1e-8)# 20 30
range(x30o50[sml])
x0o50 <- x30o50; x0o50[sml] <- 0
(mcX0 <- mc(x0o50))
stopifnot(exprs = {
all.equal(-0.378445401788, mcX0, tol=1e-12)
all.equal(-0.099275805349, mc(x30o50[!sml]) -> mcL, tol=2e-11)
all.equal(mcL, mcOld(x30o50[!sml]))
})
## -- some instability also wrt c.huberize:
mcHubc <- function(dat, ...)
function(cc) vapply(cc, function(c) mc(dat, c.huberize = c, ...), -1.)
mcH50 <- mcHubc(x30o50)
head(cHs <- c(sort(outer(c(1, 2, 5), 10^(2:15))), Inf), 9)
mcXc <- mcH50(cHs)
plot( mcXc ~ cHs, type="b", log="x" , xlab=quote(c[huberize]))
plot((-mcXc) ~ cHs, type="b", log="xy", xlab=quote(c[huberize]))
## but for "regular" outlier skew data, there's no such dependency:
mcXcu <- mcHubc(cushny)(cHs)
stopifnot( abs(mcXcu - mcXcu[1]) < 1e-15)