R: Tuning Parameters for lmrob() and Auxiliaries

lmrob.control {robustbase}

R Documentation

Tuning Parameters for lmrob() and Auxiliaries

Description

Tuning parameters for lmrob, the MM-type regression estimator and the associated S-, M- and D-estimators. Using setting="KS2011" sets the defaults as suggested by Koller and Stahel (2011) and analogously for "KS2014".

The .M*.default functions and .M*.defaults lists contain default tuning parameters for all the predefined \psi functions, see also Mpsi, etc.

Usage

lmrob.control(setting, seed = NULL, nResample = 500,
              tuning.chi = NULL, bb = 0.5, tuning.psi = NULL,
              max.it = 50, groups = 5, n.group = 400,
              k.fast.s = 1, best.r.s = 2,
              k.max = 200, maxit.scale = 200, k.m_s = 20,
              refine.tol = 1e-7, rel.tol = 1e-7, scale.tol = 1e-10, solve.tol = 1e-7,
              zero.tol = 1e-10,
              trace.lev = 0,
              mts = 1000, subsampling = c("nonsingular", "simple"),
              compute.rd = FALSE, method = "MM", psi = "bisquare",
              numpoints = 10, cov = NULL,
              split.type = c("f", "fi", "fii"), fast.s.large.n = 2000,
              # only for outlierStats() :
              eps.outlier = function(nobs) 0.1 / nobs,
              eps.x = function(maxx) .Machine$double.eps^(.75)*maxx,
              compute.outlier.stats = method,
              warn.limit.reject = 0.5,
              warn.limit.meanrw = 0.5, ...)

## S3 method for class 'lmrobCtrl'
update(object, ...)

.Mchi.tuning.defaults
.Mchi.tuning.default(psi)
.Mpsi.tuning.defaults
.Mpsi.tuning.default(psi)

Arguments

`setting`	a string specifying alternative default values. Leave empty for the defaults or use `"KS2011"` or `"KS2014"` for the defaults suggested by Koller and Stahel (2011, 2017). See Details.
`seed`	`NULL` or an integer vector compatible with `.Random.seed`: the seed to be used for random re-sampling used in obtaining candidates for the initial S-estimator. The current value of `.Random.seed` will be preserved if `seed` is set, i.e. non-`NULL`; otherwise, as by default, `.Random.seed` will be used and modified as usual from calls to `runif()` etc.
`nResample`	number of re-sampling candidates to be used to find the initial S-estimator. Currently defaults to 500 which works well in most situations (see references).
`tuning.chi`	tuning constant vector for the S-estimator. If `NULL`, as by default, sensible defaults are set (depending on `psi`) to yield a 50% breakdown estimator. See Details.
`bb`	expected value under the normal model of the “chi” (rather `\rho (rho)`) function with tuning constant equal to `tuning.chi`. This is used to compute the S-estimator.
`tuning.psi`	tuning constant vector for the redescending M-estimator. If `NULL`, as by default, this is set (depending on `psi`) to yield an estimator with asymptotic efficiency of 95% for normal errors. See Details.
`max.it`	integer specifying the maximum number of IRWLS iterations.
`groups`	(for the fast-S algorithm): Number of random subsets to use when the data set is large.
`n.group`	(for the fast-S algorithm): Size of each of the `groups` above. Note that this must be at least `p`.
`k.fast.s`	(for the fast-S algorithm): Number of local improvement steps (“I-steps”) for each re-sampling candidate.
`k.m_s`	(for the M-S algorithm): specifies after how many unsuccessful refinement steps the algorithm stops.
`best.r.s`	(for the fast-S algorithm): Number of of best candidates to be iterated further (i.e., “refined”); is denoted `t` in Salibian-Barrera & Yohai(2006).
`k.max`	(for the fast-S algorithm): maximal number of refinement steps for the “fully” iterated best candidates.
`maxit.scale`	integer specifying the maximum number of C level `find_scale()` iterations (in fast-S and M-S algorithms).
`refine.tol`	(for the fast-S algorithm): relative convergence tolerance for the fully iterated best candidates.
`rel.tol`	(for the RWLS iterations of the MM algorithm): relative convergence tolerance for the parameter vector.
`scale.tol`	(for the scale estimation iterations of the S algorithm): relative convergence tolerance for the `scale` `\sigma(.)`.
`solve.tol`	(for the S algorithm): relative tolerance for inversion. Hence, this corresponds to `solve.default()`'s `tol`.
`zero.tol`	for checking 0-residuals in the S algorithm, non-negative number `\epsilon_z` such that `\{i; \left\|\tilde{R}_i\right\| \le \epsilon_z\}` correspond to `0`-residuals, where `\tilde{R}_i` are standardized residuals, `\tilde{R}_i = R_i/s_y` and `s_y = \frac{1}{n} \sum_{i=1}^n \left\|y_i\right\|`.
`trace.lev`	integer indicating if the progress of the MM-algorithm and the fast-S algorithms, see `lmrob.S`, should be traced (increasingly); default `trace.lev = 0` does no tracing.
`mts`	maximum number of samples to try in subsampling algorithm.
`subsampling`	type of subsampling to be used, a string: `"simple"` for simple subsampling (default prior to version 0.9), `"nonsingular"` for nonsingular subsampling. See also `lmrob.S`.
`compute.rd`	logical indicating if robust distances (based on the MCD robust covariance estimator `covMcd`) are to be computed for the robust diagnostic plots. This may take some time to finish, particularly for large data sets, and can lead to singularity problems when there are `factor` explanatory variables (with many levels, or levels with “few” observations). Hence, is `FALSE` by default.
`method`	string specifying the estimator-chain. `MM` is interpreted as `SM`. See Details of `lmrob` for a description of the possible values.
`psi`	string specifying the type `\psi`-function used. See Details of `lmrob`. Defaults to `"bisquare"` for S and MM-estimates, otherwise `"lqq"`.
`numpoints`	number of points used in Gauss quadrature.
`cov`	function or string with function name to be used to calculate covariance matrix estimate. The default is `if(method %in% c('SM', 'MM')) ".vcov.avar1" else ".vcov.w"`. See Details of `lmrob`.
`split.type`	determines how categorical and continuous variables are split. See `splitFrame`.
`fast.s.large.n`	minimum number of observations required to switch from ordinary “fast S” algorithm to an efficient “large n” strategy.
`eps.outlier`	limit on the robustness weight below which an observation is considered to be an outlier. Either a `numeric(1)` or a function that takes the number of observations as an argument. Used only in `summary.lmrob` and `outlierStats`.
`eps.x`	limit on the absolute value of the elements of the design matrix below which an element is considered zero. Either a `numeric(1)` or a function that takes the maximum absolute value in the design matrix as an argument.
`compute.outlier.stats`	vector of `character` strings, each valid to be used as `method` argument. Used to specify for which estimators outlier statistics (and warnings) should be produced. Set to empty (`NULL` or `character(0)`) if none are required. Note that the default is `method` which by default is either `"MM"`, `"SM"`, or `"SMDM"`; hence using `compute.outlier.stats = "S"` provides `outlierStats()` to a `lmrob.S()` result.
`warn.limit.reject`	limit of ratio `\#\mbox{rejected} / \#\mbox{obs in level}` above (`\geq`) which a warning is produced. Set to `NULL` to disable warning.
`warn.limit.meanrw`	limit of the mean robustness per factor level below which (`\leq`) a warning is produced. Set to `NULL` to disable warning.
`object`	an `"lmrobCtrl"` object, as resulting from a `lmrob.control()` or an `update(<lmrobCtrl>, )` call.
`...`	for `lmrob.control()`: further arguments to be added as `list` components to the result, e.g., those to be used in `.vcov.w()`. `update(object, )`: (named) components from `object`, to be modified, not* `setting = *`.

Details

The option setting="KS2011" alters the default arguments. They are changed to method = "SMDM", psi = "lqq", max.it = 500, k.max = 2000, cov = ".vcov.w". The defaults of all the remaining arguments are not changed.

The option setting="KS2014" builds upon setting="KS2011". More arguments are changed to best.r.s = 20, k.fast.s = 2, nResample = 1000. This setting should produce more stable estimates for designs with factors.

By default, and in .Mpsi.tuning.default() and .Mchi.tuning.default(), tuning.chi and tuning.psi are set to yield an MM-estimate with breakdown point 0.5 and efficiency of 95% at the normal.

If numeric tuning.chi or tuning.psi are specified, say cc, for psi = "ggw" or "lqq", .psi.const(cc, psi) is used, see its help page.

To get the defaults, e.g., .Mpsi.tuning.default(psi) is equivalent to but more efficient than the formerly widely used lmrob.control(psi = psi)$tuning.psi.

These defaults are:

`psi`	`tuning.chi`	`tuning.psi`
`bisquare`	`1.54764`	`4.685061`
`welsh`	`0.5773502`	`2.11`
`ggw`	`c(-0.5, 1.5, NA, 0.5)`	`c(-0.5, 1.5, 0.95, NA)`
`lqq`	`c(-0.5, 1.5, NA, 0.5)`	`c(-0.5, 1.5, 0.95, NA)`
`optimal`	`0.4047`	`1.060158`
`hampel`	`c(1.5, 3.5, 8)*0.2119163`	`c(1.5, 3.5, 8)*0.9014`

The values for the tuning constant for the ggw and lqq psi functions are specified differently here by a vector with four elements: minimal slope, b (controlling the bend at the maximum of the curve), efficiency, breakdown point. Use NA for an unspecified value of either efficiency or breakdown point, see examples in the tables (above and below). For these table examples, the respective “inner constants” are stored precomputed, see .psi.lqq.findc for more.

The constants for the "hampel" psi function are chosen to have a redescending slope of -1/3. Constants for a slope of -1/2 would be

`psi`	`tuning.chi`	`tuning.psi`
`"hampel"`	`c(2, 4, 8) * 0.1981319`	`c(2, 4, 8) * 0.690794`

Alternative coefficients for an efficiency of 85% at the normal are given in the table below.

`psi`	`tuning.psi`
`bisquare`	`3.443689`
`welsh`	`1.456`
`ggw`, `lqq`	`c(-0.5, 1.5, 0.85, NA)`
`optimal`	`0.8684`
`hampel` (-1/3)	`c(1.5, 3.5, 8)* 0.5704545`
`hampel` (-1/2)	`c( 2, 4, 8) * 0.4769578`

Value

.Mchi.tuning.default(psi) and .Mpsi.tuning.default(psi) return a short numeric vector of tuning constants which are defaults for the corresponding psi-function, see the Details. They are based on the named lists .Mchi.tuning.defaults and .Mpsi.tuning.defaults, respectively.

lmrob.control() returns a named list with over twenty components, corresponding to the arguments, where tuning.psi and tuning.chi are typically computed, as .Mpsi.tuning.default(psi) or .Mchi.tuning.default(psi), respectively. It is of class "lmrobCtrl" and we provide print(), update() and within methods.

update(<lmrobCtrl>, ....) does not allow a setting="<...>" in .....

Author(s)

Matias Salibian-Barrera, Martin Maechler and Manuel Koller

References

Koller, M. and Stahel, W.A. (2011) Sharpening Wald-type inference in robust regression for small samples. Computational Statistics & Data Analysis 55(8), 2504–2515.

Koller, M. and Stahel, W.A. (2017) Nonsingular subsampling for regression S estimators with categorical predictors, Computational Statistics 32(2): 631–646. doi:10.1007/s00180-016-0679-x. Referred as "KS2014" everywhere in robustbase; A shorter first version, Koller (2012) has been available from https://arxiv.org/abs/1208.5595.

Examples

## Show the default settings:
str(lmrob.control())

## Artificial data for a  simple  "robust t test":
set.seed(17)
y <- y0 <- rnorm(200)
y[sample(200,20)] <- 100*rnorm(20)
gr <- as.factor(rbinom(200, 1, prob = 1/8))
lmrob(y0 ~ 0+gr)

## Use  Koller & Stahel(2011)'s recommendation but a larger  'max.it':
str(ctrl <- lmrob.control("KS2011", max.it = 1000))

str(.Mpsi.tuning.defaults)
stopifnot(identical(.Mpsi.tuning.defaults,
                   sapply(names(.Mpsi.tuning.defaults),
                          .Mpsi.tuning.default)))
## Containing (names!) all our (pre-defined) redescenders:
str(.Mchi.tuning.defaults)

## Difference between settings:
Cdef <- lmrob.control()
C11 <- lmrob.control("KS2011")
C14 <- lmrob.control("KS2014")
str(C14)
## Differences:
diffD <- names(which(!mapply(identical, Cdef,C11, ignore.environment=TRUE)))
diffC <- names(which(!mapply(identical, C11, C14, ignore.environment=TRUE)))
## KS2011 vs KS2014:  Apart from `setting` itself, they only differ in three places:
cbind(KS11 = unlist(C11[diffC[-1]]),
      KS14 = unlist(C14[diffC[-1]]))
##           KS11 KS14
## nResample  500 1000
## best.r.s     2   20
## k.fast.s     1    2
## default vs KS2011: a bit more: setting + 8
str2simpLang <-  function(x) {
    r <- if(is.null(x)) quote((NULL)) else str2lang(deparse(x))
    if(is.call(r)) format(r) else r
}
cbind(deflt= lapply(Cdef[diffD], str2simpLang),
      KS11 = lapply(C11 [diffD], str2simpLang))

## update()ing a lmrob.control() , e.g.,
C14mod <- update(C14, trace.lev = 2) # the same as
C14m.d <- C14; C14m.d$trace.lev <- 2
stopifnot(identical(C14mod, C14m.d))
## changing psi --> updates tuning.{psi,chi}:
C14mp <- update(C14, psi = "hampel", seed=101)
## updating 'method' is "smart" :
C.SMDM <- update(Cdef, method="SMDM")
all.equal(Cdef, C.SMDM) # changed also psi, tuning.{psi,chi} and cov !
chgd <- c("method", "psi", "tuning.chi",  "tuning.psi", "cov")
str(Cdef  [chgd])
str(C.SMDM[chgd])
C14m <- update(C14, method="SMM")
(ae <- all.equal(C14, C14mp))# changed tuning.psi & tuning.chi, too
stopifnot(exprs = {
    identical(C14, update(C14, method="SMDM")) # no change!
    identical(c("psi", "seed", "tuning.chi", "tuning.psi"),
              sort(gsub("[^.[:alpha:]]", "", sub(":.*", "", sub("^Component ", "", ae)))))
    identical(C14m, local({C <- C14; C$method <- "SMM"; C}))
})
##
try( update(C14, setting="KS2011") ) #--> Error: .. not allowed

[Package robustbase version 0.99-3 Index]