R: Optimally robust estimation

roptest {ROptEst}

R Documentation

Optimally robust estimation

Description

Function to compute optimally robust estimates for L2-differentiable parametric families via k-step construction.

Usage

roptest(x, L2Fam, eps, eps.lower, eps.upper, fsCor = 1, initial.est, 
        neighbor = ContNeighborhood(), risk = asMSE(), steps = 1L, 
        distance = CvMDist, startPar = NULL, verbose = NULL,
        OptOrIter = "iterate",
        useLast = getRobAStBaseOption("kStepUseLast"),
        withUpdateInKer = getRobAStBaseOption("withUpdateInKer"),
        IC.UpdateInKer = getRobAStBaseOption("IC.UpdateInKer"),
        withICList = getRobAStBaseOption("withICList"),
        withPICList = getRobAStBaseOption("withPICList"),
        na.rm = TRUE, initial.est.ArgList, ...,
        withLogScale = TRUE, ..withCheck = FALSE, withTimings = FALSE,
        withMDE = NULL, withEvalAsVar = NULL, withMakeIC = FALSE,
        modifyICwarn = NULL, E.argList = NULL, diagnostic = FALSE)
roptest.old(x, L2Fam, eps, eps.lower, eps.upper, fsCor = 1, initial.est,
        neighbor = ContNeighborhood(), risk = asMSE(), steps = 1L,
        distance = CvMDist, startPar = NULL, verbose = NULL,
        OptOrIter = "iterate",
        useLast = getRobAStBaseOption("kStepUseLast"),
        withUpdateInKer = getRobAStBaseOption("withUpdateInKer"),
        IC.UpdateInKer = getRobAStBaseOption("IC.UpdateInKer"),
        withICList = getRobAStBaseOption("withICList"),
        withPICList = getRobAStBaseOption("withPICList"),
        na.rm = TRUE, initial.est.ArgList, ...,
        withLogScale = TRUE)

Arguments

`x`	sample
`L2Fam`	object of class `"L2ParamFamily"`
`eps`	positive real (0 < `eps` <= 0.5): amount of gross errors. See details below.
`eps.lower`	non-negative real (0 <= `eps.lower` <= `eps.upper`): lower bound for the amount of gross errors. See details below.
`eps.upper`	positive real (`eps.lower` <= `eps.upper` <= 0.5): upper bound for the amount of gross errors. See details below.
`fsCor`	positive real: factor used to correct the neighborhood radius; see details.
`initial.est`	initial estimate for unknown parameter. If missing, a minimum distance estimator is computed.
`neighbor`	object of class `"UncondNeighborhood"`
`risk`	object of class `"RiskType"`
`steps`	positive integer: number of steps used for the k-step construction
`distance`	distance function used in `MDEstimator`, which in turn is used as (default) starting estimator.
`startPar`	initial information used by `optimize` resp. `optim`; i.e., if the (total) parameter is of length 1, `startPar` is a search interval, else it is an initial parameter value; if `NULL`, slot `startPar` of `ParamFamily` is used to produce it; in the multivariate case, `startPar` may also be of class `Estimate`, in which case slot `untransformed.estimate` is used.
`verbose`	logical: if `TRUE`, some messages are printed
`useLast`	which parameter estimate (initial estimate or k-step estimate) shall be used to fill the slots `pIC`, `asvar` and `asbias` of the return value.
`OptOrIter`	character; which method to be used for determining Lagrange multipliers `A` and `a`: if (partially) matched to `"optimize"`, `getLagrangeMultByOptim` is used; otherwise: by default, or if matched to `"iterate"` or to `"doubleiterate"`, `getLagrangeMultByIter` is used. More specifically, when using `getLagrangeMultByIter`, and if argument `risk` is of class `"asGRisk"`, by default and if matched to `"iterate"` we use only one (inner) iteration, if matched to `"doubleiterate"` we use up to `Maxiter` (inner) iterations.
`withUpdateInKer`	if there is a non-trivial trafo in the model with matrix `D`, shall the parameter be updated on `ker(D)`?
`IC.UpdateInKer`	if there is a non-trivial trafo in the model with matrix `D`, the IC to be used for this; if `NULL` the result of `getboundedIC(L2Fam,D)` is taken; this IC will then be projected onto `ker(D)`.
`withPICList`	logical: shall slot `pICList` of return value be filled?
`withICList`	logical: shall slot `ICList` of return value be filled?
`na.rm`	logical: if `TRUE`, the estimator is evaluated at `complete.cases(x)`.
`initial.est.ArgList`	a list of arguments to be given to argument `start` if the latter is a function; this list by default already starts with two unnamed items, the sample `x`, and the model `L2Fam`.
`...`	further arguments
`withLogScale`	logical; shall a scale component (if existing and found with name `scalename`) be computed on log-scale and backtransformed afterwards? This avoids crossing 0.
`..withCheck`	logical: if `TRUE`, debugging info is issued.
`withTimings`	logical: if `TRUE`, separate (and aggregate) timings for the three steps evaluating the starting value, finding the starting influence curve, and evaluating the k-step estimator are issued.
`withMDE`	logical or `NULL`: shall a minimum distance estimator be used as starting estimator, in addition to the function given in slot `startPar` of the L2 family? If `NULL` (default), the content of slot `.withMDE` in the L2 family is used instead to take this decision.
`withEvalAsVar`	logical or `NULL`: if `TRUE`, the asymptotic variance is evaluated; if `FALSE`, only a call to do so is produced. If `withEvalAsVar` is `NULL` (default), the content of slot `.withEvalAsVar` in the L2 family is used instead to take this decision.
`withMakeIC`	logical; if `TRUE` the [p]IC is passed through `makeIC` before return.
`modifyICwarn`	logical: should a warning be added if `modifyIC` is applied and hence some optimality information could no longer be valid? Defaults to `NULL`, in which case this value is taken from `RobAStBaseOptions`.
`E.argList`	`NULL` (default) or a list of arguments to be passed to calls to `E` from (a) `MDEstimator` (here this additional argument is only used if `initial.est` is missing), (b) `getStartIC`, and (c) `kStepEstimator`. Potential clashes with arguments of the same name in `...` are resolved by inserting the items of argument list `E.argList` as named items, so in case of collisions the item of `E.argList` overwrites the existing one from `...`.
`diagnostic`	logical; if `TRUE`, diagnostic information on the performed integrations is gathered and shipped out as attributes `kStepDiagnostic` (for the kStepEstimator-step) and `diagnostic` for the remaining steps of the return value of `roptest`.

Details

Computes the optimally robust estimator for a given L2-differentiable parametric family. The computation uses a k-step construction with an appropriate initial estimate; cf. also kStepEstimator. Valid candidates are e.g. Kolmogorov(-Smirnov) or Cramér-von Mises minimum distance estimators (default); cf. Rieder (1994) and Kohl (2005).

Before package version 0.9, this computation was done with the code of function roptest.old (with the same formals). From package version 0.9 on, this function uses the modularized function robest internally.

If the amount of gross errors (contamination) is known, it can be specified by eps. The radius of the corresponding infinitesimal contamination neighborhood is obtained by multiplying eps by the square root of the sample size.
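The relation between eps and the neighborhood radius can be checked with plain arithmetic. The following is a minimal sketch in base R; the values of `n` and `eps` are illustrative assumptions, not package defaults.

```r
# Illustrative values (assumptions): sample size and known contamination amount
n   <- 100
eps <- 0.05

# Radius of the infinitesimal contamination neighborhood:
# the amount of gross errors scaled by the square root of the sample size
radius <- eps * sqrt(n)
radius
```

For a fixed eps the radius grows with the sample size, so the same contamination fraction corresponds to a larger neighborhood in larger samples.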

If the amount of gross errors (contamination) is unknown, try to find a rough estimate for the amount of gross errors, such that it lies between eps.lower and eps.upper.

In case eps.lower is specified and eps.upper is missing, eps.upper is set to 0.5. In case eps.upper is specified and eps.lower is missing, eps.lower is set to 0.

If neither eps nor eps.lower and/or eps.upper is specified, eps.lower and eps.upper are set to 0 and 0.5, respectively.

If eps is missing, the radius-minimax estimator in the sense of Rieder et al. (2001, 2008), respectively Section 2.2 of Kohl (2005), is returned.
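The defaulting rules for eps, eps.lower and eps.upper described above can be summarized in a small sketch. The helper `resolve_eps` is hypothetical and not part of ROptEst; it uses `NULL` where roptest uses missing arguments, and it assumes that a given eps takes precedence over any bounds.

```r
# Hypothetical helper (not part of ROptEst) mirroring the documented defaults.
# NULL stands in for a missing argument.
resolve_eps <- function(eps = NULL, eps.lower = NULL, eps.upper = NULL) {
  if (!is.null(eps)) {
    # Known contamination: a fixed radius is used
    # (assumption of this sketch: bounds are then ignored)
    stopifnot(eps > 0, eps <= 0.5)
    return(list(mode = "fixed", eps = eps))
  }
  # Unknown contamination: fill in missing bounds with 0 and 0.5
  if (is.null(eps.lower)) eps.lower <- 0
  if (is.null(eps.upper)) eps.upper <- 0.5
  stopifnot(0 <= eps.lower, eps.lower <= eps.upper, eps.upper <= 0.5)
  list(mode = "radius-minimax", eps.lower = eps.lower, eps.upper = eps.upper)
}

str(resolve_eps(eps = 0.05))        # fixed radius
str(resolve_eps(eps.upper = 0.1))   # bounds 0 and 0.1, radius-minimax
str(resolve_eps())                  # bounds 0 and 0.5, radius-minimax
```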

Finite-sample and higher order results suggest that the asymptotically optimal procedure is too liberal. Using fsCor the radius can be modified (as a rule, enlarged) to obtain a more conservative estimate. In case of normal location and scale there is function finiteSampleCorrection which returns a finite-sample corrected (enlarged) radius based on the results of large Monte Carlo studies.
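To get a feeling for the effect of fsCor, the sketch below tabulates corrected radii in base R. It assumes that the correction acts multiplicatively on the radius, which is how the argument description ("factor used to correct the neighborhood radius") reads; the values of `n`, `eps` and `fsCor` are illustrative only.

```r
# Illustrative values (assumptions)
n     <- 50
eps   <- 0.05
fsCor <- c(1, 1.2, 1.5)   # 1 means no correction

# Assumption of this sketch: fsCor multiplies the asymptotic radius
radius <- fsCor * eps * sqrt(n)

# Corrected radius and the contamination amount it corresponds to
data.frame(fsCor = fsCor,
           radius = radius,
           eps.effective = radius / sqrt(n))
```

A factor above 1 enlarges the radius and hence yields a more conservative estimator.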

The logic in argument initial.est is as follows: It can be a numeric vector of the length of the unknown parameter or a function or it can be missing. If it is missing, one consults argument startPar for a search interval (if the unknown parameter is one-dimensional) or a starting value for the search (if the dimension of the unknown parameter is larger than one). If startPar is missing, too, it takes the value from the corresponding slot of argument L2Fam. Then, if argument withMDE is TRUE, a minimum distance estimator is computed as initial value initial.est with distance as specified in argument distance and possibly further arguments as passed through the ... argument.

In the next step, the value of initial.est (either given from the beginning or computed through the MDE) is passed on to kStepEstimator.start, which extracts the essential information for the sequel, i.e., a numeric vector of the estimate.

At this initial value the optimal influence curve is computed through interface getStartIC, which in turn, depending on the risk calls optIC, radiusMinimaxIC, or computes the IC from precomputed grid values in case of risk being of class interpolRisk. With the obtained optimal IC, kStepEstimator is called.
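The two most common ways of providing a starting value can be compared as follows. This is a sketch under assumptions: the data are simulated as in the Examples section, and the median-based starting value `start` is an illustrative choice, not a recommendation of the package authors.

```r
library(ROptEst)

## contaminated binomial sample, generated as in the Examples section
set.seed(123)
ind <- rbinom(100, size = 1, prob = 0.05)
x <- rbinom(100, size = 25, prob = (1 - ind) * 0.25 + ind * 0.9)

## (a) initial.est missing: the default starting estimator is computed internally
est.default <- roptest(x, BinomFamily(size = 25), eps = 0.05, steps = 3)

## (b) initial.est given as a numeric vector of the length of the parameter;
##     the median-based value is an illustrative assumption
start <- median(x) / 25
est.num <- roptest(x, BinomFamily(size = 25), eps = 0.05, steps = 3,
                   initial.est = start)

estimate(est.default)
estimate(est.num)
```

With several steps, both variants would be expected to end up close to each other, since the k-step construction is started in a reasonable neighborhood of the same target in both cases.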

The default value of argument useLast is set by the global option kStepUseLast which by default is set to FALSE. In case of general models useLast remains unchanged during the computations. However, if slot CallL2Fam of IC generates an object of class "L2GroupParamFamily" the value of useLast is changed to TRUE. Explicitly setting useLast to TRUE should be done with care as in this situation the influence curve is re-computed using the value of the one-step estimate which may take quite a long time depending on the model.

If useLast is set to TRUE the computation of asvar, asbias and IC is based on the k-step estimate.

Timings for the steps run through in roptest are available in attributes timings, and for the step of the kStepEstimator in kStepTimings.
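The timing attributes mentioned above can be read with `attr`. A minimal sketch, with simulated data as an illustrative assumption:

```r
library(ROptEst)

## simulated binomial sample (illustrative)
set.seed(123)
x <- rbinom(100, size = 25, prob = 0.25)

est <- roptest(x, BinomFamily(size = 25), eps = 0.05, steps = 3,
               withTimings = TRUE)

## timings of the steps run through in roptest
attr(est, "timings")
## timings of the kStepEstimator step
attr(est, "kStepTimings")
```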

One may also use the arguments startCtrl, startICCtrl, and kStepCtrl of function robest. This allows for individual settings of E.argList, withEvalAsVar, and withMakeIC for the different steps. If any of the three arguments startCtrl, startICCtrl, and kStepCtrl is used, the respective attributes set in the corresponding argument are used and, if colliding with arguments directly passed to roptest, the directly passed ones are ignored.

Diagnostics on the involved integrations are available if argument diagnostic is TRUE. Then there are attributes diagnostic and kStepDiagnostic attached to the return value, which may be inspected and assessed through showDiagnostic and getDiagnostic.
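A minimal sketch of how the diagnostic attributes might be inspected is given below. It only uses `attr` and `str`; showDiagnostic and getDiagnostic are not called here because their exact arguments are not described in this page. The data are an illustrative assumption, and what the attributes contain depends on the model and on the integrations actually performed.

```r
library(ROptEst)

## simulated Poisson sample (illustrative)
set.seed(123)
x <- rpois(100, lambda = 3)

est <- roptest(x, PoisFamily(), eps = 0.05, steps = 3, diagnostic = TRUE)

## diagnostics for the steps before the k-step construction
str(attr(est, "diagnostic"), max.level = 1)
## diagnostics for the kStepEstimator step
str(attr(est, "kStepDiagnostic"), max.level = 1)
```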

Value

Object of class "kStepEstimate". In addition, it has an attribute "timings" where computation time is stored.

Author(s)

Matthias Kohl Matthias.Kohl@stamats.de,
Peter Ruckdeschel peter.ruckdeschel@uni-oldenburg.de

References

Kohl, M. (2005) Numerical Contributions to the Asymptotic Theory of Robustness. Bayreuth: Dissertation. https://epub.uni-bayreuth.de/id/eprint/839/2/DissMKohl.pdf.

Kohl, M. and Ruckdeschel, P. (2010): R package distrMod: Object-Oriented Implementation of Probability Models. J. Statist. Softw. 35(10), 1–27. doi:10.18637/jss.v035.i10.

Kohl, M., Ruckdeschel, P. and Rieder, H. (2010): Infinitesimally Robust Estimation in General Smoothly Parametrized Models. Stat. Methods Appl., 19, 333–354. doi:10.1007/s10260-010-0133-0.

Rieder, H. (1994) Robust Asymptotic Statistics. New York: Springer. doi:10.1007/978-1-4684-0624-5.

Rieder, H., Kohl, M. and Ruckdeschel, P. (2008) The Costs of not Knowing the Radius. Statistical Methods and Applications 17(1), 13–40. doi:10.1007/s10260-007-0047-7.

Rieder, H., Kohl, M. and Ruckdeschel, P. (2001) The Costs of not Knowing the Radius. Appeared as discussion paper Nr. 81. SFB 373 (Quantification and Simulation of Economic Processes), Humboldt University, Berlin; also available under doi:10.18452/3638

Examples

## Don't run to reduce check time on CRAN
## Not run: 
#############################
## 1. Binomial data
#############################
## generate a sample of contaminated data
set.seed(123)
ind <- rbinom(100, size=1, prob=0.05)
x <- rbinom(100, size=25, prob=(1-ind)*0.25 + ind*0.9)

## ML-estimate
MLest <- MLEstimator(x, BinomFamily(size = 25))
estimate(MLest)
confint(MLest)

## compute optimally robust estimator (known contamination)
robest1 <- roptest(x, BinomFamily(size = 25), eps = 0.05, steps = 3)
robest1.0 <- roptest.old(x, BinomFamily(size = 25), eps = 0.05, steps = 3)
identical(robest1,robest1.0)
estimate(robest1)
confint(robest1, method = symmetricBias())
## neglecting bias
confint(robest1)
plot(pIC(robest1))
tmp <- qqplot(x, robest1, cex.pch=1.5, exp.cex2.pch = -.25,
              exp.fadcol.pch = .55, jit.fac=.9)

## compute optimally robust estimator (unknown contamination)
robest2 <- roptest(x, BinomFamily(size = 25), eps.lower = 0, eps.upper = 0.2, steps = 3)
estimate(robest2)
confint(robest2, method = symmetricBias())
plot(pIC(robest2))

## total variation neighborhoods (known deviation)
robest3 <- roptest(x, BinomFamily(size = 25), eps = 0.025, 
                   neighbor = TotalVarNeighborhood(), steps = 3)
estimate(robest3)
confint(robest3, method = symmetricBias())
plot(pIC(robest3))

## total variation neighborhoods (unknown deviation)
robest4 <- roptest(x, BinomFamily(size = 25), eps.lower = 0, eps.upper = 0.1, 
                   neighbor = TotalVarNeighborhood(), steps = 3)
estimate(robest4)
confint(robest4, method = symmetricBias())
plot(pIC(robest4))

#############################
## 2. Poisson data
#############################
## Example: Rutherford-Geiger (1910); cf. Feller (1968), Section VI.7 (a)
x <- c(rep(0, 57), rep(1, 203), rep(2, 383), rep(3, 525), rep(4, 532), 
       rep(5, 408), rep(6, 273), rep(7, 139), rep(8, 45), rep(9, 27), 
       rep(10, 10), rep(11, 4), rep(12, 0), rep(13, 1), rep(14, 1))

## ML-estimate
MLest <- MLEstimator(x, PoisFamily())
estimate(MLest)
confint(MLest)

## compute optimally robust estimator (unknown contamination)
robest <- roptest(x, PoisFamily(), eps.upper = 0.1, steps = 3)
estimate(robest)
confint(robest, method = symmetricBias())

plot(pIC(robest))
tmp <- qqplot(x, robest, cex.pch=1.5, exp.cex2.pch = -.25,
              exp.fadcol.pch = .55, jit.fac=.9)
 
## total variation neighborhoods (unknown deviation)
robest1 <- roptest(x, PoisFamily(), eps.upper = 0.05, 
                  neighbor = TotalVarNeighborhood(), steps = 3)
estimate(robest1)
confint(robest1, method = symmetricBias())
plot(pIC(robest1))

## End(Not run)

#############################
## 3. Normal (Gaussian) location and scale
#############################

## this example of a two dimensional parameter
## to be estimated will need more time than 
## 5 seconds to run 
## you can find it in 
## system.file("scripts", "examples_taking_longer.R", 
##              package="ROptEst")

[Package ROptEst version 1.3.3 Index]