R: Robust Fitting of Nonlinear Regression Models

nlrob {robustbase}

R Documentation

Robust Fitting of Nonlinear Regression Models

Description

nlrob fits a nonlinear regression model by robust methods. Per default, by an M-estimator, using iterated reweighted least squares (called “IRLS” or also “IWLS”).

Usage

nlrob(formula, data, start, lower, upper,
      weights = NULL, na.action = na.fail,
      method = c("M", "MM", "tau", "CM", "mtl"),
      psi = .Mwgt.psi1("huber", cc=1.345), scale = NULL,
      test.vec = c("resid", "coef", "w"), maxit = 20,
      tol = 1e-06, acc, algorithm = "default", doCov = FALSE, model = FALSE,
      control = if(method == "M") nls.control() else
		nlrob.control(method, optArgs = list(trace=trace), ...),
      trace = FALSE, ...)

## S3 method for class 'nlrob'
fitted(object, ...)
## S3 method for class 'nlrob'
residuals(object, type = , ...)
## S3 method for class 'nlrob'
predict(object, newdata, ...)

Arguments

`formula`	a nonlinear `formula` including variables and parameters of the model, such as `y ~ f(x, theta)` (cf. `nls`). (For some checks: if `f(.)` is linear, then we need parentheses, e.g., `y ~ (a + b * x)`; (note that `._nlrob.w` is not allowed as variable or parameter name))
`data`	an optional data frame containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `nlrob` is called.
`start`	a named numeric vector of starting parameters estimates, only for `method = "M"`.
`lower`, `upper`	numeric vectors of lower and upper bounds; if needed, will be replicated to be as long as the longest of `start`, `lower` or `upper`. For (the default) `method = "M"`, if the bounds are unspecified all parameters are assumed to be unconstrained; also, for method `"M"`, bounds can only be used with the `"port"` algorithm. They are ignored, with a warning, in cases they have no effect. For all other methods, currently these bounds must be specified as finite values, and one of them must have `names` matching the parameter names in `formula`. For methods `"CM"` and `"mtl"`, the bounds must additionally have an entry named `"sigma"` as that is determined simultaneously in the same optimization, and hence its `lower` bound must not be negative.
`weights`	an optional vector of weights to be used in the fitting process (for intrinsic weights, not the weights `w` used in the iterative (robust) fit). I.e., `sum(w * e^2)` is minimized with `e` = residuals, `e_i = y_i - f(xreg_i, \theta)`, where `f(x,\theta)` is the nonlinear function, and `w` are the robust weights from `resid * weights`.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default action is for the procedure to fail. If NAs are present, use `na.exclude` to have residuals with `length == nrow(data) == length(w)`, where `w` are the weights used in the iterative robust loop. This is better if the explanatory variables in `formula` are time series (and so the NA location is important). For this reason, `na.omit`, which leads to omission of cases with missing values on any required variable, is not suitable here since the residuals length is different from `nrow(data) == length(w)`.
`method`	a character string specifying which method to use. The default is `"M"`, for historical and back-compatibility reasons. For the other methods, primarily see `nlrob.algorithms`. "M" Computes an M-estimator, using `nls(, weights=)` iteratively (hence, IRLS) with weights equal to `\psi(r_i) / r_i`, where `r_i` is the i-the residual from the previous fit. "MM" Computes an MM-estimator, starting from `init`, either "S" or "lts". "tau" Computes a Tau-estimator. "CM" Computes a “Constrained M” (=: CM) estimator. "mtl" Compute as “Maximum Trimmed Likelihood” (=: MTL) estimator. Note that all methods but `"M"` are “random”, hence typically to be preceded by `set.seed()` in usage, see also `nlrob.algorithms`.
`psi`	a function (possibly by name) of the form `g(x, 'tuning constant(s)', deriv)` that for `deriv=0` returns `\psi(x)/x` and for `deriv=1` returns `\psi'(x)`. Note that tuning constants can not be passed separately, but directly via the specification of `psi`, typically via a simple `.Mwgt.psi1()` call as per default. Note that this has been a deliberately non-backcompatible change for robustbase version 0.90-0 (summer 2013 – early 2014).
`scale`	when not `NULL` (default), a positive number specifying a scale kept fixed during the iterations (and returned as `Scale` component).
`test.vec`	character string specifying the convergence criterion. The relative change is tested for residuals with a value of `"resid"` (the default), for coefficients with `"coef"`, and for weights with `"w"`.
`maxit`	maximum number of iterations in the robust loop.
`tol`	non-negative convergence tolerance for the robust fit.
`acc`	previous name for `tol`, now deprecated.
`algorithm`	character string specifying the algorithm to use for `nls`, see there, only when `method = "M"`. The default algorithm is a Gauss-Newton algorithm.
`doCov`	a logical specifying if `nlrob()` should compute the asymptotic variance-covariance matrix (see `vcov`) already. This used to be hard-wired to `TRUE`; however, the default has been set to `FALSE`, as `vcov(obj)` and `summary(obj)` can easily compute it when needed.
`model`	a `logical` indicating if the `model.frame` should be returned as well.
`control`	an optional list of control settings. for `method = "M"`: settings for `nls()`. See `nls.control` for the names of the settable control values and their effect. for all `method`s but `"M"`: a list, typically resulting from `nlrob.control(method, *)`.
`trace`	logical value indicating if a “trace” of the `nls` iteration progress should be printed. Default is `FALSE`. If `TRUE`, in each robust iteration, the residual sum-of-squares and the parameter values are printed at the conclusion of each `nls` iteration. When the `"plinear"` algorithm is used, the conditional estimates of the linear parameters are printed after the nonlinear parameters.
`object`	an R object of class `"nlrob"`, typically resulting from `nlrob(..)`.
`...`	for `nlrob`: only when `method` is not `"M"`, optional arguments for `nlrob.control`; for other functions: potentially optional arguments passed to the extractor methods.
`type`	a string specifying the type of residuals desired. Currently, `"response"` and `"working"` are supported.
`newdata`	a data frame (or list) with the same names as the original `data`, see e.g., `predict.nls`.

Details

For method = "M", iterated reweighted least squares (“IRLS” or “IWLS”) is used, calling nls(*, weights= .) where weights w_i are proportional to \psi(r_i/ \hat{\sigma}).

All other methods minimize differently, and work without nls. See nlrob.algorithms for details.

Value

nlrob() returns an object of S3 class "nlrob", for method = "M" also inheriting from class "nls", (see nls).

It is a list with several components; they are not documented yet, as some of them will probably change. Instead, rather use “accessor” methods, where possible: There are methods (at least) for the generic accessor functions summary(), coefficients() (aka coef()) fitted.values(), residuals(), sigma() and vcov(), the latter for the variance-covariance matrix of the estimated parameters, as returned by coef(), i.e., not including the variance of the errors. For nlrob() results, estimethod() returns the “estimation method”, which coincides with the method argument used.

residuals(.), by default type = "response", returns the residuals e_i, defined above as e_i = Y_i - f_(x_i, \hat\theta). These differ from the standardized or weighted residuals which, e.g., are assumed to be normally distributed, and a version of which is returned in working.residuals component.

Note

This function (with the only method "M") used to be named rnls and has been in package sfsmisc in the past, but been dropped there.

Author(s)

method = "M":: Andreas Ruckstuhl (inspired by rlm() and nls()), in July 1994 for S-plus.
Christian Sangiorgio did the update to R and corrected some errors, from June 2002 to January 2005, and Andreas contributed slight changes and the first methods in August 2005.
method = "MM", etc:: Originally all by Eduardo L. T. Conceicao, see nlrob.algorithms:

Since then, the help page, testing, more cleanup, new methods: Martin Maechler.

Examples

DNase1 <- DNase[ DNase$Run == 1, ]

## note that selfstarting models don't work yet % <<< FIXME !!!

##--- without conditional linearity ---

## classical
fmNase1 <- nls( density ~ Asym/(1 + exp(( xmid - log(conc) )/scal ) ),
                data = DNase1,
                start = list( Asym = 3, xmid = 0, scal = 1 ),
                trace = TRUE )
summary( fmNase1 )

## robust
RmN1  <- nlrob( density ~ Asym/(1 + exp(( xmid - log(conc) )/scal ) ),
                data = DNase1, trace = TRUE,
                start = list( Asym = 3, xmid = 0, scal = 1 ))
summary( RmN1 )

##--- using conditional linearity ---

## classical
fm2DNase1 <- nls( density ~ 1/(1 + exp(( xmid - log(conc) )/scal ) ),
                  data = DNase1,
                  start = c( xmid = 0, scal = 1 ),
                  alg = "plinear", trace = TRUE )
summary( fm2DNase1 )

## robust
frm2DNase1 <- nlrob(density ~ 1/(1 + exp(( xmid - log(conc) )/scal ) ),
                  data = DNase1, start = c( xmid = 0, scal = 1 ),
                  alg = "plinear", trace = TRUE )
summary( frm2DNase1 )
## Confidence for linear parameter is quite smaller than "Asym" above
c1 <- coef(summary(RmN1))
c2 <- coef(summary(frm2DNase1))
rownames(c2)[rownames(c2) == ".lin"] <- "Asym"
stopifnot(all.equal(c1[,1:2], c2[rownames(c1), 1:2], tol = 0.09)) # 0.07315

### -- new examples -- "moderate outlier":
DN2 <- DNase1
DN2[10,"density"] <- 2*DN2[10,"density"]

fm3DN2 <- nls(density ~  Asym/(1 + exp(( xmid - log(conc) )/scal ) ),
              data = DN2, trace = TRUE,
              start = list( Asym = 3, xmid = 0, scal = 1 ))

## robust
Rm3DN2 <- nlrob(density ~  Asym/(1 + exp(( xmid - log(conc) )/scal ) ),
                data = DN2, trace = TRUE,
                start = list( Asym = 3, xmid = 0, scal = 1 ))
Rm3DN2
summary(Rm3DN2) # -> robustness weight of obs. 10 ~= 0.037
confint(Rm3DN2, method = "Wald")
stopifnot(identical(Rm3DN2$dataClasses,
                    c(density = "numeric", conc = "numeric")))

## utility function sfsmisc::lseq() :
lseq <- function (from, to, length)
  2^seq(log2(from), log2(to), length.out = length)
## predict() {and plot}:
h.x <- lseq(min(DN2$conc), max(DN2$conc), length = 100)
nDat <- data.frame(conc = h.x)

h.p  <- predict(fm3DN2, newdata = nDat)# classical
h.rp <- predict(Rm3DN2, newdata = nDat)# robust

plot(density ~ conc, data=DN2, log="x",
     main = format(formula(Rm3DN2)))
lines(h.x, h.p,  col="blue")
lines(h.x, h.rp, col="magenta")
legend("topleft", c("classical nls()", "robust nlrob()"),
       lwd = 1, col= c("blue", "magenta"), inset = 0.05)

## See  ?nlrob.algorithms for examples

DNase1 <- DNase[DNase$Run == 1,]
form <- density ~ Asym/(1 + exp(( xmid -log(conc) )/scal ))
gMM  <- nlrob(form, data = DNase1, method = "MM",
              lower = c(Asym = 0, xmid = 0, scal = 0),
              upper = 3, trace = TRUE)

## "CM" (and "mtl") additionally need bounds for "sigma" :
gCM  <- nlrob(form, data = DNase1, method = "CM",
              lower = c(Asym = 0, xmid = 0, scal = 0, sigma = 0),
              upper = c(3,3,3, sigma = 0.8))
summary(gCM)# did fail; note it has  NA NA NA (std.err, t val, P val)
stopifnot(identical(Rm3DN2$dataClasses, gMM$dataClasses),
          identical(   gCM$dataClasses, gMM$dataClasses))

[Package robustbase version 0.99-3 Index]