R: GSL Nonlinear Least Squares fitting

gsl_nls {gslnls}

R Documentation

GSL Nonlinear Least Squares fitting

Description

Determine the nonlinear least-squares estimates of the parameters of a nonlinear model using the gsl_multifit_nlinear module present in the GNU Scientific Library (GSL).

Usage

gsl_nls(fn, ...)

## S3 method for class 'formula'
gsl_nls(
  fn,
  data = parent.frame(),
  start,
  algorithm = c("lm", "lmaccel", "dogleg", "ddogleg", "subspace2D"),
  control = gsl_nls_control(),
  lower,
  upper,
  jac = NULL,
  fvv = NULL,
  trace = FALSE,
  subset,
  weights,
  na.action,
  model = FALSE,
  ...
)

## S3 method for class 'function'
gsl_nls(
  fn,
  y,
  start,
  algorithm = c("lm", "lmaccel", "dogleg", "ddogleg", "subspace2D"),
  control = gsl_nls_control(),
  lower,
  upper,
  jac = NULL,
  fvv = NULL,
  trace = FALSE,
  weights,
  ...
)

Arguments

`fn`	a nonlinear model defined either as a two-sided formula including variables and parameters, or as a function returning a numeric vector, with first argument the vector of parameters to be estimated. See the individual method descriptions below.
`data`	an optional data frame in which to evaluate the variables in `fn` if defined as a formula. Can also be a list or an environment, but not a matrix.
`y`	numeric response vector if `fn` is defined as a function, equal in length to the vector returned by evaluation of the function `fn`.
`start`	a vector, list or matrix of initial parameter values or parameter ranges. `start` is only allowed to be missing if `fn` is a `selfStart` model. The following choices are supported: a named list or named vector of numeric starting values. If `start` has no missing values, a standard single-start optimization is performed. If `start` contains missing values for one or more parameters, a multi-start algorithm (see ‘Details’) with dynamic starting ranges for the undefined parameters and fixed starting values for the remaining parameters is executed. If `start` is a named list or vector containing only missing values, the multi-start algorithm considers dynamically changing starting ranges for all parameters. Note that there is no guarantee that the optimizing solution is a global minimum of the least-squares objective. a named list with starting parameter ranges in the form of length-2 numeric vectors. Can also be a (`2` by `p`) named matrix with as columns the numeric starting ranges for the parameters. If `start` contains no missing values, a multi-start algorithm with fixed starting ranges for the parameters is executed. Otherwise, if `start` contains infinities or missing values (e.g. `c(0, Inf)` or `c(NA, NA)`), the multi-start algorithm considers dynamically changing starting ranges for the parameters with infinite and/or missing ranges.
`algorithm`	character string specifying the algorithm to use. The following choices are supported: `"lm"` Levenberg-Marquardt algorithm (default). `"lmaccel"` Levenberg-Marquardt algorithm with geodesic acceleration. Stability is controlled by the `avmax` parameter in `control`, setting `avmax` to zero is analogous to not using geodesic acceleration. `"dogleg"` Powell's dogleg algorithm. `"ddogleg"` Double dogleg algorithm, an improvement over `"dogleg"` by including information about the Gauss-Newton step while the iteration is still far from the minimum. `"subspace2D"` 2D generalization of the dogleg algorithm. This method searches a larger subspace for a solution, it can converge more quickly than `"dogleg"` on some problems.
`control`	an optional list of control parameters to tune the least squares iterations and multistart algorithm. See `gsl_nls_control` for the available control parameters and their default values.
`lower`	a named list or named numeric vector of parameter lower bounds. If missing (default), the parameters are unconstrained from below.
`upper`	a named list or named numeric vector of parameter upper bounds. If missing (default), the parameters are unconstrained from above.
`jac`	either `NULL` (default) or a function returning the `n` by `p` dimensional Jacobian matrix of the nonlinear model `fn`, where `n` is the number of observations and `p` the number of parameters. If a function, the first argument must be the vector of parameters of length `p`. If `NULL`, the Jacobian is computed internally using a finite difference approximations. Can also be `TRUE`, in which case `jac` is derived symbolically with `deriv`, this only works if `fn` is defined as a (non-selfstarting) formula. If `fn` is a `selfStart` model, the Jacobian specified in the `"gradient"` attribute of the self-start model is used instead.
`fvv`	either `NULL` (default) or a function returning an `n` dimensional vector containing the second directional derivatives of the nonlinear model `fn`, with `n` the number of observations. This argument is only used if geodesic acceleration is enabled (`algorithm = "lmaccel"`). If a function, the first argument must be the vector of parameters of length `p` and the second argument must be the velocity vector also of length `p`. If `NULL`, the second directional derivative vector is computed internal using a finite difference approximation. Can also be `TRUE`, in which case `fvv` is derived symbolically with `deriv`, this only works if `fn` is defined as a (non-selfstarting) formula. If the model function in `fn` also returns a `"hessian"` attribute (similar to the `"gradient"` attribute in a `selfStart` model), this Hessian matrix is used to evaluate the second directional derivatives instead.
`trace`	logical value indicating if a trace of the iteration progress should be printed. Default is `FALSE`. If `TRUE`, the residual (weighted) sum-of-squares and the current parameter estimates are printed after each iteration.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process. This argument is only used if `fn` is defined as a formula.
`weights`	an optional numeric vector of (fixed) weights. When present, the objective function is weighted least squares.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The 'factory-fresh' default is `na.omit`. Value `na.exclude` can be useful. This argument is only used if `fn` is defined as a formula.
`model`	a logical value. If `TRUE`, the model frame is returned as part of the object. Defaults to `FALSE`. This argument is only used if `fn` is defined as a formula.
`...`	additional arguments passed to the calls of `fn`, `jac` and `fvv` if defined as functions.

Value

If fn is a formula returns a list object of class nls. If fn is a function returns a list object of class gsl_nls. See the individual method descriptions for the structures of the returned lists and the generic functions applicable to objects of both classes.

Methods (by class)

gsl_nls(formula): If fn is a formula, the returned list object is of classes gsl_nls and nls. Therefore, all generic functions applicable to objects of class nls, such as anova, coef, confint, deviance, df.residual, fitted, formula, logLik, nobs, predict, print, profile, residuals, summary, vcov and weights are also applicable to the returned list object. In addition, a method confintd is available for inference of derived parameters.
gsl_nls(function): If fn is a function, the first argument must be the vector of parameters and the function should return a numeric vector containing the nonlinear model evaluations at the provided parameter and predictor or covariate vectors. In addition, the argument y needs to contain the numeric vector of observed responses, equal in length to the numeric vector returned by fn. The returned list object is (only) of class gsl_nls. Although the returned object is not of class nls, the following generic functions remain applicable for an object of class gsl_nls: anova, coef, confint, deviance, df.residual, fitted, formula, logLik, nobs, predict, print, residuals, summary, vcov and weights. In addition, a method confintd is available for inference of derived parameters.

Multi-start algorithm

If start is a list or matrix of parameter ranges, or contains any missing values, a modified version of the multi-start algorithm described in Hickernell and Yuan (1997) is applied. Note that the start parameter ranges are only used to bound the domain for the starting values, i.e. the resulting parameter estimates are not constrained to lie within these bounds, use lower and/or upper for this purpose instead. Quasi-random starting values are sampled in the unit hypercube from a Sobol sequence if p < 41 and from a Halton sequence (up to p = 1229) otherwise. The initial starting values are scaled to the specified parameter ranges using an inverse (scaled) logistic function favoring starting values near the center of the (scaled) domain. The trust region algorithm as specified by algorithm used for the inexpensive and expensive local search (see Algorithm 2.1 of Hickernell and Yuan (1997)) are the same, only differing in the number of search iterations mstart_p versus mstart_maxiter, where mstart_p is typically much smaller than mstart_maxiter. When a new stationary point is detected, the scaling step from the unit hypercube to the starting value domain is updated using the diagonal of the estimated trust method's scaling matrix D, which improves optimization performance especially when the parameters live on very different scales. The multi-start algorithm terminates when NSP (number of stationary points) is larger than or equal to mstart_minsp and NWSP (number of worse stationary points) is larger than or equal to mstart_r times NSP, or when the maximum number of major iterations mstart_maxstart is reached. After termination of the multi-start algorithm, a full single-start optimization is executed starting from the best multi-start solution.

Missing starting values

If start contains missing (or infinite) values, the multi-start algorithm is executed without fixed parameter ranges for the missing parameters. The ranges for the missing parameters are initialized to the unit interval and dynamically increased or decreased in each major iteration of the multi-start algorithm. The decision to increase or decrease a parameter range is driven by the minimum and maximum parameter values attained by the first mstart_q inexpensive local searches ordered by their squared loss, which typically provide a decent indication of the order of magnitude of the parameter range in which to search for the optimal solution. Note that this procedure is not expected to always return a global minimum of the nonlinear least-squares objective. Especially when the objective function contains many local optima, the algorithm may be unable to select parameter ranges that include the global minimizing solution. In this case, it may help to increase the values of mstart_n, mstart_r or mstart_minsp to avoid early termination of the algorithm at the cost of increased computational effort.

References

M. Galassi et al., GNU Scientific Library Reference Manual (3rd Ed.), ISBN 0954612078.

Hickernell, F.J. and Yuan, Y. (1997) “A simple multistart algorithm for global optimization”, OR Transactions, Vol. 1 (2).

Examples

# Example 1: exponential model
# (https://www.gnu.org/software/gsl/doc/html/nls.html#exponential-fitting-example)

## data
set.seed(1)
n <- 25
x <- (seq_len(n) - 1) * 3 / (n - 1)
f <- function(A, lam, b, x) A * exp(-lam * x) + b
y <- f(A = 5, lam = 1.5, b = 1, x) + rnorm(n, sd = 0.25)

## model fit
ex1_fit <- gsl_nls(
  fn = y ~ A * exp(-lam * x) + b,                        ## model formula
  data = data.frame(x = x, y = y),                       ## model fit data
  start = c(A = 0, lam = 0, b = 0)                       ## starting values
)
summary(ex1_fit)                                         ## model summary
predict(ex1_fit, interval = "prediction")                ## prediction intervals

## multi-start
gsl_nls(
  fn = y ~ A * exp(-lam * x) + b,                             ## model formula
  data = data.frame(x = x, y = y),                            ## model fit data
  start = list(A = c(0, 100), lam = c(0, 10), b = c(-10, 10)) ## starting ranges
)
## missing starting values
gsl_nls(
  fn = y ~ A * exp(-lam * x) + b,                        ## model formula
  data = data.frame(x = x, y = y),                       ## model fit data
  start = c(A = NA, lam = NA, b = NA)                    ## unknown start
)

## analytic Jacobian 1
gsl_nls(
  fn = y ~ A * exp(-lam * x) + b,                        ## model formula
  data = data.frame(x = x, y = y),                       ## model fit data
  start = c(A = 0, lam = 0, b = 0),                      ## starting values
  jac = function(par) with(as.list(par),                 ## jacobian
    cbind(A = exp(-lam * x), lam = -A * x * exp(-lam * x), b = 1)
  )
)

## analytic Jacobian 2
gsl_nls(
  fn = y ~ A * exp(-lam * x) + b,                        ## model formula
  data = data.frame(x = x, y = y),                       ## model fit data
  start = c(A = 0, lam = 0, b = 0),                      ## starting values
  jac = TRUE                                             ## automatic derivation
)

## self-starting model
gsl_nls(
  fn =  y ~ SSasymp(x, Asym, R0, lrc),                   ## model formula
  data = data.frame(x = x, y = y)                        ## model fit data
)

# Example 2: Gaussian function
# (https://www.gnu.org/software/gsl/doc/html/nls.html#geodesic-acceleration-example-2)

## data
set.seed(1)
n <- 100
x <- seq_len(n) / n
f <- function(a, b, c, x) a * exp(-(x - b)^2 / (2 * c^2))
y <- f(a = 5, b = 0.4, c = 0.15, x) * rnorm(n, mean = 1, sd = 0.1)

## Levenberg-Marquardt (default)
gsl_nls(
  fn = y ~ a * exp(-(x - b)^2 / (2 * c^2)),             ## model formula
  data = data.frame(x = x, y = y),                      ## model fit data
  start = c(a = 1, b = 0, c = 1),                       ## starting values
  trace = TRUE                                          ## verbose output
)

## Levenberg-Marquardt w/ geodesic acceleration 1
gsl_nls(
  fn = y ~ a * exp(-(x - b)^2 / (2 * c^2)),             ## model formula
  data = data.frame(x = x, y = y),                      ## model fit data
  start = c(a = 1, b = 0, c = 1),                       ## starting values
  algorithm = "lmaccel",                                ## algorithm
  trace = TRUE                                          ## verbose output
)

## Levenberg-Marquardt w/ geodesic acceleration 2
## second directional derivative
fvv <- function(par, v, x) {
  with(as.list(par), {
    zi <- (x - b) / c
    ei <- exp(-zi^2 / 2)
    2 * v[["a"]] * v[["b"]] * zi / c * ei + 2 * v[["a"]] * v[["c"]] * zi^2 / c * ei -
      v[["b"]]^2 * a / c^2 * (1 - zi^2) * ei -
      2 * v[["b"]] * v[["c"]] * a / c^2 * zi * (2 - zi^2) * ei -
      v[["c"]]^2 * a / c^2 * zi^2 * (3 - zi^2) * ei
  })
}

## analytic fvv 1
gsl_nls(
  fn = y ~ a * exp(-(x - b)^2 / (2 * c^2)),             ## model formula
  data = data.frame(x = x, y = y),                      ## model fit data
  start = c(a = 1, b = 0, c = 1),                       ## starting values
  algorithm = "lmaccel",                                ## algorithm
  trace = TRUE,                                         ## verbose output
  fvv = fvv,                                            ## analytic fvv
  x = x                                                 ## argument passed to fvv
)

## analytic fvv 2
gsl_nls(
  fn = y ~ a * exp(-(x - b)^2 / (2 * c^2)),             ## model formula
  data = data.frame(x = x, y = y),                      ## model fit data
  start = c(a = 1, b = 0, c = 1),                       ## starting values
  algorithm = "lmaccel",                                ## algorithm
  trace = TRUE,                                         ## verbose output
  fvv = TRUE                                            ## automatic derivation
)

# Example 3: Branin function
# (https://www.gnu.org/software/gsl/doc/html/nls.html#comparing-trs-methods-example)

## Branin model function
branin <- function(x) {
  a <- c(-5.1 / (4 * pi^2), 5 / pi, -6, 10, 1 / (8 * pi))
  f1 <- x[2] + a[1] * x[1]^2 + a[2] * x[1] + a[3]
  f2 <- sqrt(a[4] * (1 + (1 - a[5]) * cos(x[1])))
  c(f1, f2)
}

## Dogleg minimization w/ model as function
gsl_nls(
  fn = branin,                   ## model function
  y = c(0, 0),                   ## response vector
  start = c(x1 = 6, x2 = 14.5),  ## starting values
  algorithm = "dogleg"           ## algorithm
)

# Available example problems
nls_test_list()

## BOD regression
## (https://www.itl.nist.gov/div898/strd/nls/nls_main.shtml)
(boxbod <- nls_test_problem(name = "BoxBOD"))
with(boxbod,
     gsl_nls(
       fn = fn,
       data = data,
       start = list(b1 = NA, b2 = NA)
     )
)

## Rosenbrock function
(rosenbrock <- nls_test_problem(name = "Rosenbrock"))
with(rosenbrock,
     gsl_nls(
       fn = fn,
       y = y,
       start = c(x1 = NA, x2 = NA),
       jac = jac
     )
)

[Package gslnls version 1.3.2 Index]