R: Smooth Additive Regression for Discrete Data

addreg.smooth {addreg}

R Documentation

Smooth Additive Regression for Discrete Data

Description

addreg.smooth fits additive (identity-link) Poisson, negative binomial and binomial regression models using a stable EM algorithm. It provides additional flexibility over addreg by allowing for semi-parametric terms.

Usage

addreg.smooth(formula, mono = NULL, family, data, standard, subset, 
              na.action, offset, control = list(...), model = TRUE, 
              model.addreg = FALSE, method = c("cem", "em"), 
              accelerate = c("em", "squarem", "pem", "qn"),
              control.method = list(), ...)

Arguments

`formula`	an object of class `"formula"` (or one that can be coerced into that class): a symbolic description of the model to be fitted. The details of model specification are given under "Details". The model must contain an intercept and at least one semi-parametric term, included by using the `B` or `Iso` functions. Second-order and higher terms (such as interactions) are not currently supported (see `addreg`).
`mono`	a vector indicating which terms in `formula` should be restricted to have a monotonically non-decreasing relationship with the outcome. May be specified as names or indices of the terms. `Iso()` terms are always monotonic.
`family`	a description of the error distribution to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function (see `family` for details of family functions), but here it is restricted to be `poisson`, `negbin1` or `binomial` family with `identity` link.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `addreg.smooth` is called.
`standard`	a numeric vector of length equal to the number of cases, where each element is a positive constant that (multiplicatively) standardises the fitted value of the corresponding element of the response vector. Ignored for binomial family (the two-column specification of response should be used instead).
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`. Another possible value is `NULL`, meaning no action. The value `na.exclude` can be useful.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be `NULL` or a non-negative numeric vector of length equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See `model.offset`. Ignored for binomial family; not yet implemented for negative binomial models.
`control`	list of parameters for controlling the fitting process, passed to `addreg.control`.
`model`	a logical value indicating whether the model frame (and, for binomial models, the equivalent Poisson model) should be included as a component of the returned value.
`model.addreg`	a logical value indicating whether the fitted `addreg` object should be included as a component of the returned value.
`method`	a character string that determines which EM-type algorithm to use to find the MLE: `"cem"` for the combinatorial EM algorithm, which cycles through a sequence of constrained parameter spaces, or `"em"` for a single EM algorithm based on an overparameterised model.
`accelerate`	a character string that determines the acceleration algorithm to be used, (partially) matching one of `"em"` (no acceleration — the default), `"squarem"`, `"pem"` or `"qn"`. See `turboem` for further details. Note that `"decme"` is not permitted.
`control.method`	a list of control parameters for the acceleration algorithm, which are passed to the `control.method` argument of `turboem`. If any items are not specified, the defaults are used.
`...`	arguments to be used to form the default `control` argument if it is not supplied directly.
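
As a rough illustration of the `formula` and `mono` arguments, the sketch below fits the same binomial data twice: once with a `B` term constrained through `mono = 1`, and once with an `Iso` term, which is always monotonic and so needs no `mono` entry. The data are copied from the Examples section. The single-argument call `Iso(x1)` and the `requireNamespace` guard are assumptions made for illustration, not part of the documented example.

```r
# Illustrative data, copied from the Examples section of this page.
dat <- data.frame(
  x1 = c(3.2, 3.3, 3.4, 7.9, 3.8, 0.7, 2.0, 5.4, 8.4, 3.0, 1.8, 5.6, 5.5, 9.0, 8.2),
  x2 = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0),
  n  = c(6, 7, 5, 9, 10, 7, 9, 6, 6, 7, 7, 8, 6, 8, 10),
  y  = c(2, 1, 2, 6, 3, 1, 2, 2, 4, 4, 1, 2, 5, 7, 7)
)

# Only attempt the fits when the addreg package is installed.
if (requireNamespace("addreg", quietly = TRUE)) {
  library(addreg)

  # B-spline term for x1, constrained to be non-decreasing via mono = 1
  # (index 1 refers to the first term on the right-hand side).
  fit_b <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
                         mono = 1, data = dat, family = binomial)

  # Isotonic term for x1: monotonic by construction, so mono is left as NULL.
  # Assumption: Iso() accepts the covariate as its only argument.
  fit_iso <- addreg.smooth(cbind(y, n - y) ~ Iso(x1) + factor(x2),
                           data = dat, family = binomial)

  # Both fits carry glm-style components such as coefficients.
  print(fit_b$coefficients)
  print(fit_iso$coefficients)
}
```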

Details

addreg.smooth performs the same fitting process as addreg, providing a stable maximum likelihood estimation procedure for identity-link Poisson, negative binomial or binomial models, with the added flexibility of allowing semi-parametric B and Iso terms. If no semi-parametric terms are specified on the right-hand side of the formula, addreg.smooth stops with an error; addreg should be used instead.

The method partitions the parameter space associated with the semi-parametric part of the model into a sequence of constrained parameter spaces, and defines a fully parametric addreg model for each. The model with the highest log-likelihood is the MLE for the semi-parametric model (see Donoghoe and Marschner, 2015).

Acceleration of the EM algorithm can be achieved through the methods of the turboEM package, specified through the accelerate argument. However, note that these methods do not have the guaranteed convergence of the standard EM algorithm, particularly when the MLE is on the boundary of its (possibly constrained) parameter space.
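
A minimal sketch of how the `accelerate` argument might be used, assuming the addreg and turboEM packages are installed. It refits the model from the Examples section with and without acceleration and compares the results; because accelerated schemes lack the convergence guarantee of the standard EM algorithm, checking the `converged` flag and the deviance of both fits is a reasonable precaution. The `converged` and `deviance` components are assumed to be present because the returned object is described as containing the same components as a "glm" object.

```r
# Illustrative data, copied from the Examples section of this page.
dat <- data.frame(
  x1 = c(3.2, 3.3, 3.4, 7.9, 3.8, 0.7, 2.0, 5.4, 8.4, 3.0, 1.8, 5.6, 5.5, 9.0, 8.2),
  x2 = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0),
  n  = c(6, 7, 5, 9, 10, 7, 9, 6, 6, 7, 7, 8, 6, 8, 10),
  y  = c(2, 1, 2, 6, 3, 1, 2, 2, 4, 4, 1, 2, 5, 7, 7)
)

# Only attempt the fits when the addreg package is installed.
if (requireNamespace("addreg", quietly = TRUE)) {
  library(addreg)

  # Standard (unaccelerated) EM: accelerate = "em" is the default.
  fit_plain <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
                             mono = 1, data = dat, family = binomial,
                             accelerate = "em")

  # SQUAREM acceleration, carried out through the turboEM package.
  fit_fast <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
                            mono = 1, data = dat, family = binomial,
                            accelerate = "squarem")

  # Compare convergence flags and deviances of the two fits.
  print(c(plain = fit_plain$converged, squarem = fit_fast$converged))
  print(c(plain = fit_plain$deviance, squarem = fit_fast$deviance))
}
```

If the accelerated fit reports a noticeably larger deviance or fails to converge, falling back to `accelerate = "em"` is the conservative choice.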

Value

An object of class "addreg.smooth", which contains the same objects as class "addreg" (the same as "glm" objects, without contrasts, qr, R or effects components), as well as:

`model.addreg`	included only if `model.addreg` is `TRUE`: the `addreg` object for the fully parametric model corresponding to the fitted model.
`xminmax.smooth`	the minimum and maximum observed values for each of the smooth terms in the model, to help define the covariate space.
`full.formula`	the component from `interpret.addreg.smooth(formula)` that contains the `formula` term with any additional arguments to the `B` function removed.
`knots`	a named list containing the knot vectors for each of the smooth terms in the model.
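
The additional components listed above can be inspected directly on the fitted object. The following is a small sketch, assuming the addreg package is installed and reusing the data from the Examples section; it requests the underlying `addreg` fit with `model.addreg = TRUE` and prints each extra component.

```r
# Illustrative data, copied from the Examples section of this page.
dat <- data.frame(
  x1 = c(3.2, 3.3, 3.4, 7.9, 3.8, 0.7, 2.0, 5.4, 8.4, 3.0, 1.8, 5.6, 5.5, 9.0, 8.2),
  x2 = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0),
  n  = c(6, 7, 5, 9, 10, 7, 9, 6, 6, 7, 7, 8, 6, 8, 10),
  y  = c(2, 1, 2, 6, 3, 1, 2, 2, 4, 4, 1, 2, 5, 7, 7)
)

# Only attempt the fit when the addreg package is installed.
if (requireNamespace("addreg", quietly = TRUE)) {
  library(addreg)

  # Keep the fully parametric addreg fit alongside the smooth model.
  fit <- addreg.smooth(cbind(y, n - y) ~ B(x1, knot.range = 1:3) + factor(x2),
                       mono = 1, data = dat, family = binomial,
                       model.addreg = TRUE)

  print(fit$knots)                # knot vectors, one list entry per smooth term
  print(fit$xminmax.smooth)       # observed range of each smooth covariate
  print(fit$full.formula)         # formula with extra B() arguments removed
  print(class(fit$model.addreg))  # underlying fully parametric fit
}
```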

Author(s)

Mark W. Donoghoe markdonoghoe@gmail.com

References

Donoghoe, M. W. and I. C. Marschner (2015). Flexible regression models for rate differences, risk differences and relative risks. International Journal of Biostatistics 11(1): 91–108.

Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921–940.

Examples

## Simple example
dat <- data.frame(x1 = c(3.2,3.3,3.4,7.9,3.8,0.7,2.0,5.4,8.4,3.0,1.8,5.6,5.5,9.0,8.2),
  x2 = c(1,0,0,1,0,1,0,0,0,0,1,0,1,1,0),
  n = c(6,7,5,9,10,7,9,6,6,7,7,8,6,8,10),
  y = c(2,1,2,6,3,1,2,2,4,4,1,2,5,7,7))
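# Binomial model with a B-spline in x1, constrained to be non-decreasing
# (mono = 1), plus a factor for x2. trace = 1 is passed on to addreg.control.
m1 <- addreg.smooth(cbind(y, n-y) ~ B(x1, knot.range = 1:3) + factor(x2), mono = 1,
  data = dat, family = binomial, trace = 1)

# Plot the fitted curve at each level of x2 and overlay the observed proportions
plot(m1, at = data.frame(x2 = 0:1))
points(dat$x1, dat$y / dat$n, col = rainbow(2)[dat$x2 + 1], pch = 20)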

[Package addreg version 3.0 Index]