R: Fit Nonlinear Mixed Effects Models of Epidemic Growth

egf {epigrowthfit}

R Documentation

Fit Nonlinear Mixed Effects Models of Epidemic Growth

Description

Fits nonlinear mixed effects models of epidemic growth to collections of one or more disease incidence time series.

Usage

egf(model, ...)

## S3 method for class 'egf_model'
egf(model,
    formula_ts,
    formula_windows,
    formula_parameters = list(),
    formula_priors = list(),
    data_ts,
    data_windows,
    subset_ts = NULL,
    subset_windows = NULL,
    select_windows = NULL,
    na_action_ts = c("fail", "pass"),
    na_action_windows = c("fail", "omit"),
    control = egf_control(),
    init = list(),
    map = list(),
    fit = TRUE,
    se = FALSE,
    ...)

Arguments

`model`	an R object specifying a top level nonlinear model, typically of class `egf_model`.
`formula_ts`	a formula of the form `cbind(time, x) ~ ts` specifying one or more disease incidence time series in long format. `ts` must evaluate to a factor (insofar as `as.factor(ts)` is a factor) grouping the data by time series. `time` must evaluate to a numeric vector that is increasing within levels of `ts`. `Date`, `POSIXct`, and `POSIXlt` vectors are supported and coerced to numeric with `julian(time)`. Finally, `x` must evaluate to a non-negative numeric vector with `x[i]` equal to the number of cases observed over the interval `(time[i-1], time[i]]`. Edge cases like `x[1]` are ignored internally. Elements of `x` that are not integer-valued are rounded with a warning. `formula_ts = cbind(time, x) ~ 1` can be supplied when there is only one time series; it is equivalent to `formula_ts = cbind(time, x) ~ ts` with `ts` evaluating to `rep(factor(1), length(x))`.
`formula_windows`	a formula of the form `cbind(start, end) ~ ts` specifying disjoint fitting windows `(start, end]` in long format. If `formula_ts = cbind(time, x) ~ ts1` and `formula_windows = cbind(start, end) ~ ts2`, then observation `x[i]` is associated with window `(start[j], end[j]]` if and only if `time[i-1] >= start[j]`, `time[i] <= end[j]`, and `ts1[i] == ts2[j]`.
`formula_parameters`	a list of formulae of the form `parameter ~ terms` specifying mixed effects models for top level nonlinear model parameters using lme4-like syntax (see, e.g., `help("lmer", package = "lme4")`). Alternatively, a formula of the form `~terms` to be recycled for all parameters. A list of parameters for which formulae may be specified can be retrieved with `egf_top`. Specifically, `deparse(parameter)` must be an element of `egf_top(model)`. The default for parameters not assigned a formula is `~1`.
`formula_priors`	a list of formulae of the form `parameter ~ prior` defining priors on: (i) top level nonlinear model parameters, (ii) fixed effect coefficients and random effect covariance parameters (elements of segments `beta` and `theta` of the bottom level parameter vector), or (iii) random effect covariance matrices (elements of a list `Sigma` containing the matrices). `prior` must be a call to a prior function with arguments specifying suitable hyperparameters. In case (i), `deparse(parameter)` must be an element of `egf_top(model)`, and hyperparameters supplied on the right hand side must have length 1. In cases (ii) and (iii), `parameter` must be `beta`, `theta`, or `Sigma` or a call to `[` or `[[` referring to a subset or element of `beta`, `theta`, or `Sigma` (e.g., `beta[index]`, where `index` is a valid index vector for `beta`), and hyperparameters are recycled to the length of the indicated subset. Expressions `prior` and `index` are evaluated in the corresponding formula environment.
`data_ts`, `data_windows`	data frames, lists, or environments to be searched for variables named in the corresponding `formula_` and `subset_` arguments. (`formula_parameters` uses `data_windows`.) Formula environments are searched for variables not found here.
`subset_ts`, `subset_windows`	expressions to be evaluated in the corresponding `data_*` data frames. The value should be a valid index vector for the rows of the data frame. Rows that are not indexed are discarded. Rows that are indexed are filtered further (e.g., time series with zero associated fitting windows are discarded regardless of `subset_ts`). The default is to preserve all rows for further filtering.
`select_windows`	an expression indicating additional variables in `data_windows` (if it is a data frame) to be preserved in the returned object for use by methods. The default is to preserve nothing. A dot ‘⁠.⁠’ is to preserve all variables not occurring in `formula_windows` or `formula_parameters`. Outside of these two special cases, evaluation of `select` follows the implementation of `subset.data.frame`.
`na_action_ts`	a character string affecting the handling of `NA` in `x` if `formula_ts = cbind(time, x) ~ ts`. `"fail"` is to throw an error. `"pass"` is to ignore `NA` when fitting and replace `NA` when predicting. `NA` in `time` and `ts` are always an error.
`na_action_windows`	a character string affecting the handling of `NA` in `formula_windows` and `formula_parameters` variables. `"fail"` is to throw an error. `"omit"` is to discard incomplete rows of data.
`control`	an `egf_control` object specifying control parameters.
`init`	a named list of numeric vectors with possible elements `beta`, `theta`, and `b`, specifying values to be used in the first likelihood evaluation for the so-named segments of the bottom level parameter vector. The default value of each segment is a zero vector, with the exception that `"(Intercept)"` coefficients in `beta` have default values computed from supplied time series. Use `NA` to indicate elements that should retain their default value.
`map`	a named list of factors with possible elements `beta`, `theta`, and `b`, each as long as the so-named segment of the bottom level parameter. Elements of a segment `<name>` indexed by `is.na(map[["<name>"]])` are fixed at their initial values, rather than estimated, and elements corresponding to a common factor level are constrained to have a common value during estimation. `map[["<name>"]]` can be an index vector for segment `<name>`, instead of a factor. In this case, the indexed elements of that segment are fixed at their initial values.
`fit`	a logical. If `FALSE`, then `egf` returns early (before fitting) with a partial model object. The details of the partial result are subject to change and therefore sparsely documented, on purpose ...
`se`	a logical. If `TRUE`, then the Hessian matrix of the negative log marginal likelihood function is computed and inverted to approximate the joint covariance matrix of segments `beta` and `theta` of the bottom level parameter vector. Standard errors on the fitted values of all top level nonlinear model parameters are computed approximately using the delta method. Computations are preserved in the model object for reuse by methods.
`...`	additional arguments passed from or to other methods.

Details

Users attempting to set arguments formula_priors, init, and map should know the structure of the bottom level parameter vector. It is described under topic egf-class.

formula_ts = cbind(time, x) ~ ts1
formula_windows = cbind(start, end) ~ ts2

then it is expected that time, start, and end (after coercion to numeric) measure time on the same scale. To be precise, numeric times should have a common unit of measure and, at least within time series, represent displacements from a common reference time. These conditions will always hold if time, start, and end all evaluate to Date, POSIXct, or POSIXlt vectors.

When day of week effects are estimated, numeric times are interpreted as numbers of days since midnight on January 1, 1970, so that time points can be mapped unambiguously to days of week. Furthermore, in this case, time (after coercion to numeric) is required to be integer-valued with one day spacing in all time series. This means that

isTRUE(all.equal(time, round(time))) &&
    all(range(diff(round(time))) == 1)

must be TRUE in each level of ts1. These conditions ensure that intervals between successive time points represent exactly one day of week.

Value

A list inheriting from class egf. See topic egf-class for class documentation.

Examples

## Simulate 'N' incidence time series exhibiting exponential growth
set.seed(180149L)
N <- 10L
f <- function(time, r, c0) {
    lambda <- diff(exp(log(c0) + r * time))
    c(NA, rpois(lambda, lambda))
}
time <- seq.int(0, 40, 1)
r <- rlnorm(N, -3.2, 0.2)
c0 <- rlnorm(N, 6, 0.2)
data_ts <-
    data.frame(country = gl(N, length(time), labels = LETTERS[1:N]),
               time = rep.int(time, N),
               x = unlist(Map(f, time = list(time), r = r, c0 = c0)))
rm(f, time)

## Define fitting windows (here, two per time series)
data_windows <-
    data.frame(country = gl(N, 1L, 2L * N, labels = LETTERS[1:N]),
               wave = gl(2L, 10L),
               start = c(sample(seq.int(0, 5, 1), N, TRUE),
                         sample(seq.int(20, 25, 1), N, TRUE)),
               end = c(sample(seq.int(15, 20, 1), N, TRUE),
                       sample(seq.int(35, 40, 1), N, TRUE)))

## Estimate the generative model
m1 <-
    egf(model = egf_model(curve = "exponential", family = "pois"),
        formula_ts = cbind(time, x) ~ country,
        formula_windows = cbind(start, end) ~ country,
        formula_parameters = ~(1 | country:wave),
        data_ts = data_ts,
        data_windows = data_windows,
        se = TRUE)

## Re-estimate the generative model with:
## * Gaussian prior on beta[1L]
## * LKJ prior on all random effect covariance matrices
##   (here there happens to be just one)
## * initial value of 'theta' set explicitly
## * theta[3L] fixed at initial value
m2 <-
    update(m1,
           formula_priors = list(beta[1L] ~ Normal(mu = -3, sigma = 1),
                                 Sigma ~ LKJ(eta = 2)),
           init = list(theta = c(log(0.5), log(0.5), 0)),
           map = list(theta = 3L))

[Package epigrowthfit version 0.15.3 Index]