egf {epigrowthfit}R Documentation

Fit Nonlinear Mixed Effects Models of Epidemic Growth

Description

Fits nonlinear mixed effects models of epidemic growth to collections of one or more disease incidence time series.

Usage

egf(model, ...)

## S3 method for class 'egf_model'
egf(model,
    formula_ts,
    formula_windows,
    formula_parameters = list(),
    formula_priors = list(),
    data_ts,
    data_windows,
    subset_ts = NULL,
    subset_windows = NULL,
    select_windows = NULL,
    na_action_ts = c("fail", "pass"),
    na_action_windows = c("fail", "omit"),
    control = egf_control(),
    init = list(),
    map = list(),
    fit = TRUE,
    se = FALSE,
    ...)

Arguments

model

an R object specifying a top level nonlinear model, typically of class egf_model.

formula_ts

a formula of the form cbind(time, x) ~ ts specifying one or more disease incidence time series in long format.

ts must evaluate to a factor (insofar as as.factor(ts) is a factor) grouping the data by time series. time must evaluate to a numeric vector that is increasing within levels of ts. Date, POSIXct, and POSIXlt vectors are supported and coerced to numeric with julian(time). Finally, x must evaluate to a non-negative numeric vector with x[i] equal to the number of cases observed over the interval (time[i-1], time[i]]. Edge cases like x[1] are ignored internally. Elements of x that are not integer-valued are rounded with a warning.

formula_ts = cbind(time, x) ~ 1 can be supplied when there is only one time series; it is equivalent to formula_ts = cbind(time, x) ~ ts with ts evaluating to rep(factor(1), length(x)).

formula_windows

a formula of the form cbind(start, end) ~ ts specifying disjoint fitting windows (start, end] in long format. If formula_ts = cbind(time, x) ~ ts1 and formula_windows = cbind(start, end) ~ ts2, then observation x[i] is associated with window (start[j], end[j]] if and only if time[i-1] >= start[j], time[i] <= end[j], and ts1[i] == ts2[j].

formula_parameters

a list of formulae of the form parameter ~ terms specifying mixed effects models for top level nonlinear model parameters using lme4-like syntax (see, e.g., help("lmer", package = "lme4")). Alternatively, a formula of the form ~terms to be recycled for all parameters.

A list of parameters for which formulae may be specified can be retrieved with egf_top. Specifically, deparse(parameter) must be an element of egf_top(model). The default for parameters not assigned a formula is ~1.

formula_priors

a list of formulae of the form parameter ~ prior defining priors on:
(i) top level nonlinear model parameters,
(ii) fixed effect coefficients and random effect covariance parameters (elements of segments beta and theta of the bottom level parameter vector), or
(iii) random effect covariance matrices (elements of a list Sigma containing the matrices).

prior must be a call to a prior function with arguments specifying suitable hyperparameters. In case (i), deparse(parameter) must be an element of egf_top(model), and hyperparameters supplied on the right hand side must have length 1. In cases (ii) and (iii), parameter must be beta, theta, or Sigma or a call to [ or [[ referring to a subset or element of beta, theta, or Sigma (e.g., beta[index], where index is a valid index vector for beta), and hyperparameters are recycled to the length of the indicated subset.

Expressions prior and index are evaluated in the corresponding formula environment.

data_ts, data_windows

data frames, lists, or environments to be searched for variables named in the corresponding formula_* and subset_* arguments. (formula_parameters uses data_windows.) Formula environments are searched for variables not found here.

subset_ts, subset_windows

expressions to be evaluated in the corresponding data_* data frames. The value should be a valid index vector for the rows of the data frame. Rows that are not indexed are discarded. Rows that are indexed are filtered further (e.g., time series with zero associated fitting windows are discarded regardless of subset_ts). The default is to preserve all rows for further filtering.

select_windows

an expression indicating additional variables in data_windows (if it is a data frame) to be preserved in the returned object for use by methods. The default is to preserve nothing. A dot ‘⁠.⁠’ is to preserve all variables not occurring in formula_windows or formula_parameters. Outside of these two special cases, evaluation of select follows the implementation of subset.data.frame.

na_action_ts

a character string affecting the handling of NA in x if formula_ts = cbind(time, x) ~ ts. "fail" is to throw an error. "pass" is to ignore NA when fitting and replace NA when predicting. NA in time and ts are always an error.

na_action_windows

a character string affecting the handling of NA in formula_windows and formula_parameters variables. "fail" is to throw an error. "omit" is to discard incomplete rows of data.

control

an egf_control object specifying control parameters.

init

a named list of numeric vectors with possible elements beta, theta, and b, specifying values to be used in the first likelihood evaluation for the so-named segments of the bottom level parameter vector. The default value of each segment is a zero vector, with the exception that "(Intercept)" coefficients in beta have default values computed from supplied time series. Use NA to indicate elements that should retain their default value.

map

a named list of factors with possible elements beta, theta, and b, each as long as the so-named segment of the bottom level parameter. Elements of a segment <name> indexed by is.na(map[["<name>"]]) are fixed at their initial values, rather than estimated, and elements corresponding to a common factor level are constrained to have a common value during estimation. map[["<name>"]] can be an index vector for segment <name>, instead of a factor. In this case, the indexed elements of that segment are fixed at their initial values.

fit

a logical. If FALSE, then egf returns early (before fitting) with a partial model object. The details of the partial result are subject to change and therefore sparsely documented, on purpose ...

se

a logical. If TRUE, then the Hessian matrix of the negative log marginal likelihood function is computed and inverted to approximate the joint covariance matrix of segments beta and theta of the bottom level parameter vector. Standard errors on the fitted values of all top level nonlinear model parameters are computed approximately using the delta method. Computations are preserved in the model object for reuse by methods.

...

additional arguments passed from or to other methods.

Details

Users attempting to set arguments formula_priors, init, and map should know the structure of the bottom level parameter vector. It is described under topic egf-class.

If

formula_ts = cbind(time, x) ~ ts1
formula_windows = cbind(start, end) ~ ts2

then it is expected that time, start, and end (after coercion to numeric) measure time on the same scale. To be precise, numeric times should have a common unit of measure and, at least within time series, represent displacements from a common reference time. These conditions will always hold if time, start, and end all evaluate to Date, POSIXct, or POSIXlt vectors.

When day of week effects are estimated, numeric times are interpreted as numbers of days since midnight on January 1, 1970, so that time points can be mapped unambiguously to days of week. Furthermore, in this case, time (after coercion to numeric) is required to be integer-valued with one day spacing in all time series. This means that

isTRUE(all.equal(time, round(time))) &&
    all(range(diff(round(time))) == 1)

must be TRUE in each level of ts1. These conditions ensure that intervals between successive time points represent exactly one day of week.

Value

A list inheriting from class egf. See topic egf-class for class documentation.

See Also

The many methods for class egf, listed by methods(class = "egf").

Examples

## Simulate 'N' incidence time series exhibiting exponential growth
set.seed(180149L)
N <- 10L
f <- function(time, r, c0) {
    lambda <- diff(exp(log(c0) + r * time))
    c(NA, rpois(lambda, lambda))
}
time <- seq.int(0, 40, 1)
r <- rlnorm(N, -3.2, 0.2)
c0 <- rlnorm(N, 6, 0.2)
data_ts <-
    data.frame(country = gl(N, length(time), labels = LETTERS[1:N]),
               time = rep.int(time, N),
               x = unlist(Map(f, time = list(time), r = r, c0 = c0)))
rm(f, time)

## Define fitting windows (here, two per time series)
data_windows <-
    data.frame(country = gl(N, 1L, 2L * N, labels = LETTERS[1:N]),
               wave = gl(2L, 10L),
               start = c(sample(seq.int(0, 5, 1), N, TRUE),
                         sample(seq.int(20, 25, 1), N, TRUE)),
               end = c(sample(seq.int(15, 20, 1), N, TRUE),
                       sample(seq.int(35, 40, 1), N, TRUE)))

## Estimate the generative model
m1 <-
    egf(model = egf_model(curve = "exponential", family = "pois"),
        formula_ts = cbind(time, x) ~ country,
        formula_windows = cbind(start, end) ~ country,
        formula_parameters = ~(1 | country:wave),
        data_ts = data_ts,
        data_windows = data_windows,
        se = TRUE)

## Re-estimate the generative model with:
## * Gaussian prior on beta[1L]
## * LKJ prior on all random effect covariance matrices
##   (here there happens to be just one)
## * initial value of 'theta' set explicitly
## * theta[3L] fixed at initial value
m2 <-
    update(m1,
           formula_priors = list(beta[1L] ~ Normal(mu = -3, sigma = 1),
                                 Sigma ~ LKJ(eta = 2)),
           init = list(theta = c(log(0.5), log(0.5), 0)),
           map = list(theta = 3L))

[Package epigrowthfit version 0.15.3 Index]