estimate_truncation {EpiNow2}    R Documentation

Estimate Truncation of Observed Data

Description

[Stable] Estimates a truncation distribution from multiple snapshots of the same data source over time. This distribution can then be used in regional_epinow, epinow, and estimate_infections to adjust for truncated data. See here for an example of using this approach on Covid-19 data in England. The functionality offered by this function is now available in a more principled manner in the epinowcast R package.

The model of truncation is as follows:

  1. The truncation distribution is assumed to be a discretised log normal with a mean and standard deviation that are informed by the data.

  2. The data set with the latest observations is adjusted for truncation using the truncation distribution.

  3. Earlier data sets are recreated by applying the truncation distribution to the adjusted latest observations in the time period of the earlier data set. These data sets are then compared to the earlier observations assuming a negative binomial observation model with an additive noise term to deal with zero observations.
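Steps 1 to 3 above can be illustrated with a minimal base R sketch. It is not the package's implementation: the log normal parameters, the constant "true" series and the helper names apply_truncation and adjust_truncation are all hypothetical, chosen only to show how a cumulative mass function (CMF) scales recent counts.

```r
# Step 1: discretised log normal truncation distribution.
# The parameter values are illustrative assumptions, not estimates.
logmean <- 0.5
logsd <- 0.5
max_delay <- 10

# Probability mass on delays 0..max_delay, then the cumulative mass function.
pmf <- diff(plnorm(0:(max_delay + 1), logmean, logsd))
pmf <- pmf / sum(pmf)
cmf <- cumsum(pmf)

# Fraction of the final count already reported for the last
# (max_delay + 1) dates of a snapshot, oldest date first.
frac_reported <- rev(cmf)

# Hypothetical helper: scale the most recent dates down by the fraction reported.
apply_truncation <- function(cases, frac) {
  n <- length(cases)
  idx <- (n - length(frac) + 1):n
  cases[idx] <- cases[idx] * frac
  cases
}

# Hypothetical helper: undo the scaling to recover the expected final counts.
adjust_truncation <- function(cases, frac) {
  n <- length(cases)
  idx <- (n - length(frac) + 1):n
  cases[idx] <- cases[idx] / frac
  cases
}

# A made-up "true" series and the truncated latest snapshot of it.
true_cases <- rep(100, 30)
latest <- apply_truncation(true_cases, frac_reported)

# Step 2: adjust the latest snapshot for truncation.
adjusted <- adjust_truncation(latest, frac_reported)

# Step 3: recreate the snapshot that would have been seen 5 days earlier.
earlier <- apply_truncation(adjusted[1:25], frac_reported)
```

In the fitted model the recreated snapshots are compared with the real earlier snapshots through the observation model, which is what informs the log normal parameters.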

This model is then fitted using Stan with standard normal, or half normal, priors for the mean, the standard deviation, 1 over the square root of the overdispersion, and the additive noise term.
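To make the prior and observation model more concrete, the following base R sketch draws one set of parameters and simulates counts from them. Which parameter receives a standard normal and which a half normal prior is an assumption here (positive parameters are given half normals), and the variable names and expected counts are hypothetical.

```r
set.seed(1)

# Assumed split: standard normal for the log mean, half normal (absolute
# value of a standard normal draw) for parameters that must be positive.
logmean <- rnorm(1)
logsd <- abs(rnorm(1))
inv_sqrt_phi <- abs(rnorm(1))
sigma <- abs(rnorm(1))        # additive noise term

# Overdispersion implied by the draw of 1 / sqrt(phi).
phi <- 1 / inv_sqrt_phi^2

# Negative binomial observations around hypothetical expected counts.
# Adding sigma keeps the mean positive even where the expected count is zero.
expected <- c(0, 5, 50)
simulated <- rnbinom(length(expected), mu = expected + sigma, size = phi)
```

Placing the prior on 1 over the square root of the overdispersion means that draws near zero correspond to a nearly Poisson observation model.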

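This approach assumes that:

  * Current truncation is related to past truncation.

  * Truncation is a multiplicative scaling of underlying reported cases.

  * Truncation is log normally distributed.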

Usage

estimate_truncation(
  obs,
  max_truncation,
  trunc_max = 10,
  trunc_dist = "lognormal",
  truncation = dist_spec(mean = 0, sd = 0, mean_sd = 1, sd_sd = 1, max = 10),
  model = NULL,
  CrIs = c(0.2, 0.5, 0.9),
  weigh_delay_priors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

obs

A list of data frames each containing a date variable and a confirm (integer) variable. Each data set should be a snapshot of the reported data over time. All data sets must contain a complete vector of dates.
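As an illustration of the expected shape of obs, the sketch below builds two snapshots by hand in base R. The dates and counts are made up, and the ordering (oldest snapshot first) simply mirrors the Examples section.

```r
dates <- seq(as.Date("2020-03-01"), by = "day", length.out = 10)

# Snapshot taken three days before the latest one: fewer dates, and the
# most recent counts are still incomplete.
snapshot_early <- data.frame(
  date = dates[1:7],
  confirm = c(12L, 15L, 20L, 18L, 22L, 21L, 11L)
)

# Latest snapshot: earlier dates have filled in, recent ones are truncated.
snapshot_late <- data.frame(
  date = dates,
  confirm = c(12L, 15L, 20L, 18L, 25L, 30L, 28L, 31L, 29L, 14L)
)

obs <- list(snapshot_early, snapshot_late)

# Check that every snapshot has a complete, gap-free run of dates.
complete_dates <- vapply(
  obs, function(x) all(diff(x$date) == 1), logical(1)
)
```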

max_truncation

Deprecated; use truncation instead.

trunc_max

Deprecated; use truncation instead.

trunc_dist

Deprecated; use truncation instead.

truncation

A distribution specification, as returned by dist_spec(), defining the truncation of observed data. The default is the dist_spec() call shown in Usage.

model

A compiled stan model to override the default model. May be useful for package developers or those developing extensions.

CrIs

Numeric vector of credible intervals to calculate.
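The sketch below shows one common reading of these values, assuming central (equal-tailed) intervals: a 0.9 interval runs from the 5% to the 95% quantile. The draws are simulated stand-ins for posterior samples, and the package's own summary code may differ in detail.

```r
set.seed(42)
CrIs <- c(0.2, 0.5, 0.9)

# Tail probabilities for central intervals.
lower <- (1 - CrIs) / 2
upper <- 1 - lower

# Hypothetical posterior draws of a positive parameter.
draws <- rlnorm(1000, meanlog = 0.5, sdlog = 0.1)

intervals <- data.frame(
  CrI = CrIs,
  lower = quantile(draws, lower, names = FALSE),
  upper = quantile(draws, upper, names = FALSE)
)
```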

weigh_delay_priors

Logical. If TRUE, all delay distribution priors will be weighted by the number of observation data points, in doing so approximately placing an independent prior at each time step and usually preventing the posteriors from shifting. If FALSE (default), no weight will be applied, i.e. delay distributions will be treated as single parameters.

verbose

Logical, should model fitting progress be returned.

...

Additional parameters to pass to rstan::sampling.

Value

A list containing: the summary parameters of the truncation distribution (dist); the estimated CMF of the truncation distribution (cmf, which can be used to adjust new data); a data frame containing the observed truncated data, the latest observed data and the truncation-adjusted observations (obs); a data frame containing the last observed data (last_obs, useful for plotting and validation); the data used for fitting (data); and the fit object (fit).

Author(s)

Sam Abbott

Sebastian Funk

Examples

# set number of cores to use
old_opts <- options()
options(mc.cores = ifelse(interactive(), 4, 1))

# get example case counts
reported_cases <- example_confirmed[1:60]

# define example truncation distribution (note not integer adjusted)
trunc <- dist_spec(
  mean = convert_to_logmean(3, 2),
  mean_sd = 0.1,
  sd = convert_to_logsd(3, 2),
  sd_sd = 0.1,
  max = 10
)

# apply truncation to example data
construct_truncation <- function(index, cases, dist) {
  set.seed(index)
  if (dist$dist == 0) {
    dfunc <- dlnorm
  } else {
    dfunc <- dgamma
  }
  cmf <- cumsum(
    dfunc(
      1:(dist$max + 1),
      rnorm(1, dist$mean_mean, dist$mean_sd),
      rnorm(1, dist$sd_mean, dist$sd_sd)
    )
  )
  cmf <- cmf / cmf[dist$max + 1]
  cmf <- rev(cmf)[-1]
  trunc_cases <- data.table::copy(cases)[1:(.N - index)]
  trunc_cases[
    (.N - length(cmf) + 1):.N, confirm := as.integer(confirm * cmf)
  ]
  return(trunc_cases)
}
example_data <- purrr::map(c(20, 15, 10, 0),
  construct_truncation,
  cases = reported_cases,
  dist = trunc
)

# fit model to example data
est <- estimate_truncation(example_data,
  verbose = interactive(),
  chains = 2, iter = 2000
)

# summary of the distribution
est$dist
# summary of the estimated truncation cmf (can be applied to new data)
print(est$cmf)
# observations linked to truncation adjusted estimates
print(est$obs)
# validation plot of observations vs estimates
plot(est)

options(old_opts)

[Package EpiNow2 version 1.4.0 Index]