
paraest.ML {modelSSE}

R Documentation

To estimate model parameters using the maximum likelihood approach

Description

This function (i.e., paraest.ML()) estimates model parameters by the maximum likelihood (ML) approach, using the given structured contact tracing data.

Usage

paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2), disp = c(0.01, 2.5), shift = c(0.01, 0.5)),
  offspring.type = "D",
  para.comb.num = 1000,
  can.epi.para.set = NULL,
  data = NULL,
  var.name = list(obssize = NULL, seedsize = NULL, typelab = NULL),
  obs.type.lab = list(offspring = NULL, nextgen = NULL, outbreak = NULL)
)

Arguments

`can.epi.para.range`	A list (`list`) of ranges, or of fixed values, for the unknown epidemiological parameters of the offspring distribution. For ranges, the list should be in the format `list(mean = c(?, ?), disp = c(?, ?), shift = c(?, ?))`, where each element gives a lower and an upper bound. For fixed values, the list should be in the format `list(mean = ?, disp = ?, shift = ?)`, where each element is a scalar. All values must be non-negative. The default setting is given in the Usage section. For the Delaporte distribution, the value of `mean` should be larger than the value of `shift`.
`offspring.type`	A character label (`character`) indicating the type of distribution used to describe the offspring distribution. It only accepts one of the following values: `"D"` indicates the Delaporte distribution, `"NB"` indicates the negative binomial distribution, `"G"` indicates the geometric distribution, or `"P"` indicates the Poisson distribution. By default, `offspring.type = 'D'`.
`para.comb.num`	A positive integer giving the number of parameter combinations used to construct the log-likelihood profile. By default, `para.comb.num = 1000`; there is usually no need to change this default unless for special reasons.
`can.epi.para.set`	A data frame (`data.frame`) of different parameter combinations. The data frame must have three variables, named `"epi.para.mean"`, `"epi.para.disp"`, and `"epi.para.shift"`, one for each parameter. By default, `can.epi.para.set = NULL`. Note that this argument is usually used internally, so there is no need to change the default setting unless for special reasons.
`data`	A data frame (`data.frame`), or a vector (only when `obs.type.lab = "offspring"`) that contains the structured contact tracing data.
`var.name`	A list (`list`), or a character string, giving the variable (column) names of the dataset in `data`. A list of variable names should be in the format `list(obssize = ?, seedsize = ?, typelab = ?)`. See the Details section for more information. By default, `var.name = list(obssize = NULL, seedsize = NULL, typelab = NULL)`.
`obs.type.lab`	A list (`list`), or a character string, of labels (i.e., "offspring", "nextgen", or "outbreak") for the type of observations. A list of labels should be in the format `list(offspring = ?, nextgen = ?, outbreak = ?)`. See the Details section for more information. By default, `obs.type.lab = list(offspring = NULL, nextgen = NULL, outbreak = NULL)`.
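
To make the four choices of offspring.type concrete, the base-R sketch below evaluates the corresponding probability mass functions. The helper d_offspring() is hypothetical and is not part of modelSSE. The Delaporte case assumes a common parameterization in which the number of offspring is the sum of a Poisson variable with mean shift and a negative binomial variable with mean (mean - shift) and dispersion disp. This is consistent with the requirement that mean exceed shift, but the package may parameterize it differently.

```r
# Hypothetical helper (not part of modelSSE): probability mass function of the
# offspring distribution for each value of offspring.type.
d_offspring <- function(x, mean, disp = NA, shift = NA,
                        type = c("D", "NB", "G", "P")) {
  type <- match.arg(type)
  switch(type,
    # Poisson: no individual variation in transmissibility
    P = dpois(x, lambda = mean),
    # Geometric: a negative binomial with dispersion fixed at 1
    G = dgeom(x, prob = 1 / (1 + mean)),
    # Negative binomial parameterized by mean (R) and dispersion (k)
    NB = dnbinom(x, size = disp, mu = mean),
    # Delaporte (assumed form): Poisson(shift) + NB(mean - shift, disp),
    # evaluated by convolving the two mass functions
    D = vapply(x, function(xi) {
      j <- 0:xi
      sum(dpois(j, lambda = shift) *
            dnbinom(xi - j, size = disp, mu = mean - shift))
    }, numeric(1))
  )
}

x <- 0:5
d_offspring(x, mean = 1, type = "G")
d_offspring(x, mean = 1, disp = 0.5, shift = 0.2, type = "D")
```

Setting disp = 1 in the negative binomial case reproduces the geometric distribution. Letting disp grow large approaches the Poisson distribution.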

Details

The ranges of parameters given in can.epi.para.range are rough ranges. They do not need to be precise (but must be within a reasonable range), and they will be used as the starting point for finding the maximum likelihood estimate.
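
One way to picture how rough ranges and para.comb.num could feed a log-likelihood profile is to draw random parameter combinations uniformly within the ranges. The combinations are stored in a data frame that uses the column names documented for can.epi.para.set. The helper make_para_set() below is hypothetical. The package's own sampling scheme is not described on this page and may differ.

```r
# Hypothetical helper (not part of modelSSE): draw n candidate parameter
# combinations uniformly within the given ranges; a length-1 element is
# treated as a fixed value and repeated.
make_para_set <- function(range = list(mean = c(0.1, 2), disp = c(0.01, 2.5),
                                       shift = c(0.01, 0.5)),
                          n = 1000) {
  draw <- function(r) if (length(r) == 1) rep(r, n) else runif(n, min(r), max(r))
  para.set <- data.frame(
    epi.para.mean  = draw(range$mean),
    epi.para.disp  = draw(range$disp),
    epi.para.shift = draw(range$shift)
  )
  # For the Delaporte case, keep only combinations with mean > shift
  para.set[para.set$epi.para.mean > para.set$epi.para.shift, , drop = FALSE]
}

set.seed(2020)
cand <- make_para_set(n = 1000)
head(cand)
```

Combinations with mean no larger than shift are dropped, so the result can have fewer than n rows. This page does not state whether paraest.ML() applies the same filter.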

When obs.type.lab is a character string, it should be one of "offspring", "nextgen", or "outbreak", indicating the type of observations. A list is used when the contact tracing data contain more than one type of observation.

When the contact tracing dataset consists of offspring case observations, the argument data can be either a vector or a data frame. If data is a vector, there is no need to assign any value to var.name. If data is a data frame, the variable name of the offspring observations must be given in var.name.
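
Each offspring observation is the number of secondary cases generated by one case. Under a negative binomial model with mean R and dispersion k, the log-likelihood is therefore the sum of dnbinom(x, size = k, mu = R, log = TRUE) over the observations. The sketch below maximizes this log-likelihood numerically with stats::optim() on simulated data, using the hypothetical name nb_offspring_mle(). It only illustrates the ML idea. It is not the code inside paraest.ML(), whose documented approach evaluates a log-likelihood profile over para.comb.num parameter combinations.

```r
# Hypothetical sketch: ML estimation of R (mean) and k (dispersion) from
# offspring observations under a negative binomial offspring distribution.
nb_offspring_mle <- function(x) {
  stopifnot(!anyNA(x), all(x >= 0))
  # Negative log-likelihood; parameters are optimized on the log scale so
  # that R and k stay positive
  nll <- function(theta) {
    -sum(dnbinom(x, size = exp(theta[2]), mu = exp(theta[1]), log = TRUE))
  }
  # Starting values: log of the sample mean for R, and log(k) = 0 (k = 1)
  fit <- optim(c(log(max(mean(x), 0.01)), 0), nll, method = "Nelder-Mead")
  list(mean = exp(fit$par[1]), disp = exp(fit$par[2]), loglik = -fit$value)
}

set.seed(2020)
x <- rnbinom(500, size = 0.5, mu = 1)   # simulated offspring counts
fit <- nb_offspring_mle(x)
unlist(fit)
```

For this model the ML estimate of R equals the sample mean whatever the value of k, which makes it a convenient check on the optimizer.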

When the contact tracing dataset consists of next-generation cluster size or final outbreak size observations, the variable names of both the observations and the seed case size should be given in var.name in the format list(obssize = ?, seedsize = ?).

When the contact tracing dataset has more than one type of observation, the variable names of the observations, the seed case size, and the observation type should be given in var.name in the format list(obssize = ?, seedsize = ?, typelab = ?).
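
For size-type observations, the likelihood of each row depends on the observed size and the number of seed cases. The sketch below assumes a negative binomial offspring distribution with mean R and dispersion k. It also assumes that a next-generation cluster size counts only the secondary cases produced by s seed cases, so that count is negative binomial with size s*k and mean s*R. An offspring observation is then the special case s = 1. For a final outbreak size y, with seeds included and the outbreak assumed to have ended, the sketch uses the branching-process total-progeny identity P(Y = y) = (s / y) P(S_y = y - s), where S_y is the total offspring of y cases. The helper names, the long-format column names obs.size, obs.seed and type (borrowed from example 5), and the summing of per-row log-likelihoods across types are all assumptions, not the package's documented internals.

```r
# Hypothetical sketch of per-observation log-likelihoods under a negative
# binomial offspring distribution with mean R and dispersion k.

# Next-generation cluster size y: secondary cases produced by s seed cases
# (assumed to exclude the seeds). A sum of s i.i.d. NB(k, R) is NB(s*k, s*R).
ll_nextgen <- function(y, s, R, k) {
  dnbinom(y, size = s * k, mu = s * R, log = TRUE)
}

# Final outbreak size y (seeds included) started by s seed cases:
# P(Y = y) = (s / y) * P(S_y = y - s), with S_y ~ NB(y*k, y*R); needs y >= s.
ll_outbreak <- function(y, s, R, k) {
  log(s) - log(y) + dnbinom(y - s, size = y * k, mu = y * R, log = TRUE)
}

# Combined log-likelihood for a long-format data frame with one row per
# observation and a column labelling the observation type.
ll_mixed <- function(dat, R, k) {
  ll <- mapply(function(y, s, type) {
    switch(type,
      offspring = ll_nextgen(y, 1, R, k),
      nextgen   = ll_nextgen(y, s, R, k),
      outbreak  = ll_outbreak(y, s, R, k),
      stop("unknown observation type: ", type))
  }, dat$obs.size, dat$obs.seed, as.character(dat$type))
  sum(ll)
}

dat <- data.frame(
  obs.size = c(0, 2, 3, 5, 1),
  obs.seed = c(1, 1, 2, 1, 1),
  type     = c("offspring", "offspring", "nextgen", "outbreak", "outbreak")
)
ll_mixed(dat, R = 0.5, k = 0.4)
```

For R < 1 every outbreak ends, so the final-size probabilities for a fixed s sum to 1. With a single seed, their mean is 1 / (1 - R).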

Value

A list (i.e., list) containing the following three items:

a data frame (data.frame) of the maximum likelihood estimate and the 95% confidence interval (CI) of each unknown parameter,
the maximum log-likelihood value, and
a data frame (data.frame) of different parameter combinations and their corresponding log-likelihood values.
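
This page does not state how the 95% CI is derived from the log-likelihood values. A common approach, shown here only as an assumption, is the likelihood-ratio rule, which retains parameter values whose log-likelihood lies within qchisq(0.95, df = 1) / 2 (about 1.92) of the maximum. The sketch applies this rule to a one-parameter grid for R under a Poisson offspring model with simulated data. The name lr_ci() is hypothetical.

```r
# Hypothetical sketch: likelihood-ratio 95% CI from a log-likelihood profile
# evaluated on a grid of candidate values.
lr_ci <- function(grid, loglik, level = 0.95) {
  cutoff <- max(loglik) - qchisq(level, df = 1) / 2
  inside <- grid[loglik >= cutoff]
  c(est = grid[which.max(loglik)], lower = min(inside), upper = max(inside))
}

set.seed(2020)
x <- rpois(200, lambda = 0.6)              # simulated offspring counts
R.grid <- seq(0.01, 2, by = 0.001)         # candidate values of R
ll <- vapply(R.grid, function(R) sum(dpois(x, R, log = TRUE)), numeric(1))
ci <- lr_ci(R.grid, ll)
ci
```

With more than one unknown parameter, the profile for one parameter would take, at each of its values, the largest log-likelihood over the other parameters before the same cutoff is applied.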

Note

For the contact tracing data in data, unknown observations (i.e., NA) are not allowed.
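
Because NA values are not accepted, incomplete rows can be detected, and if appropriate for the study dropped, before paraest.ML() is called. The base-R sketch below uses a made-up data frame whose column names (obs.size, obs.seed, type) follow example 5.

```r
# Made-up contact tracing data containing one incomplete row
dat <- data.frame(
  obs.size = c(0, 2, NA, 5),
  obs.seed = c(1, 1, 1, 2),
  type     = c("offspring", "offspring", "offspring", "outbreak")
)
anyNA(dat)                                   # TRUE: cannot be used as is
dat.complete <- dat[complete.cases(dat), ]   # keep fully observed rows only
anyNA(dat.complete)                          # FALSE
```

Dropping rows changes the sample, so it is only a reasonable choice when the missing values are unrelated to the outbreak sizes themselves.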

When para.comb.num is large, e.g., para.comb.num > 10000, the function paraest.ML() may take a few seconds, or even minutes, to complete, depending on the sample size, the model settings, etc. Thus, we do not recommend changing the default setting of para.comb.num unless for special reasons.

References

Blumberg S, Funk S, Pulliam JR. Detecting differential transmissibilities that affect the size of self-limited outbreaks. PLoS Pathogens. 2014;10(10):e1004452. doi:10.1371/journal.ppat.1004452

Kucharski AJ, Althaus CL. The role of superspreading in Middle East respiratory syndrome coronavirus (MERS-CoV) transmission. Eurosurveillance. 2015;20(25):21167. doi:10.2807/1560-7917.ES2015.20.25.21167

Adam DC, Wu P, Wong JY, Lau EH, Tsang TK, Cauchemez S, Leung GM, Cowling BJ. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine. 2020;26(11):1714-1719. doi:10.1038/s41591-020-1092-0

Zhao S, Chong MK, Ryu S, Guo Z, He M, Chen B, Musa SS, Wang J, Wu Y, He D, Wang MH. Characterizing superspreading potential of infectious disease: Decomposition of individual transmissibility. PLoS Computational Biology. 2022;18(6):e1010281. doi:10.1371/journal.pcbi.1010281

Examples


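## estimate the parameters (which are already known) of a negative binomial model,
## using random samples generated from a geometric distribution with mean of 1
## (a geometric distribution is a negative binomial distribution with k = 1).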
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "NB", para.comb.num = 100,
  data = r_offspringdistn(
    n = 99, epi.para = list(mean = 1, disp = 0.5, shift = 0.2), offspring.type = "G"
  ),
  obs.type.lab = 'offspring'
)$epi.para.est.output




# example 1: for offspring observations #
## reproducing the parameter estimation results in Adam, et al. (2020)
## paper doi link: https://doi.org/10.1038/s41591-020-1092-0,
## (see the first row in Supplementary Table 4),
## where R of 0.58 (95% CI: 0.45, 0.72), and k of 0.43 (95% CI: 0.29, 0.67).
data(COVID19_JanApr2020_HongKong)
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "NB",
  data = COVID19_JanApr2020_HongKong$obs,
  obs.type.lab = 'offspring'
)$epi.para.est.output


# example 2: for offspring observations #
## reproducing the parameter estimation results in Zhao, et al. (2022)
## paper doi link: https://doi.org/10.1371/journal.pcbi.1010281,
## (see the results of dataset #3 using Delaporte distribution in Table 1), where
## R of 0.59 (95% CI: 0.46, 0.78),
## k of 0.16 (95% CI: 0.06, 0.40), and
## shift of 0.17 (95% CI: 0.04, 0.30).
data(COVID19_JanApr2020_HongKong)
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "D",
  data = COVID19_JanApr2020_HongKong$obs,
  obs.type.lab = 'offspring'
)$epi.para.est.output


# example 3: for next-generation cluster size observations #
## reproducing the parameter estimation results in Blumberg, et al. (2014)
## paper doi link: https://doi.org/10.1371/journal.ppat.1004452,
## (see the last row in Table 3, and Fig 4A),
## where R of 3.14 (95% CI: 2, >6), and k of 0.37 (95% CI: not reported).
data(smallpox_19581973_Europe)
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 10.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "NB",
  data = smallpox_19581973_Europe,
  var.name = list(obssize = 'obs.clustersize', seedsize = 'obs.seed'),
  obs.type.lab = 'nextgen'
)$epi.para.est.output


# example 4: for final outbreak size observations #
## reproducing the parameter estimation results in Kucharski and Althaus (2015)
## paper doi link: https://doi.org/10.2807/1560-7917.ES2015.20.25.21167,
## (see Fig 1, and Finding section),
## where R of 0.47 (95% CI: 0.29, 0.80), and k of 0.26 (95% CI: 0.09, 1.24).
data(MERS_2013_MEregion)
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "NB",
  data = MERS_2013_MEregion,
  var.name = list(obssize = 'obs.finalsize', seedsize = 'obs.seed'),
  obs.type.lab = 'outbreak'
)$epi.para.est.output


# example 5: for more than one type of observation #
## reproducing the parameter estimation results in Blumberg, et al. (2014)
## paper doi link: https://doi.org/10.1371/journal.ppat.1004452,
## (see the last row in Table 5, and Fig 6A),
## where R of 0.3 (95% CI: 0.2, 0.5), and k of 0.4 (95% CI: not reported).
data(mpox_19801984_DRC)
set.seed(2020)
paraest.ML(
  can.epi.para.range = list(mean = c(0.1, 2.0), disp = c(0.01, 2.5), shift = c(0.01,0.5)),
  offspring.type = "NB",
  data = mpox_19801984_DRC,
  var.name = list(obssize = 'obs.size', seedsize = 'obs.seed', typelab = 'type'),
  obs.type.lab = list(offspring = 'offspring', nextgen = 'nextgen', outbreak = 'outbreak')
)$epi.para.est.output

[Package modelSSE version 0.1-3 Index]