R: To estimate model parameters using Markov chain Monte Carlo...

paraest.MCMC {modelSSE}

R Documentation

To estimate model parameters using Markov chain Monte Carlo approach

Description

This function (i.e., paraest.MCMC()) performs model parameter estimation using random walk Markov chain Monte Carlo (MCMC) approach with given structured contact tracing data.

Usage

paraest.MCMC(
  can.epi.para.start = list(mean = 1, disp = 0.5, shift = 0.2),
  isfix.epi.para = list(mean = FALSE, disp = FALSE, shift = FALSE),
  offspring.type = "D",
  para.comb.num = 10000,
  burnin.frac = 0.33,
  data = NULL,
  var.name = list(obssize = NULL, seedsize = NULL, typelab = NULL),
  obs.type.lab = list(offspring = NULL, nextgen = NULL, outbreak = NULL)
)

Arguments

`can.epi.para.start`	A list (`list`) of the starting values for unknown epidemiological parameters for offspring distribution. The list should be in the format of `list(mean = ?, disp = ?, shift = ?)`. Each parameter must be a scalar, and only accept non-negative values. The default setting is given in the code Usage section. For Delaporte distribution, the value of `mean` should be larger than the value of `shift`.
`isfix.epi.para`	A list (`list`) of logistic values that indicate whether each parameter is fixed. By default, `isfix.epi.para = list(mean = FALSE, disp = FALSE, shift = FALSE)`, where all three parameters are not fixed and to be estimated.
`offspring.type`	A character label (`character`) indicating the type of distribution used to describe the offspring distribution. It only accepts one of the following values: `"D"` indicates the Delaporte distribution, `"NB"` indicates the negative binomial distribution, `"G"` indicates the geometric distribution, or `"P"` indicates the Poisson distribution. By default, `offspring.type = 'D'`.
`para.comb.num`	A positive integer for the number of iterations used for MCMC runs. By default, `para.comb.num = 10000`, and no need to change the default setting here unless for special reasons.
`burnin.frac`	A number with range strictly between 0 and 1 for the fraction of MCMC chain that will be discarded as the "burn-in" stage. By default, `burnin.frac = 0.33`, and no need to change the default setting here unless for special reasons.
`data`	A data frame (`data.frame`), or a vector (only when `obs.type.lab = "offspring"`) that contains the structured contact tracing data.
`var.name`	A list (`list`), or a character of variable name for the column names of dataset given in `data`. For a list of variable names, it should be in the format of `list(obssize = ?, seedsize = ?, typelab = ?)`. Please see the details section for more information. By default, `var.name = list(obssize = NULL, seedsize = NULL, typelab = NULL)`.
`obs.type.lab`	A list (`list`), or a character of labels (i.e., "offspring", "nextgen", or "outbreak") for the type of observations. For a list of labels, it should be in the format of `list(offspring = ?, nextgen = ?, outbreak = ?)`. Please see the details section for more information. By default, `obs.type.lab = list(offspring = NULL, nextgen = NULL, outbreak = NULL)`.

Details

For the values of parameters given in can.epi.para.start, they are some rough starting points for the MCMC to run, which are not necessarily to be precise (but have to be within a reasonable range), and they will used as a "start" status to find the posterior MCMC samples.

When obs.type.lab is a character, it should be either "offspring", "nextgen", or "outbreak" for type of observations. When obs.type.lab is a list, this occurs when the contact tracing data has more than one types of observations.

When the contact tracing dataset is offspring case observations, the function arguments data could be either a vector, or a data frame. If data is a vector, it is not necessary to assign any value to var.name. If data is a data frame, it is necessary to identify the variable name of offspring observations in var.name.

When the contact tracing dataset is next-generation cluster size, or final outbreak size observations, the variable names of both observations and seed case size should be identified in var.name with the format of list(obssize = ?, seedsize = ?).

When the contact tracing dataset has more than one types of observations, the variable names of observations, seed case size, and observation type should be identified in var.name with the format of list(obssize = ?, seedsize = ?, typelab = ?).

Value

A list (i.e., list) contains the following three items:

a data frame (data.frame) summaries the median, and 2.5% and 97.5% percentiles (95% credible interval, CrI) of the posterior MCMC samples for each unknown parameters,
the maximum log-likelihood value, and
a data frame (data.frame) of the posterior MCMC samples and their corresponding log-likelihood values.

Note

Each parameter in can.epi.para.start = list(mean = ?, disp = ?, shift = ?) should be a scalar, which means vector is not allowed here.

For the contact tracing data in data, unknown observations (i.e., NA) is not allowed.

As para.comb.num is the number of iterations for the MCMC runs, when para.comb.num is large, e.g., para.comb.num > 100000, the function paraest.MCMC() could take few seconds, or even minutes to complete, depending on the sample size, etc. Thus, we do not recommend the users to change the default setting of para.comb.num unless for special reasons.

References

Blumberg S, Funk S, Pulliam JR. Detecting differential transmissibilities that affect the size of self-limited outbreaks. PLoS Pathogens. 2014;10(10):e1004452. doi:10.1371/journal.ppat.1004452

Kucharski AJ, Althaus CL. The role of superspreading in Middle East respiratory syndrome coronavirus (MERS-CoV) transmission. Eurosurveillance. 2015;20(25):21167. doi:10.2807/1560-7917.ES2015.20.25.21167

Adam DC, Wu P, Wong JY, Lau EH, Tsang TK, Cauchemez S, Leung GM, Cowling BJ. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine. 2020;26(11):1714-1719. doi:10.1038/s41591-020-1092-0

Zhao S, Chong MK, Ryu S, Guo Z, He M, Chen B, Musa SS, Wang J, Wu Y, He D, Wang MH. Characterizing superspreading potential of infectious disease: Decomposition of individual transmissibility. PLoS Computational Biology. 2022;18(6):e1010281. doi:10.1371/journal.pcbi.1010281

Examples



# example 1: for offspring observations #
## reproducing the parameter estimation results in Adam, et al. (2020)
## paper doi link: https://doi.org/10.1038/s41591-020-1092-0,
## (see the first row in Supplementary Table 4),
## where R of 0.58 (95% CI: 0.45, 0.72), and k of 0.43 (95% CI: 0.29, 0.67).
data(COVID19_JanApr2020_HongKong)
set.seed(2023)
MCMC.output = paraest.MCMC(
  #can.epi.para.start = list(mean = 0.60, disp = 0.50, shift = 0.20),
  offspring.type = "NB", para.comb.num = 10000,
  data = COVID19_JanApr2020_HongKong$obs,
  obs.type.lab = 'offspring'
)
print(MCMC.output$epi.para.est.output)
## Then, plot the posterior fitting results using 100 randomly-selected MCMC samples.
hist(
  COVID19_JanApr2020_HongKong$obs, breaks = c(0:100) -0.5, xlim = c(0,12),
  freq = FALSE, xlab = 'secondary cases', ylab = 'rel. freq.', main = ''
)
for(jj in 1:100){
  temp.random.col.index = sample.int(n = nrow(MCMC.output$est.record.mat), size = 1)
  est.record.array = MCMC.output$est.record.mat[temp.random.col.index,]
  lines(0:12, d_offspringdistn(
    x = 0:12,
    epi.para = list(
      mean = est.record.array$epi.para.mean,
      disp = est.record.array$epi.para.disp, shift = 0.1
    ),
    offspring.type = "NB"
  ), col = '#0066FF33', type = 'l', lty = 3)
}


# example 2: for offspring observations #
## reproducing the parameter estimation results in Zhao, et al. (2020)
## paper doi link: https://doi.org/10.1371/journal.pcbi.1010281,
## (see the results of dataset #3 using Delaporte distribution in Table 1), where
## R of 0.59 (95% CI: 0.46, 0.78),
## k of 0.16 (95% CI: 0.06, 0.40), and
## shift of 0.17 (95% CI: 0.04, 0.30).
data(COVID19_JanApr2020_HongKong)
set.seed(2023)
paraest.MCMC(
  can.epi.para.start = list(mean = 1, disp = 0.5, shift = 0.2),
  offspring.type = "D",
  data = COVID19_JanApr2020_HongKong$obs,
  obs.type.lab = 'offspring'
)$epi.para.est.output


# example 3: for next-generation cluster size observations #
## reproducing the parameter estimation results in Blumberg, et al, (2014)
## paper doi link: https://doi.org/10.1371/journal.ppat.1004452,
## (see the last row in Table 3, and Fig 4A),
## where R of 3.14 (95% CI: 2, >6), and k of 0.37 (95% CI: not reported).
data(smallpox_19581973_Europe)
set.seed(2023)
paraest.MCMC(
  can.epi.para.start = list(mean = 1, disp = 0.5, shift = 0.2),
  offspring.type = "NB",
  data = smallpox_19581973_Europe,
  var.name = list(obssize = 'obs.clustersize', seedsize = 'obs.seed'),
  obs.type.lab = 'nextgen'
)$epi.para.est.output


# example 4: final outbreak size observations #
## reproducing the parameter estimation results in Kucharski, Althaus. (2015)
## paper doi link: https://doi.org/10.2807/1560-7917.ES2015.20.25.21167,
## (see Fig 1, and Finding section),
## where R of 0.47 (95% CI: 0.29, 0.80), and k of 0.26 (95% CI: 0.09, 1.24).
data(MERS_2013_MEregion)
set.seed(2023)
paraest.MCMC(
  can.epi.para.start = list(mean = 1, disp = 0.5, shift = 0.2),
  offspring.type = "NB",
  data = MERS_2013_MEregion,
  var.name = list(obssize = 'obs.finalsize', seedsize = 'obs.seed'),
  obs.type.lab = 'outbreak'
)$epi.para.est.output


# example 5: for more than one types of observations #
## reproducing the parameter estimation results in Blumberg, et al, (2014)
## paper doi link: https://doi.org/10.1371/journal.ppat.1004452,
## (see the last row in Table 5, and Fig 6A),
## where R of 0.3 (95% CI: 0.2, 0.5), and k of 0.4 (95% CI: not reported).
data(mpox_19801984_DRC)
set.seed(2023)
paraest.MCMC(
  can.epi.para.start = list(mean = 1, disp = 0.5, shift = 0.2),
  offspring.type = "NB", para.comb.num = 30000,
  data = mpox_19801984_DRC,
  var.name = list(obssize = 'obs.size', seedsize = 'obs.seed', typelab = 'type'),
  obs.type.lab = list(offspring = 'offspring', nextgen = 'nextgen', outbreak = 'outbreak')
)$epi.para.est.output

[Package modelSSE version 0.1-3 Index]