sim_data_nll_basic {anovir}R Documentation

Function simulating survival data for nll_basic

Description

Function simulating survival data corresponding with assumptions of nll_basic

Usage

sim_data_nll_basic(
  a1 = a1,
  b1 = b1,
  n1 = n1,
  a2 = a2,
  b2 = b2,
  n2 = n2,
  t_censor = 1000,
  d1 = "",
  d2 = ""
)

Arguments

a1, b1, a2, b2

location and scale parameters describing background mortality and mortality due to infection, respectively; values must be > 0

n1, n2

size of populations to simulate data; uninfected and infected, respectively; values must be > 0

t_censor

time when any individuals still alive are right-censored; defaults to 1000 if not specified, must be > 0

d1, d2

probability distribution(s) chosen to describe background mortality and mortality due to infection, respectively; choice of, Weibull, Gumbel, Fréchet

Details

To generate data, the function collects the input variables for the probability distributions chosen to describe the background mortality and mortality due to infection, along with values for their location and scale parameters, respectively. These are used to parameterise cumulative survival functions for the two sources of mortality. These functions are solved for time t, where the value of S(t) is drawn from a random uniform distribution between 0 and 1; this yields values of t corresponding with values of survival drawn at random from the cumulative survival curve.

Values of t are rounded up to the nearest integer. This introduces a bias as recorded times of death are later than actual times of death; this bias occurs in real datasets when sampling occurs at discrete intervals. This bias can be partially offset by taking the mid-point of sampling intervals as the time of death. Mid-point times of death are also provided, assuming sampling occurs at each integer interval.

In the case of hosts from infected treatments, two potential times of death are calculated; that due to background mortality and that due to infection. The earlier of the two defines the time of death, providing it is not later than the value of t_censor.

The value of t_censor defines the time when any individuals remaining alive are right-censored. The act of censoring is assumed to occur after populations have been sampled for mortality, i.e., if mortality is recorded daily and the last day of an experiment is day 14, the populations are checked for mortality at the beginning of day 14 and any individuals alive after this are classed as censored on day 14. The timing of censoring is 'true' as individuals were known to be alive at the time they were censored. Hence times of censoring do not vary for 'time' or 'mid-time', whereas they do for individuals dying at an unknown time between two consecutive sampling times.

Warning

The lower tail of Gumbel distribution can yield negative estimates for times of death as the random variable replacing cumulative survival approaches one; S(t) -> 1.

Examples

  set.seed(34)

  sim_data <- sim_data_nll_basic(
    a1 = 2.5, b1 = 0.9, n1 = 100, d1 = "Weibull",
    a2 = 2.0, b2 = 0.5, n2 = 100, d2 = "Weibull",
    t_censor = 14)

  head(sim_data, 10)

  sim_data$time[sim_data$infected_treatment == 0]

  sim_data$time[sim_data$infected_treatment == 1]

  # plot histogram for ages at death in infected population
      hist(
        sim_data$time[(sim_data$infected_treatment == 1 & sim_data$censor == 0)],
        breaks = seq(0, 14, 1),
        xlim = c(0, 15),
        main = 'infected, died',
        xlab = 'time of death',
        xaxt = 'n')
      axis(side = 1, tick = TRUE, at = seq(0, 14, 1))

  # estimate location and scale parameters of simulated data with nll_basic
      m01_prep_function <- function(a1, b1, a2, b2){
        nll_basic(a1, b1, a2, b2,
          data = sim_data,
          time = time,
          censor = censor,
          infected_treatment = infected_treatment,
          d1 = 'Weibull', d2 = 'Weibull'
          )}

    m01 <- mle2(m01_prep_function,
             start = list(a1 = 2, b1 = 1, a2 = 2, b2 = 1)
             )

    confint(m01)

  # estimate using 'mid_time' instead of 'time'
      m02_prep_function <- function(a1, b1, a2, b2){
        nll_basic(a1, b1, a2, b2,
          data = sim_data,
          time = mid_time,
          censor = censor,
          infected_treatment = infected_treatment,
          d1 = 'Weibull', d2 = 'Weibull'
          )}

      m02 <- mle2(m02_prep_function,
             start = list(a1 = 2, b1 = 1, a2 = 2, b2 = 1)
             )

    confint(m02)

    # compare estimates
      AICc(m01, m02, nobs = 200)
      # for these data, m02 based on 'mid-time' provides a better
      # description of the data

[Package anovir version 0.1.0 Index]