survsim-package {survsim}    R Documentation

Simulation of simple and complex survival data

Description

Simulation of cohorts in a context of simple and complex survival analysis, multiple events and recurrent events including several covariates, individual heterogeneity and periods at risk before and after the initial time of follow-up.

Distribution | Survival function | Density function | Parametrization
Weibull | exp(-\lambda t^p) | \lambda p t^{p-1} exp(-\lambda t^p) | \lambda = exp(-p \beta_0)
Log-normal | 1 - \Theta((log(t) - \mu)/\sigma) | (1/(t \sigma \sqrt{2 \pi})) exp((-1/(2 \sigma^2))(log(t) - \mu)^2) | \mu = \beta_0
Log-logistic | 1/(1 + (\lambda t)^{1/\gamma}) | \lambda^{1/\gamma} t^{(1/\gamma) - 1} / (\gamma (1 + (\lambda t)^{1/\gamma})^2) | \lambda = exp(-\beta_0)

Distribution | Time
Weibull | t = (-ln u / \lambda)^{1/p}
Log-normal | t = exp(\beta_0 + \sigma \Theta^{-1}(u))
Log-logistic | t = exp(\beta_0 + \gamma (log(u) - log(1-u)))

Here \Theta is the standard normal cumulative distribution function and u is a draw from a uniform distribution on (0, 1).
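The time formulas above amount to inverse transform sampling: a uniform draw u is pushed through the inverse of the distribution (or survival) function. The following is a minimal sketch in Python, not the package's own R code; the function names (open_uniform, weibull_time, lognormal_time, loglogistic_time) are hypothetical and chosen only for illustration. The log-normal draw uses the normal quantile \Theta^{-1}(u) scaled by \sigma, and the log-logistic draw uses log(u) - log(1-u) scaled by \gamma, which is what inverting the survival functions in the table gives.

```python
import math
import random
from statistics import NormalDist


def open_uniform(rng):
    """Draw u from the open interval (0, 1) so that log(u) and log(1 - u) are finite."""
    u = rng.random()          # random() returns a value in [0, 1)
    while u == 0.0:           # reject the single problematic endpoint
        u = rng.random()
    return u


def weibull_time(u, beta0, p):
    """t = (-ln u / lambda)^(1/p), with lambda = exp(-p * beta0)."""
    lam = math.exp(-p * beta0)
    return (-math.log(u) / lam) ** (1.0 / p)


def lognormal_time(u, beta0, sigma):
    """t = exp(beta0 + sigma * Theta^{-1}(u)), Theta being the standard normal CDF."""
    return math.exp(beta0 + sigma * NormalDist().inv_cdf(u))


def loglogistic_time(u, beta0, gamma):
    """t = exp(beta0 + gamma * (log(u) - log(1 - u)))."""
    return math.exp(beta0 + gamma * (math.log(u) - math.log(1.0 - u)))


# Illustrative parameter values only (assumptions, not package defaults).
rng = random.Random(2024)
sample = [weibull_time(open_uniform(rng), beta0=2.0, p=1.5) for _ in range(5)]
```

For u = 0.5 both the log-normal and the log-logistic formulas return exp(\beta_0), the median of each distribution, which gives a quick sanity check on the parametrization.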

In order to simulate censored survival data, two survival distributions are required: one for the uncensored survival times that would be observed if the follow-up had been long enough to reach the event, and another representing the censoring mechanism. The uncensored survival times, T'_i, for i=1,\ldots,n subjects, can be generated to depend on a set of covariates with a specified relationship with survival, which represents the true prognostic importance of each covariate (Burton, 2006). The package simulates times from the Weibull (with the exponential as a particular case), log-normal and log-logistic distributions, as shown in the previous table. To induce individual heterogeneity or within-subject correlation we generate Z_i, a random effect covariate that follows a particular distribution (Uniform or Normal) and multiplies the uncensored time:

t_i = t_i'\cdot z_i
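A minimal sketch of this multiplicative random effect, written in Python for illustration only (it is not the package's implementation). It assumes a Uniform random effect with hypothetical bounds low and high, and draws a single z_i per subject that is shared by all of that subject's episodes, which is one way to obtain within-subject correlation. The function name apply_random_effect is hypothetical.

```python
import random


def apply_random_effect(base_times_by_subject, rng, low=0.8, high=1.2):
    """Scale each subject's uncensored times t_i' by one subject-level draw z_i.

    base_times_by_subject: list of lists, one inner list of times per subject.
    Returns a list of (z_i, scaled_times) pairs, where scaled_times = t_i' * z_i.
    """
    result = []
    for episode_times in base_times_by_subject:
        z = rng.uniform(low, high)                   # one z_i per subject (assumption)
        result.append((z, [t * z for t in episode_times]))
    return result


rng = random.Random(7)
base = [[10.0, 4.0], [6.0], [3.0, 8.0, 2.0]]         # made-up t_i' values
heterogeneous = apply_random_effect(base, rng)
# Setting low = high = 1 reproduces the homogeneous case z_i = 1.
homogeneous = apply_random_effect(base, rng, low=1.0, high=1.0)
```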

When z_i = 1 for all subjects, we are in the case of individual homogeneity and the survival times are completely specified by the covariates. Random non-informative right censoring, C_i, can be generated in a similar manner to the uncensored survival times, T'_i, by assuming a particular distribution for the censoring times (see the previous table), but without including any covariates or individual heterogeneity.

The observation times, Y_i', incorporating both events and censored observations, are calculated for each case by combining the uncensored survival times, T_i, and the censoring times, C_i. If the uncensored survival time for an observation is less than or equal to the censoring time, the event is considered to be observed and the observation time equals the uncensored survival time; otherwise the event is considered censored and the observation time equals the censoring time. In other words, once t_i and c_i have been simulated, we can define y_i' = min(t_i, c_i) as the observation time, with \delta_i an indicator of non-censoring, i.e. \delta_i = I(t_i \le c_i).

While all y_i' start at 0, the package allows the creation of dynamic cohorts. We can generate entry times greater than 0 by adding a value t_0 drawn from a uniform distribution on [0, t_{max follow-up}]. We can also simulate subjects at risk before the initial time of follow-up (y_i' = 0) by drawing t_0 from a uniform distribution on [-t_{max old}, 0) for a fixed percentage of subjects. Then:

y_i=y_i' + t_0

where t_0 follows a uniform distribution on [0, t_{max follow-up}] if the entry time is 0 or more, and a uniform distribution on [-t_{max old}, 0) if the entry time is less than 0. Therefore, t_0 represents the initial point of the episode, y_i the endpoint and y_i' the length. Note that y_i' + t_0 can be greater than t_{max follow-up}; in this case y_i is set to t_{max follow-up} and \delta_i = 0. Observations corresponding to subjects at risk before the initial time of follow-up have a negative t_0, so the initial point of the episode is set to 0. y_i may also be negative; in this case the episode is not included in the simulated data, since it would not be observed in practice.
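The rules for building an observed episode from t_i, c_i and t_0 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's R implementation: the function names draw_entry and observe_episode, the argument names, and the choice to pass a flag for subjects at risk before follow-up (rather than sampling a fixed percentage of the cohort) are all hypothetical.

```python
import random


def draw_entry(rng, max_followup, max_old, at_risk_before):
    """Draw the entry time t_0.

    at_risk_before=True  -> uniform on [-max_old, 0)
    at_risk_before=False -> uniform on [0, max_followup]
    """
    if at_risk_before:
        # 1 - random() lies in (0, 1], so the product lies in [-max_old, 0)
        return -max_old * (1.0 - rng.random())
    return rng.uniform(0.0, max_followup)


def observe_episode(t, c, t0, max_followup):
    """Return (start, stop, delta), or None if the episode ends before follow-up begins."""
    length = min(t, c)                 # y_i' = min(t_i, c_i)
    delta = 1 if t <= c else 0         # delta_i = I(t_i <= c_i)
    stop = length + t0                 # y_i = y_i' + t_0
    if stop > max_followup:            # administrative censoring at end of follow-up
        stop = max_followup
        delta = 0
    if stop < 0:                       # episode would never be observed in practice
        return None
    start = max(t0, 0.0)               # negative entry times are set to 0
    return (start, stop, delta)


# Made-up values: event at t = 5, censoring at c = 10, entry at t_0 = 2.
episode = observe_episode(t=5, c=10, t0=2, max_followup=20)
```

With these made-up values the event is observed, so the episode runs from 2 to 7 with \delta_i = 1.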

Details

Package: survsim
Type: Package
Version: 1.1.8
Date: 2021-12-13
License: GPL version 2 or newer
LazyLoad: yes

The package provides tools for simulating cohorts in a simple single-event context through the function simple.surv.sim, in a recurrent event context with the function rec.ev.sim, in a multiple event context with the function mult.ev.sim and in a competing risks context with the function crisk.sim. It also allows the user to generate aggregated data from the simulated cohort by means of the function accum.

Author(s)

David Moriña (Universitat de Barcelona) and Albert Navarro (Universitat Autònoma de Barcelona)

Maintainer: David Moriña Soler <dmorina@ub.edu>

References

Kelly PJ, Lim LL. Survival analysis for recurrent event data: an application to childhood infectious diseases. Stat Med 2000 Jan 15;19(1):13-33.

Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med 2005 Jun 15;24(11):1713-1723.

Metcalfe C, Thompson SG. The importance of varying the event generation process in simulation studies of statistical methods for recurrent events. Stat Med 2006 Jan 15;25(1):165-179.

Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med 2006 Dec 30;25(24):4279-4292.

Beyersmann J, Latouche A, Buchholz A, Schumacher M. Simulating competing risks data in survival analysis. Stat Med 2009 Jan 5;28(1):956-971.

Reis RJ, Utzet M, La Rocca PF, Nedel FB, Martin M, Navarro A. Previous sick leaves as predictor of subsequent ones. Int Arch Occup Environ Health 2011 Jun;84(5):491-499.

Navarro A, Moriña D, Reis R, Nedel FB, Martin M, Alvarado S. Hazard functions to describe patterns of new and recurrent sick leave episodes for different diagnoses. Scand J Work Environ Health 2012 Jan 27.

Moriña D, Navarro A. The R package survsim for the simulation of simple and complex survival data. Journal of Statistical Software 2014 Jul; 59(2):1-20.

