R: Simulate Survival Data

simdata {PWEXP}

R Documentation

Simulate Survival Data

Description

simdata simulates clinical trial data with time-to-event endpoints.

Usage

simdata(group="Group 1", strata="Strata 1", allocation=1,
    event_lambda=NA, drop_rate=NA, death_lambda=NA, n_rand=NULL,
    rand_rate=NULL, total_sample=NULL, add_column=c('followT'),
    simplify=TRUE, advanced_dist=NULL)

Arguments

`group`	a character vector of the names of each group (e.g., `c('treatment','control')`).
`strata`	a character vector of the names of strata in groups (e.g., `c('young','old')`).
`allocation`	the relative ratio of sample size in each subgroup (`group*strata`). See details. The value will be recycled if the length is less than needed.
`event_lambda`	the hazard rate of the primary endpoint (event). See details. The value will be recycled if the length is less than needed.
`drop_rate`	(optional) the drop-out rate (patients/month). Not hazard rate. See details. The value will be recycled if the length is less than needed.
`death_lambda`	(optional) the hazard rate of death. The value will be recycled if the length is less than needed.
`n_rand`	(required when `rand_rate=NULL`) a vector of the number of subjects randomized each month; values can be non-integer.
`rand_rate`	(required when `n_rand=NULL`) the randomization rate (patients/month; can be non-integer).
`total_sample`	(required when `n_rand=NULL`) total scheduled sample size.
`add_column`	request additional columns of the returned data frame. Valid options are:
`'eventT_abs'`: absolute event time from the beginning of the trial (=eventT+randT)
`'dropT_abs'`: absolute drop-out time from the beginning of the trial (=dropT+randT)
`'deathT_abs'`: absolute death time from the beginning of the trial (=deathT+randT)
`'censor'`: censoring (drop-out or death) indicator
`'event'`: event indicator
`'censor_reason'`: censoring reason ('drop_out','death','never_event'(eventT=inf))
`'followT'`: follow-up time (true observed time) from randT
`'followT_abs'`: absolute follow-up time from the beginning of the trial (=followT+randT)
`simplify`	whether to drop unused columns (e.g., the group variable when there is only one group). See details.
`advanced_dist`	use user-specified distributions for event, drop-out and death. A list containing random generation functions. See details and examples.

Details

See webpage https://zjph602xtc.github.io/PWEXP/ for a diagram illustration of the relationship between returned variables.

The total number of subgroups is '# treatment groups' * '# strata'. Every stratum appears within each treatment group. For example, if group = c('trt','placebo'), strata=c('A','B','C'), then there will be 6 subgroups: trt+A, trt+B, trt+C, placebo+A, placebo+B, placebo+C. The lengths of allocation, event_lambda, drop_rate, death_lambda should be 6 as well. Note that shorter values are recycled for these arguments. For example, if allocation=c(1,2,3), then the proportion of the 6 subgroups is actually 1:2:3:1:2:3, which means a 1:1 ratio between groups and a 1:2:3 ratio among the strata within each group.
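The recycling rule can be checked with a few lines of base R. This is a minimal sketch that does not call simdata; the names `subgroups`, `share`, `group_total` and `strata_total` are illustrative only, and the subgroup order is assumed to be the one listed above (strata varying fastest within each group).

```r
group  <- c("trt", "placebo")
strata <- c("A", "B", "C")
allocation <- c(1, 2, 3)

# Enumerate subgroups with strata varying fastest:
# trt+A, trt+B, trt+C, placebo+A, placebo+B, placebo+C
subgroups <- expand.grid(strata = strata, group = group,
                         stringsAsFactors = FALSE)

# Recycle allocation to one value per subgroup
subgroups$allocation <- rep_len(allocation, nrow(subgroups))

# Expected share of the total sample in each subgroup
subgroups$share <- subgroups$allocation / sum(subgroups$allocation)
print(subgroups)

# Totals per group are equal (1:1); totals per stratum follow 1:2:3
group_total  <- tapply(subgroups$allocation, subgroups$group, sum)
strata_total <- tapply(subgroups$allocation, subgroups$strata, sum)
```

To obtain unequal group sizes, all six values would have to be supplied explicitly, for example allocation=c(1,2,3,2,4,6) for a 1:2 ratio between groups.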

The event_lambda (\lambda) is the hazard rate of the event of interest. The density function of the event time is f(t)=\lambda e^{-\lambda t}. Similarly, the death_lambda is the hazard rate of death.

The drop_rate is the probability of drop-out by t=1, which means the hazard rate of drop-out is -log(1-drop_rate) (in other words, drop_rate=1-e^{-hazard rate}).
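The conversion between drop_rate and the drop-out hazard can be verified numerically in base R. The sketch below is illustrative only: `drop_rate_to_hazard` and `hazard_to_drop_rate` are hypothetical helper names, not functions exported by PWEXP.

```r
# Hypothetical helpers (not part of PWEXP) for the relationship stated above
drop_rate_to_hazard <- function(drop_rate) -log(1 - drop_rate)
hazard_to_drop_rate <- function(hazard) 1 - exp(-hazard)

drop_rate <- c(0.01, 0.05, 0.2)
hazard <- drop_rate_to_hazard(drop_rate)

# The exponential CDF evaluated at t = 1 recovers the drop-out probability
p_at_1 <- pexp(1, rate = hazard)

# Side-by-side comparison; the hazard is close to drop_rate only when
# drop_rate is small
print(cbind(drop_rate, hazard, p_at_1))
```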

When simplify=TRUE, these columns will NOT be included:

group when only one group is specified
strata when only one stratum is specified
eventT when event_lambda=NA
dropT when drop_rate=NA
deathT when death_lambda=NA

advanced_dist is used to define non-exponential distributions for event, drop-out or death. It is a list containing at least one of the elements: event_dist, drop_dist, death_dist. Each element holds one random generation function per subgroup. For example, advanced_dist=list(event_dist=c(function1, function2), drop_dist=c(function3, function4)). Here function1 and function3 are the event and drop-out generation functions for the first subgroup; function2 and function4 are those for the second. If there is a third subgroup, function1 and function3 will be reused. Each data generation function (functionX) is a function with only one input argument n (sample size). If any of event_dist, drop_dist, death_dist is missing, then event_lambda, drop_rate, death_lambda are used instead to generate an exponential distribution; if they are also missing, then the corresponding variable will not be generated.
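The shape of a valid generation function, and the reuse rule for extra subgroups, can be sketched in base R without calling simdata. The generator names (`event_weibull`, `event_lognormal`) and their parameter values are hypothetical, and `rep_len` is used here only to mimic the recycling described above; it is an assumption that simdata recycles in the same order.

```r
# Each generator takes a single argument n and returns n simulated times
event_weibull   <- function(n) rweibull(n, shape = 1.5, scale = 30)
event_lognormal <- function(n) rlnorm(n, meanlog = 3, sdlog = 0.5)

# c() on functions yields a list with one function per subgroup
event_dist <- c(event_weibull, event_lognormal)

# With three subgroups the list is recycled: the third reuses the first
n_subgroup <- 3
recycled <- rep_len(event_dist, n_subgroup)

set.seed(1)
draws <- lapply(recycled, function(f) f(5))
print(lengths(draws))
```

Such a list would then be passed as advanced_dist=list(event_dist=event_dist), following the form shown in the paragraph above.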

Value

A data frame containing some of these columns:

`ID`	subject ID
`group`	group indicator
`strata`	stratum indicator
`randT`	randomization time (from the beginning of the trial)
`eventT`	event time (from `randT`)
`eventT_abs`	event time (from the beginning of the trial)
`dropT`	drop-out time (from `randT`)
`dropT_abs`	drop-out time (from the beginning of the trial)
`deathT`	death time (from `randT`)
`deathT_abs`	death time (from the beginning of the trial)
`censor`	censoring (drop-out or death) indicator
`censor_reason`	censoring reason ('drop_out','death','never_event'(followT=inf))
`event`	event indicator
`followT`	follow-up time / observed time (from `randT`)
`followT_abs`	follow-up time / observed time (from the beginning of the trial)
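The relationships among these columns can be illustrated on toy values in base R. Only the "_abs = time + randT" identities are stated in this page; the rules used below for followT and event (follow-up ends at the earliest of event, drop-out and death) are assumptions made for illustration, so the diagram on the package webpage remains the reference.

```r
# Toy values for three subjects (illustrative, not simdata output)
randT  <- c(0.5, 1.2, 3.0)
eventT <- c(10, Inf, 4)
dropT  <- c(25, 8, Inf)
deathT <- c(40, 30, 2)

# Documented: absolute times add the randomization time
eventT_abs <- eventT + randT

# ASSUMPTION: follow-up ends at the earliest of event, drop-out and death
followT     <- pmin(eventT, dropT, deathT)
followT_abs <- followT + randT

# ASSUMPTION: the event is observed only if it is not preceded by
# drop-out or death
event <- as.integer(eventT <= pmin(dropT, deathT))

print(data.frame(randT, eventT, dropT, deathT, followT, followT_abs, event))
```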

Note

event_lambda, drop_rate, death_lambda can be 0, in which case the corresponding time variable (eventT, dropT or deathT) is Inf for every subject in that subgroup.

Author(s)

Tianchen Xu zjph602xutianchen@gmail.com

Examples

# Two groups with two strata. In the treatment group, there is a treatment
# sensitive stratum and a non-sensitive stratum. In the placebo group, all
# subjects are the same. Treatment:place=1:2. Drop rate=1% only in treatment group.
dat <- simdata(group=c('trt', 'place'), strata = c('sensitive','non-sensitive'),
               allocation = c(1,1,2,2), rand_rate = 20, total_sample = 1000,
               event_lambda = c(0.1, 0.2, 0.01, 0.01),
               drop_rate = c(0.01, 0.01, 0, 0))
# randomized subjects
table(dat$group,dat$strata)
# randomization curve
plot(sort(dat$randT), 1:1000, xlab='time', ylab='randomized subjects')
# event time in treatment group
plot(ecdf(dat$eventT[dat$group=='trt' & dat$strata=='sensitive']))
lines(ecdf(dat$eventT[dat$group=='trt' & dat$strata=='non-sensitive']), col='red')


# Two groups. Event follows a piecewise exponential distribution; drop-out follows
# a Weibull; death follows an exponential.
dist_trt <- function(n)rpwexp(n, rate=c(0.01, 0.05, 0.01), breakpoint = c(30,60))
dist_placebo <- function(n)rpwexp(n, rate=c(0.01, 0.005), breakpoint = c(50))
dat <- simdata(group = c('trt','placebo'), n_rand = c(rep(10,50),rep(20,10)),
               death_lambda = 0.01,
               advanced_dist = list(event_dist=c(dist_trt, dist_placebo),
                                    drop_dist=function(n)rweibull(n,3,40)))
# randomized subjects
table(dat$group)
# randomization curve
plot(sort(dat$randT), 1:700, xlab='time', ylab='randomized subjects')
# event time in both groups
plot(ecdf(dat$eventT[dat$group=='trt']), xlim=c(0,100))
lines(ecdf(dat$eventT[dat$group=='placebo']), col='red')
# drop-out time
plot(ecdf(dat$dropT), xlim=c(0,100))


# mixture cure distribution, 20% of the subjects are cured and will not have events
dat <- simdata(strata=c('cure','non-cure'), allocation=c(20,80),
        event_lambda=c(0, 0.38), n_rand = rep(20,30),
        add_column = c('eventT_abs', 'censor', 'event',
                       'censor_reason', 'followT', 'followT_abs'))

[Package PWEXP version 0.5.0 Index]