simdata {PWEXP}    R Documentation

Simulate Survival Data

Description

simdata simulates clinical trial data with time-to-event endpoints.

Usage

simdata(group="Group 1", strata="Strata 1", allocation=1,
    event_lambda=NA, drop_rate=NA, death_lambda=NA, n_rand=NULL,
    rand_rate=NULL, total_sample=NULL, add_column=c('followT'),
    simplify=TRUE, advanced_dist=NULL)

Arguments

group

a character vector of the names of each group (e.g., c('treatment','control')).

strata

a character vector of the names of strata in groups (e.g., c('young','old')).

allocation

the relative ratio of sample size in each subgroup (group*strata). See details. The value will be recycled if the length is less than needed.

event_lambda

the hazard rate of the primary endpoint (event). See details. The value will be recycled if the length is less than needed.

drop_rate

(optional) the drop-out rate per month, expressed as a probability rather than a hazard rate. See details. The value will be recycled if the length is less than needed.

death_lambda

(optional) the hazard rate of death. The value will be recycled if the length is less than needed.

n_rand

(required when rand_rate=NULL) a vector of the number of subjects randomized each month; values can be non-integer.

rand_rate

(required when n_rand=NULL) the randomization rate (patients/month; can be non-integer).

total_sample

(required when n_rand=NULL) total scheduled sample size.

add_column

request additional columns of the returned data frame.
Valid options are:

  • 'eventT_abs': absolute event time from the beginning of the trial (=eventT+randT)

  • 'dropT_abs': absolute drop-out time from the beginning of the trial (=dropT+randT)

  • 'deathT_abs': absolute death time from the beginning of the trial (=deathT+randT)

  • 'censor': censoring (drop-out or death) indicator

  • 'event': event indicator

  • 'censor_reason': censoring reason ('drop_out','death','never_event'(eventT=inf))

  • 'followT': follow-up time (true observed time) from randT

  • 'followT_abs': absolute follow-up time from the beginning of the trial (=followT+randT)

simplify

whether to drop unused columns (e.g., the group variable when there is only one group). See details.

advanced_dist

use user-specified distributions for event, drop-out and death. A list containing random generation functions. See details and examples.

Details

See the webpage https://zjph602xtc.github.io/PWEXP/ for a diagram illustrating the relationship between the returned variables.

The total number of subgroups is '# treatment groups' * '# strata'. Every stratum appears within each treatment group. For example, if group=c('trt','placebo') and strata=c('A','B','C'), then there are 6 subgroups: trt+A, trt+B, trt+C, placebo+A, placebo+B, placebo+C. The lengths of allocation, event_lambda, drop_rate and death_lambda should therefore be 6 as well. Shorter vectors are recycled. For example, if allocation=c(1,2,3), then the proportion of the 6 subgroups is 1:2:3:1:2:3, which means a 1:1 ratio between groups and a 1:2:3 ratio among the strata within each group.
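The subgroup ordering and the recycling rule can be illustrated with a few lines of base R. This is only a sketch of the rule described above, not the internal code of simdata; the object names (subgroups, ratio, prop, group_totals) are made up for the illustration.

```r
# Base-R sketch of how subgroup labels and a recycled allocation line up.
group  <- c("trt", "placebo")
strata <- c("A", "B", "C")

# Strata vary fastest within each group, matching the order listed above.
subgroups <- data.frame(
  group  = rep(group, each = length(strata)),
  strata = rep(strata, times = length(group))
)

# allocation = c(1, 2, 3) is recycled to length 6, i.e. 1:2:3:1:2:3.
allocation <- c(1, 2, 3)
subgroups$ratio <- rep_len(allocation, nrow(subgroups))
subgroups$prop  <- subgroups$ratio / sum(subgroups$ratio)

# Group totals are equal (1:1); strata within a group follow 1:2:3.
group_totals <- tapply(subgroups$ratio, subgroups$group, sum)
print(subgroups)
print(group_totals)
```

To obtain an unequal group ratio, supply all 6 values explicitly, e.g. allocation=c(1,2,3,2,4,6) for a 1:2 ratio between trt and placebo.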

The event_lambda (\lambda) is the hazard rate of the event of interest. The density function of the event time is f(t)=\lambda e^{-\lambda t}. Similarly, death_lambda is the hazard rate of death.

The drop_rate is the probability of drop-out by t=1, so the hazard rate of drop-out is -log(1-drop_rate) (equivalently, drop_rate=1-e^{-hazard rate}).
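The conversion between drop_rate and the drop-out hazard can be checked numerically. The sketch below uses only base R and assumes an exponential drop-out time, as in the formula above; the variable names (drop_hazard, recovered) are illustrative.

```r
# Convert a drop-out probability at t = 1 into an exponential hazard rate.
drop_rate   <- c(0.01, 0.05, 0.20)
drop_hazard <- -log(1 - drop_rate)

# For an exponential time T with this hazard, P(T <= 1) gives back drop_rate.
recovered <- pexp(1, rate = drop_hazard)

# For small values the hazard is close to drop_rate; the gap grows with the rate.
print(cbind(drop_rate, drop_hazard, recovered))
```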

When simplify=TRUE, these columns will NOT be included:

advanced_dist is used to define non-exponential distributions for event, drop-out or death. It is a list containing at least one of the elements event_dist, drop_dist, death_dist. Each element holds the random generation functions for the subgroups. For example, advanced_dist=list(event_dist=c(function1, function2), drop_dist=c(function3, function4)). Here function1 and function3 are the event and drop-out generation functions for the first subgroup; function2 and function4 are those for the second. If there is a third subgroup, function1 and function3 will be reused. Each data generation function (functionX) takes a single input argument n (sample size). If any of event_dist, drop_dist, death_dist is missing, then event_lambda, drop_rate, death_lambda are used instead to generate an exponential distribution; if these are also missing, the corresponding variable will not be generated.
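A generation function for advanced_dist only needs to accept n and return n times. The sketch below builds such a list from base R generators; the function names and distribution parameters are arbitrary choices for illustration. The final call is skipped unless the PWEXP package is installed.

```r
# Each generator takes a single argument n and returns n non-negative times.
event_weibull <- function(n) rweibull(n, shape = 1.5, scale = 30)
event_lognorm <- function(n) rlnorm(n, meanlog = 3, sdlog = 0.5)
drop_gamma    <- function(n) rgamma(n, shape = 2, rate = 0.02)

# One event generator per subgroup; a single drop-out generator is reused.
adv <- list(event_dist = c(event_weibull, event_lognorm),
            drop_dist  = drop_gamma)

# Quick check of the interface before passing the list on.
set.seed(1)
draws <- event_weibull(5)

# Only attempted when PWEXP is available; death is not generated here
# because neither death_dist nor death_lambda is supplied.
if (requireNamespace("PWEXP", quietly = TRUE)) {
  dat <- PWEXP::simdata(group = c("trt", "placebo"), n_rand = rep(10, 20),
                        advanced_dist = adv)
  print(head(dat))
}
```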

Value

A data frame containing some of these columns:

ID

subject ID

group

group indicator

strata

stratum indicator

randT

randomization time (from the beginning of the trial)

eventT

event time (from randT)

eventT_abs

event time (from the beginning of the trial)

dropT

drop-out time (from randT)

dropT_abs

drop-out time (from the beginning of the trial)

deathT

death time (from randT)

deathT_abs

death time (from the beginning of the trial)

censor

censoring (drop-out or death) indicator

censor_reason

censoring reason ('drop_out','death','never_event'(followT=inf))

event

event indicator

followT

follow-up time / observed time (from randT)

followT_abs

follow-up time / observed time (from the beginning of the trial)
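The relationship between the relative and absolute time columns can be reproduced by hand on a toy data frame. The '_abs' identities below come from the add_column description; taking followT as the earliest of the event, drop-out and death times is an assumption made for this sketch and should be checked against the diagram on the package webpage.

```r
# Toy data in the layout of the returned data frame (values are made up).
toy <- data.frame(randT  = c(0.5, 1.2, 3.0),
                  eventT = c(10, Inf, 4),
                  dropT  = c(25, 8, Inf),
                  deathT = c(Inf, 30, 6))

# Absolute times are the relative times shifted by the randomization time.
toy$eventT_abs <- toy$eventT + toy$randT
toy$dropT_abs  <- toy$dropT  + toy$randT
toy$deathT_abs <- toy$deathT + toy$randT

# ASSUMPTION: follow-up ends at whichever of the three times comes first.
toy$followT     <- pmin(toy$eventT, toy$dropT, toy$deathT)
toy$followT_abs <- toy$followT + toy$randT

print(toy)
```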

Note

event_lambda, drop_rate and death_lambda can be 0, in which case the corresponding time variable is Inf for every subject in that subgroup.

Author(s)

Tianchen Xu zjph602xutianchen@gmail.com

See Also

rpwexp, rpwexp_conditional

Examples

# Two groups with two strata. In the treatment group, there is a treatment
# sensitive stratum and a non-sensitive stratum. In the placebo group, all
# subjects are the same. Treatment:place=1:2. Drop rate=1% only in treatment group.
dat <- simdata(group=c('trt', 'place'), strata = c('sensitive','non-sensitive'),
               allocation = c(1,1,2,2), rand_rate = 20, total_sample = 1000,
               event_lambda = c(0.1, 0.2, 0.01, 0.01),
               drop_rate = c(0.01, 0.01, 0, 0))
# randomized subjects
table(dat$group,dat$strata)
# randomization curve
plot(sort(dat$randT), 1:1000, xlab='time', ylab='randomized subjects')
# event time in treatment group
plot(ecdf(dat$eventT[dat$group=='trt' & dat$strata=='sensitive']))
lines(ecdf(dat$eventT[dat$group=='trt' & dat$strata=='non-sensitive']), col='red')


# Two groups. Event follows a piecewise exponential distribution in each group;
# drop-out follows a Weibull; death follows an exponential.
dist_trt <- function(n)rpwexp(n, rate=c(0.01, 0.05, 0.01), breakpoint = c(30,60))
dist_placebo <- function(n)rpwexp(n, rate=c(0.01, 0.005), breakpoint = c(50))
dat <- simdata(group = c('trt','placebo'), n_rand = c(rep(10,50),rep(20,10)),
               death_lambda = 0.01,
               advanced_dist = list(event_dist=c(dist_trt, dist_placebo),
                                    drop_dist=function(n)rweibull(n,3,40)))
# randomized subjects
table(dat$group)
# randomization curve
plot(sort(dat$randT), 1:700, xlab='time', ylab='randomized subjects')
# event time in both groups
plot(ecdf(dat$eventT[dat$group=='trt']), xlim=c(0,100))
lines(ecdf(dat$eventT[dat$group=='placebo']), col='red')
# drop-out time
plot(ecdf(dat$dropT), xlim=c(0,100))


# mixture cure distribution, 20% of the subjects are cured and will not have events
dat <- simdata(strata=c('cure','non-cure'), allocation=c(20,80),
        event_lambda=c(0, 0.38), n_rand = rep(20,30),
        add_column = c('eventT_abs', 'censor', 'event',
                       'censor_reason', 'followT', 'followT_abs'))

[Package PWEXP version 0.5.0 Index]