R: Simulation of a PH cure model with time-varying covariates

penPHcure.simulate {penPHcure}

R Documentation

Description

This function simulates data from a PH cure model with time-varying covariates:

the event-times are generated on a continuous scale from a piecewise exponential distribution conditional on time-varying covariates and regression coefficients beta0, using a method similar to the one described in Hendry (2014). The time-varying covariates are constant in the intervals (s_{j-1}, s_j], for j = 1, ..., J, with s_0 = 0;
the censoring times are generated from an exponential distribution truncated above s_J;
the susceptibility indicators are generated from a logistic regression model conditional on time-invariant covariates and regression coefficients b0.
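
The first two steps can be illustrated with a minimal base-R sketch. It is not the package's implementation, nor Hendry's algorithm: it uses plain inverse-transform sampling, assuming the baseline hazard gamma * t^(gamma - 1) given under Arguments (so the cumulative baseline hazard is t^gamma) and assuming that "truncated above s_J" means an exponential distribution conditioned on being at most s_J. The helper names `sim_event_time` and `sim_censoring` are hypothetical.

```r
# Hypothetical helpers, not part of penPHcure.

# Event time for one susceptible individual, by inverting the cumulative hazard.
#   E     : a unit-exponential draw, e.g. rexp(1)
#   S     : interval end points s_1 < ... < s_J (the first interval starts at 0)
#   Z     : length(S) x length(beta) matrix; row j holds the covariates
#           that apply on (s_{j-1}, s_j]
#   gamma : shape of the assumed baseline hazard gamma * t^(gamma - 1)
sim_event_time <- function(E, S, Z, beta, gamma = 1) {
  s_lo <- c(0, S[-length(S)])                  # interval start points
  risk <- exp(drop(Z %*% beta))                # relative risk in each interval
  H <- cumsum(risk * (S^gamma - s_lo^gamma))   # cumulative hazard at each s_j
  j <- which(H >= E)[1]                        # first interval reaching E
  if (is.na(j)) return(Inf)                    # no event before s_J
  H_prev <- if (j == 1) 0 else H[j - 1]
  (s_lo[j]^gamma + (E - H_prev) / risk[j])^(1 / gamma)
}

# Censoring times from an exponential(lambdaC) conditioned on [0, s_J],
# obtained by inverting the truncated distribution function.
sim_censoring <- function(n, lambdaC, s_J, u = runif(n)) {
  -log(1 - u * (1 - exp(-lambdaC * s_J))) / lambdaC
}

set.seed(1)
S <- seq(0.1, 5, by = 0.1)
beta0 <- c(1, 0, -1, 0)
Z <- matrix(rnorm(length(S) * length(beta0)), length(S), length(beta0))

t_event <- sim_event_time(rexp(1), S, Z, beta0)
t_cens <- sim_censoring(1, lambdaC = 1, s_J = max(S))
c(time = min(t_event, t_cens), status = as.numeric(t_event <= t_cens))
```

Under these assumptions an individual whose cumulative hazard never reaches the exponential draw gets an infinite event time and is therefore always censored, since the censoring time cannot exceed s_J.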

Usage

penPHcure.simulate(
  N = 500,
  S = seq(0.1, 5, by = 0.1),
  b0 = c(1.2, -1, 0, 1, 0),
  beta0 = c(1, 0, -1, 0),
  gamma = 1,
  lambdaC = 1,
  mean_CURE = rep(0, length(b0) - 1L),
  mean_SURV = rep(0, length(beta0)),
  sd_CURE = rep(1, length(b0) - 1L),
  sd_SURV = rep(1, length(beta0)),
  cor_CURE = diag(length(b0) - 1L),
  cor_SURV = diag(length(beta0)),
  X = NULL,
  Z = NULL,
  C = NULL
)

Arguments

`N`	the sample size (number of individuals). By default, `N = 500`.
`S`	a numeric vector containing the end points of the time intervals, in ascending order, over which the time-varying covariates are constant (the first interval starts at 0). By default, `S = seq(0.1, 5, by=0.1)`.
`b0`	a numeric vector with the true coefficients in the incidence (cure) component, used to generate the susceptibility indicators. By default, `b0 = c(1.2,-1,0,1,0)`.
`beta0`	a numeric vector with the true regression coefficients in the latency (survival) component, used to generate the event times. By default, `beta0 = c(1,0,-1,0)`.
`gamma`	a positive numeric value, parameter controlling the shape of the baseline hazard function: `\lambda_0(t) = \gamma t^{\gamma-1}`. By default, `gamma = 1`.
`lambdaC`	a positive numeric value, parameter of the truncated exponential distribution used to generate the censoring times. By default, `lambdaC = 1`.
`mean_CURE`	a numeric vector of means for the variables used to generate the susceptibility indicators. By default, all zeros.
`mean_SURV`	a numeric vector of means for the variables used to generate the event-times. By default, all zeros.
`sd_CURE`	a numeric vector of standard deviations for the variables used to generate the susceptibility indicators. By default, all ones.
`sd_SURV`	a numeric vector of standard deviations for the variables used to generate the event-times. By default, all ones.
`cor_CURE`	the correlation matrix of the variables used to generate the susceptibility indicators. By default, an identity matrix.
`cor_SURV`	the correlation matrix of the variables used to generate the event-times. By default, an identity matrix.
`X`	[optional] a matrix of time-invariant covariates used to generate the susceptibility indicators, with dimension `N` by `length(b0)-1L`. By default, `X = NULL`.
`Z`	[optional] an array of time-varying covariates used to generate the event times, with dimension `length(S)` by `length(beta0)` by `N`. By default, `Z = NULL`.
`C`	[optional] a vector of censoring times with `N` elements. By default, `C = NULL`.
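
Because the optional inputs must match `N`, `S`, `b0` and `beta0` in size, it can help to check them before simulating. The following is a small sketch of such a check; `check_user_inputs` is a hypothetical helper written for illustration and is not a function of the package.

```r
# Hypothetical helper: verify the documented dimensions of X, Z and C.
check_user_inputs <- function(N, S, b0, beta0, X = NULL, Z = NULL, C = NULL) {
  if (!is.null(X)) {
    # X: N by (length(b0) - 1)
    stopifnot(is.matrix(X), all(dim(X) == c(N, length(b0) - 1L)))
  }
  if (!is.null(Z)) {
    # Z: length(S) by length(beta0) by N
    stopifnot(is.array(Z), length(dim(Z)) == 3L,
              all(dim(Z) == c(length(S), length(beta0), N)))
  }
  if (!is.null(C)) {
    # C: numeric vector with N elements
    stopifnot(is.numeric(C), length(C) == N)
  }
  invisible(TRUE)
}

N <- 250
S <- seq(0.1, 5, by = 0.1)
b0 <- c(1, -1, 0, 1, 0)
beta0 <- c(1, 0, -1, 0)
X <- matrix(runif(N * (length(b0) - 1L)), N, length(b0) - 1L)
Z <- array(runif(N * length(S) * length(beta0)),
           dim = c(length(S), length(beta0), N))
C <- runif(N) * max(S)

check_user_inputs(N, S, b0, beta0, X = X, Z = Z, C = C)
```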

Details

By default, the time-varying covariates in the latency (survival) component are generated from a multivariate normal distribution with means mean_SURV, standard deviations sd_SURV and correlation matrix cor_SURV. Alternatively, the user can supply them through the argument Z, in which case mean_SURV, sd_SURV and cor_SURV are ignored.
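
As an illustration of how means, standard deviations and a correlation matrix combine into multivariate normal draws, the sketch below builds an array with the layout expected by `Z`. It assumes that covariate vectors are drawn independently across intervals and individuals, which may differ from what the package does internally; `rmvn_sd_cor` is a hypothetical helper based on a Cholesky factor.

```r
# Hypothetical helper: n draws from N(mean, Sigma), Sigma = D %*% cor %*% D,
# where D is the diagonal matrix of standard deviations.
rmvn_sd_cor <- function(n, mean, sd, cor) {
  D <- diag(sd, nrow = length(sd))
  Sigma <- D %*% cor %*% D
  # chol() returns the upper-triangular R with t(R) %*% R = Sigma
  draws <- matrix(rnorm(n * length(mean)), n, length(mean)) %*% chol(Sigma)
  sweep(draws, 2, mean, "+")
}

set.seed(1)
N <- 250
S <- seq(0.1, 5, by = 0.1)
beta0 <- c(1, 0, -1, 0)
mean_SURV <- c(2, 1, 0, -1)
sd_SURV <- c(0.5, 1, 1.5, 0.5)
cor_SURV <- 0.8^abs(outer(1:4, 1:4, "-"))

# Z[j, k, i]: value of covariate k for individual i on the j-th interval
Z <- array(NA_real_, dim = c(length(S), length(beta0), N))
for (i in seq_len(N)) {
  Z[, , i] <- rmvn_sd_cor(length(S), mean_SURV, sd_SURV, cor_SURV)
}
dim(Z)
```

An array built this way can be passed as `Z`; the package then ignores mean_SURV, sd_SURV and cor_SURV.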

By default, the time-invariant covariates in the incidence (cure) component are generated from a multivariate normal distribution with means mean_CURE, standard deviations sd_CURE and correlation matrix cor_CURE. Alternatively, the user can supply them through the argument X, in which case mean_CURE, sd_CURE and cor_CURE are ignored.
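
To make the role of the incidence covariates concrete, here is a minimal sketch of how susceptibility indicators could be drawn from a logistic model under the default setting (independent standard normal covariates). It assumes that the first element of b0 is an intercept, which is inferred from `X` having `length(b0)-1L` columns; the variable names are illustrative only.

```r
set.seed(1)
N <- 250
b0 <- c(1.2, -1, 0, 1, 0)

# Time-invariant covariates: independent standard normals (default setting)
X <- matrix(rnorm(N * (length(b0) - 1L)), N, length(b0) - 1L)

# Probability of being susceptible, assuming b0[1] is the intercept
p_susc <- plogis(drop(cbind(1, X) %*% b0))

# Susceptibility indicator: 1 = susceptible, 0 = cured
Y <- rbinom(N, size = 1, prob = p_susc)

# Share of cured individuals, in percent
100 * mean(Y == 0)
```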

Value

A data.frame with columns:

`id`	unique ID number associated with each individual.
`tstart`	start of the time interval.
`tstop`	end of the time interval.
`status`	event indicator: 1 if the event occurs, 0 otherwise.
`z.?`	one or more columns of covariates used to generate the survival times.
`x.?`	one or more columns of covariates used to generate the susceptibility indicator (constant over time).

In addition, it contains the following attributes:

`perc_cure`	Percentage of individuals not susceptible to the event of interest.
`perc_cens`	Percentage of censoring.
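
The returned data frame holds one row per individual and time interval. The sketch below shows one way to reduce such data to one row per individual and to read the attributes. It uses a small hand-made data frame (`toy`, with made-up values) that only mimics the documented column names, and it assumes that the last row of each individual carries the observed time and the final status.

```r
# Hand-made toy data mimicking the documented layout (values are made up)
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  tstart = c(0.0, 0.1, 0.2, 0.0, 0.1),
  tstop  = c(0.1, 0.2, 0.25, 0.1, 0.2),
  status = c(0, 0, 1, 0, 0),
  z.1    = c(0.3, -1.2, 0.8, 0.1, 0.5),
  x.1    = c(1.1, 1.1, 1.1, -0.4, -0.4)
)
attr(toy, "perc_cure") <- 50
attr(toy, "perc_cens") <- 50

# Keep the last interval of each individual
last_row <- !duplicated(toy$id, fromLast = TRUE)
per_subject <- toy[last_row, c("id", "tstop", "status", "x.1")]
per_subject

# Attributes are read with attr()
attr(toy, "perc_cure")
attr(toy, "perc_cens")
```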

References

Hendry DJ (2014). “Data generation for the Cox proportional hazards model with time-dependent covariates: a method for medical researchers.” Statistics in Medicine, 33(3), 436-454. doi: 10.1002/sim.5945.

Examples

### Example 1:
###  - event-times generated from a Cox PH model with unit baseline hazard
###    and time-varying covariates generated from independent standard normal 
###    distributions over the intervals (0,s_1], (s_1,s_2], ..., (s_{J-1},s_J]. 
###  - censoring times generated from an exponential distribution truncated 
###    above s_J.
###  - covariates in the incidence (cure) component generated from independent 
###    standard normal distributions.

# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)  
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the parameter of the truncated exponential distribution (censoring) 
lambdaC <- 1.5
# Simulate the data
data1 <- penPHcure.simulate(N = N,S = S,
                            b0 = b0,
                            beta0 = beta0,
                            lambdaC = lambdaC)

                           
### Example 2:
###  Similar to the previous example, but with a baseline hazard function 
###   defined as lambda_0(t) = 3t^2.

# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)  
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the parameter controlling the shape of the baseline hazard function
gamma <- 3
# Simulate the data
data2 <- penPHcure.simulate(N = N,S = S,
                            b0 = b0,
                            beta0 = beta0,
                            gamma = gamma)


### Example 3:
###  Simulation with covariates in the cure and survival components generated
###   from multivariate normal (MVN) distributions with specific means, 
###   standard deviations and correlation matrices.

# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)  
b0 <- c(-1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the means of the MVN distribution (incidence and latency)  
mean_CURE <- c(-1,0,1,2)
mean_SURV <- c(2,1,0,-1)
# Define the std. deviations of the MVN distribution (incidence and latency)  
sd_CURE <- c(0.5,1.5,1,0.5)
sd_SURV <- c(0.5,1,1.5,0.5)
# Define the correlation matrix of the MVN distribution (incidence and latency)  
# Entry (p,q) equals 0.8^|p - q|
cor_CURE <- 0.8^abs(outer(1:4, 1:4, "-"))
cor_SURV <- 0.8^abs(outer(1:4, 1:4, "-"))
# Simulate the data
data3 <- penPHcure.simulate(N = N,S = S,
                            b0 = b0,
                            beta0 = beta0,
                            mean_CURE = mean_CURE,
                            mean_SURV = mean_SURV,
                            sd_CURE = sd_CURE,
                            sd_SURV = sd_SURV,
                            cor_CURE = cor_CURE,
                            cor_SURV = cor_SURV)


### Example 4:
###  Simulation with covariates in the cure and survival components from a 
###   data generating process specified by the user. 

# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)  
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# As an example, we simulate data with covariates following independent
#  standard uniform distributions, but the user could provide random draws 
#  from any other distribution. Note: X must be a matrix of size 
#  N x (length(b0) - 1) and Z an array of size length(S) x length(beta0) x N.
X <- matrix(runif(N*(length(b0)-1)),N,length(b0)-1)
Z <- array(runif(N*length(S)*length(beta0)),c(length(S),length(beta0),N))
data4 <- penPHcure.simulate(N = N,S = S,
                            b0 = b0,
                            beta0 = beta0,
                            X = X,
                            Z = Z)


### Example 5:
###  Simulation with censoring times from a data generating process 
###   specified by the user

# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)  
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# As an example, we simulate data with censoring times following
#  a uniform distribution between 0 and s_J.
#  Note: C must be a numeric vector of length N.
C <- runif(N)*max(S)
data5 <- penPHcure.simulate(N = N,S = S,
                            b0 = b0,
                            beta0 = beta0,
                            C = C)

[Package penPHcure version 1.0.2 Index]