penPHcure.simulate {penPHcure} | R Documentation |
Simulation of a PH cure model with time-varying covariates
Description
This function simulates data from a PH cure model with time-varying covariates:
- the event times are generated on a continuous scale from a piecewise exponential distribution conditional on time-varying covariates and regression coefficients beta0, using a method similar to the one described in Hendry (2014). The time-varying covariates are constant in the intervals (s_{j-1}, s_j], for j = 1, ..., J;
- the censoring times are generated from an exponential distribution truncated above s_J;
- the susceptibility indicators are generated from a logistic regression model conditional on time-invariant covariates and regression coefficients b0.
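The three generation steps can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper names sim_event_time, sim_censoring and sim_susceptible are hypothetical, the baseline hazard is assumed to be lambda_0(t) = gamma * t^(gamma - 1) (consistent with Example 2, where gamma = 3 gives 3t^2), s_0 is assumed to be 0, and the first element of b0 is assumed to be an intercept.

```r
# Hypothetical helper (not part of penPHcure): draw one event time by inverting
# the cumulative hazard, which is piecewise because the covariates are constant
# on each interval (s_{j-1}, s_j].
# Assumption: cumulative baseline hazard is t^gamma.
sim_event_time <- function(S, Zi, beta0, gamma = 1) {
  # Zi: length(S) x length(beta0) matrix; row j holds the covariates on interval j
  s_lo <- c(0, S[-length(S)])                    # interval start points s_{j-1}
  rate <- exp(drop(Zi %*% beta0))                # exp(z_j' beta0) on each interval
  H    <- cumsum(rate * (S^gamma - s_lo^gamma))  # cumulative hazard at each s_j
  E    <- rexp(1)                                # unit exponential target
  if (E > H[length(H)]) return(Inf)              # no event before s_J
  j      <- which(H >= E)[1]                     # interval where the hazard reaches E
  H_prev <- if (j == 1) 0 else H[j - 1]
  ((E - H_prev) / rate[j] + s_lo[j]^gamma)^(1 / gamma)
}

# Hypothetical helper: exponential(lambdaC) censoring times truncated above s_max,
# drawn by inverting the truncated CDF.
sim_censoring <- function(n, lambdaC, s_max) {
  u <- runif(n)
  -log(1 - u * (1 - exp(-lambdaC * s_max))) / lambdaC
}

# Hypothetical helper: susceptibility indicators from a logistic model.
# Assumption: b0[1] is the intercept and b0[-1] are the slopes for the columns of X.
sim_susceptible <- function(X, b0) {
  rbinom(nrow(X), 1, plogis(b0[1] + drop(X %*% b0[-1])))
}

# Usage with made-up inputs
set.seed(1)
S  <- seq(0.1, 5, by = 0.1)
Zi <- matrix(rnorm(length(S) * 4), length(S), 4)
t1 <- sim_event_time(S, Zi, beta0 = c(1, 0, -1, 0))
C  <- sim_censoring(100, lambdaC = 1, s_max = max(S))
Y  <- sim_susceptible(matrix(rnorm(100 * 4), 100, 4), b0 = c(1.2, -1, 0, 1, 0))
```

In this sketch an event time of Inf means that no event occurred before s_J, so the individual would be censored, since every censoring time is at most s_J.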
Usage
penPHcure.simulate(
N = 500,
S = seq(0.1, 5, by = 0.1),
b0 = c(1.2, -1, 0, 1, 0),
beta0 = c(1, 0, -1, 0),
gamma = 1,
lambdaC = 1,
mean_CURE = rep(0, length(b0) - 1L),
mean_SURV = rep(0, length(beta0)),
sd_CURE = rep(1, length(b0) - 1L),
sd_SURV = rep(1, length(beta0)),
cor_CURE = diag(length(b0) - 1L),
cor_SURV = diag(length(beta0)),
X = NULL,
Z = NULL,
C = NULL
)
Arguments
N |
the sample size (number of individuals). By default, N = 500. |
S |
a numeric vector containing the end points of the time intervals, in ascending order, over which the time-varying covariates are constant (the first interval starts at 0). By default, S = seq(0.1, 5, by = 0.1). |
b0 |
a numeric vector with the true coefficients in the incidence (cure) component, used to generate the susceptibility indicators. By default, b0 = c(1.2, -1, 0, 1, 0). |
beta0 |
a numeric vector with the true regression coefficients in the latency (survival) component, used to generate the event times. By default, beta0 = c(1, 0, -1, 0). |
gamma |
a positive numeric value, parameter controlling the shape of the baseline hazard function: lambda_0(t) = gamma * t^(gamma - 1). By default, gamma = 1. |
lambdaC |
a positive numeric value, parameter of the truncated exponential distribution used to generate the censoring times. By default, lambdaC = 1. |
mean_CURE |
a numeric vector of means for the variables used to generate the susceptibility indicators. By default, all zeros. |
mean_SURV |
a numeric vector of means for the variables used to generate the event-times. By default, all zeros. |
sd_CURE |
a numeric vector of standard deviations for the variables used to generate the susceptibility indicators. By default, all ones. |
sd_SURV |
a numeric vector of standard deviations for the variables used to generate the event-times. By default, all ones. |
cor_CURE |
the correlation matrix of the variables used to generate the susceptibility indicators. By default, an identity matrix. |
cor_SURV |
the correlation matrix of the variables used to generate the event-times. By default, an identity matrix. |
X |
[optional] a matrix of time-invariant covariates used to generate the susceptibility indicators, with dimension N by length(b0) - 1. By default, X = NULL. |
Z |
[optional] an array of time-varying covariates used to generate the event times, with dimension length(S) by length(beta0) by N. By default, Z = NULL. |
C |
[optional] a vector of censoring times with N elements. By default, C = NULL. |
Details
By default, the time-varying covariates in the latency (survival) component are generated from a multivariate normal distribution with means mean_SURV, standard deviations sd_SURV and correlation matrix cor_SURV. Otherwise, they can be provided by the user using the argument Z. In this case, the arguments mean_SURV, sd_SURV and cor_SURV will be ignored.
By default, the time-invariant covariates in the incidence (cure) component are generated from a multivariate normal distribution with means mean_CURE, standard deviations sd_CURE and correlation matrix cor_CURE. Otherwise, they can be provided by the user using the argument X. In this case, the arguments mean_CURE, sd_CURE and cor_CURE will be ignored.
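As an illustration of how means, standard deviations and a correlation matrix jointly define a multivariate normal distribution, the following base R sketch builds the covariance matrix and draws from it. The helper name rmvn_sketch is hypothetical, and the Cholesky-based sampler is an assumption made for this example; the package may generate its covariates differently.

```r
# Hypothetical helper (not part of penPHcure): draw n rows from a multivariate
# normal distribution defined by means, standard deviations and a correlation matrix.
rmvn_sketch <- function(n, mean, sd, cor) {
  D     <- diag(sd, nrow = length(sd))   # diagonal matrix of standard deviations
  Sigma <- D %*% cor %*% D               # covariance matrix
  R     <- chol(Sigma)                   # upper triangular, t(R) %*% R equals Sigma
  Zstd  <- matrix(rnorm(n * length(mean)), n, length(mean))
  sweep(Zstd %*% R, 2, mean, "+")        # rescale, then add the mean to each column
}

# Usage: the same means, standard deviations and correlation structure
# as the incidence covariates in Example 3
set.seed(1)
cor_CURE <- 0.8^abs(outer(1:4, 1:4, "-"))
X <- rmvn_sketch(250, mean = c(-1, 0, 1, 2), sd = c(0.5, 1.5, 1, 0.5), cor = cor_CURE)
dim(X)
```

A matrix generated this way has the layout expected by the argument X (one row per individual, one column per covariate).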
Value
A data.frame with columns:
id |
unique ID number associated with each individual. |
tstart |
start of the time interval. |
tstop |
end of the time interval. |
status |
event indicator: 1 if the event occurs, 0 otherwise. |
z.? |
one or more columns of covariates used to generate the survival times. |
x.? |
one or more columns of covariates used to generate the susceptibility indicator (constant over time). |
In addition, it contains the following attributes:
perc_cure |
Percentage of individuals not susceptible to the event of interest. |
perc_cens |
Percentage of censoring. |
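The sketch below shows how a data.frame in this layout might be inspected. The object toy is built by hand with made-up values (including the attribute values) so that the code runs without the package; it assumes that each individual has one row per time interval and that the last row carries the observed time and event indicator.

```r
# Toy data in the documented layout (all values are made up for illustration)
toy <- data.frame(
  id     = c(1, 1, 1, 2, 2),
  tstart = c(0.0, 0.1, 0.2, 0.0, 0.1),
  tstop  = c(0.1, 0.2, 0.27, 0.1, 0.15),
  status = c(0, 0, 1, 0, 0)
)
attr(toy, "perc_cure") <- 50   # made-up attribute values
attr(toy, "perc_cens") <- 50

# Attributes are read with attr()
attr(toy, "perc_cure")
attr(toy, "perc_cens")

# One row per individual: keep the last interval of each id
# (assumption: it holds the observed time in tstop and the event indicator in status)
last <- toy[!duplicated(toy$id, fromLast = TRUE), c("id", "tstop", "status")]
last
```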
References
Hendry DJ (2014). “Data generation for the Cox proportional hazards model with time-dependent covariates: a method for medical researchers.” Statistics in Medicine, 33(3), 436-454. doi: 10.1002/sim.5945.
Examples
### Example 1:
### - event times generated from a Cox PH model with unit baseline hazard
###   and time-varying covariates generated from independent standard normal
###   distributions over the intervals (0,s_1], (s_1,s_2], ..., (s_{J-1},s_J].
### - censoring times generated from an exponential distribution truncated
###   above s_J.
### - covariates in the incidence (cure) component generated from independent
###   standard normal distributions.
# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the parameter of the truncated exponential distribution (censoring)
lambdaC <- 1.5
# Simulate the data
data1 <- penPHcure.simulate(N = N, S = S,
                            b0 = b0,
                            beta0 = beta0,
                            lambdaC = lambdaC)
### Example 2:
### Similar to the previous example, but with a baseline hazard function
### defined as lambda_0(t) = 3t^2.
# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the parameter controlling the shape of the baseline hazard function
gamma <- 3
# Simulate the data
data2 <- penPHcure.simulate(N = N, S = S,
                            b0 = b0,
                            beta0 = beta0,
                            gamma = gamma)
### Example 3:
### Simulation with covariates in the cure and survival components generated
### from multivariate normal (MVN) distributions with specific means,
### standard deviations and correlation matrices.
# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)
b0 <- c(-1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# Define the means of the MVN distribution (incidence and latency)
mean_CURE <- c(-1,0,1,2)
mean_SURV <- c(2,1,0,-1)
# Define the std. deviations of the MVN distribution (incidence and latency)
sd_CURE <- c(0.5,1.5,1,0.5)
sd_SURV <- c(0.5,1,1.5,0.5)
# Define the correlation matrix of the MVN distribution (incidence and latency)
cor_CURE <- matrix(NA,4,4)
for (p in 1:4)
  for (q in 1:4)
    cor_CURE[p,q] <- 0.8^abs(p - q)
cor_SURV <- matrix(NA,4,4)
for (p in 1:4)
  for (q in 1:4)
    cor_SURV[p,q] <- 0.8^abs(p - q)
# Simulate the data
data3 <- penPHcure.simulate(N = N, S = S,
                            b0 = b0,
                            beta0 = beta0,
                            mean_CURE = mean_CURE,
                            mean_SURV = mean_SURV,
                            sd_CURE = sd_CURE,
                            sd_SURV = sd_SURV,
                            cor_CURE = cor_CURE,
                            cor_SURV = cor_SURV)
### Example 4:
### Simulation with covariates in the cure and survival components from a
### data generating process specified by the user.
# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# As an example, we simulate data with covariates following independent
# standard uniform distributions, but the user could provide random draws
# from any other distribution. Be careful: X should be a matrix of size
# N x (length(b0) - 1) and Z an array of size length(S) x length(beta0) x N.
X <- matrix(runif(N*(length(b0)-1)),N,length(b0)-1)
Z <- array(runif(N*length(S)*length(beta0)),c(length(S),length(beta0),N))
data4 <- penPHcure.simulate(N = N, S = S,
                            b0 = b0,
                            beta0 = beta0,
                            X = X,
                            Z = Z)
### Example 5:
### Simulation with censoring times from a data generating process
### specified by the user
# Define the sample size
N <- 250
# Define the time intervals for the time-varying covariates
S <- seq(0.1, 5, by=0.1)
# Define the true regression coefficients (incidence and latency)
b0 <- c(1,-1,0,1,0)
beta0 <- c(1,0,-1,0)
# As an example, we simulate data with censoring times following
# a uniform distribution between 0 and s_J.
# Be careful: C should be a numeric vector of length N.
C <- runif(N)*max(S)
data5 <- penPHcure.simulate(N = N, S = S,
                            b0 = b0,
                            beta0 = beta0,
                            C = C)