R: Generate survival data

genfrail {frailtySurv}

R Documentation

Generate survival data

Description

Generate clustered survival data from a shared frailty model, with conditional survival function given by

S(t)=\exp [-\Lambda_0(t) \omega_i \exp (\beta Z_{ij})]

where \Lambda_0 is the cumulative baseline hazard, \omega_i is the frailty value of cluster i, \beta is the regression coefficient vector, and Z_{ij} is the covariate vector for individual j in cluster i.

The baseline hazard can be specified by the inverse cumulative baseline hazard, the cumulative baseline hazard, or simply the baseline hazard. Frailty values can be sampled from gamma, power variance function (PVF), log-normal, inverse Gaussian, and positive stable distributions.
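
The convenience of the inverse cumulative baseline hazard follows from inverse transform sampling: setting S(T) = U with U uniform on (0, 1) and solving for T gives T = \Lambda_0^{-1}(-\log U / (\omega_i \exp(\beta Z_{ij}))). The base R sketch below illustrates that idea only; it is not the package's internal code. The helper name `sample_failure_times` is hypothetical, a single standard normal covariate is assumed, and the gamma frailty is assumed to have mean 1 and variance `theta`.

```r
# Hypothetical helper: draw uncensored failure times by inverting
# S(t) = exp(-Lambda_0(t) * omega_i * exp(beta * Z_ij)).
sample_failure_times <- function(N, K, beta, theta, Lambda_0_inv) {
  # Assumed parameterization: gamma frailty with mean 1 and variance theta
  omega  <- rgamma(N, shape = 1 / theta, scale = theta)
  family <- rep(seq_len(N), each = K)   # cluster index of each observation
  member <- rep(seq_len(K), times = N)  # member index within the cluster
  Z      <- rnorm(N * K)                # one standard normal covariate
  U      <- runif(N * K)
  # Solve Lambda_0(T) * omega_i * exp(beta * Z_ij) = -log(U) for T
  time <- Lambda_0_inv(-log(U) / (omega[family] * exp(beta * Z)))
  data.frame(family = family, rep = member, time = time, Z1 = Z)
}

set.seed(1)
Lambda_0_inv <- function(t, tau = 4.6, C = 0.01) (t^(1 / tau)) / C
sim <- sample_failure_times(N = 5, K = 2, beta = log(2), theta = 2,
                            Lambda_0_inv = Lambda_0_inv)
```

With only `lambda_0` or `Lambda_0` available, the inversion step has to be carried out numerically for every observation, which is why those forms are slower.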

Usage

genfrail(N = 300, K = 2, K.param = c(2, 0), beta = c(log(2)),
         frailty = "gamma", theta = c(2), 
         covar.distr = "normal", covar.param = c(0, 1), covar.matrix = NULL,
         censor.distr = "normal", censor.param = c(130, 15), 
         censor.rate = NULL, censor.time = NULL,
         lambda_0 = NULL, Lambda_0 = NULL, Lambda_0_inv = NULL, 
         round.base = NULL, control, ...)

Arguments

`N`	integer; number of clusters
`K`	integer, string, or vector. If an integer, the number of members in each cluster. If a string, the name of the distribution to sample the cluster sizes from. This can be one of: "poisson", "pareto", or "uniform". The `K.param` argument specifies the distribution parameters. If a vector, it must have length N and contain the integer size of each cluster.
`K.param`	vector of the cluster size distribution parameters if `K` is a string. If "poisson", the vector should contain the rate and truncation value (see `rtpois`). If "pareto", the exponent, lower, and upper bounds (see `rtzeta`). If "uniform", the lower (noninclusive) and upper (inclusive) bounds.
`beta`	vector of regression coefficients.
`frailty`	string name of the frailty distribution. Can be one of: "gamma", "pvf", "lognormal", "invgauss", "posstab", or "none". See `dgamma_r`, `dpvf_r`, `dlognormal_r`, `dinvgauss_r`, `dposstab_r` for the respective density functions. (Also see the corresponding `*_c` functions for C implementations of the respective density functions.)
`theta`	vector of the frailty distribution parameters
`covar.distr`	string distribution to sample covariates from. Can be one of: "normal", "uniform", "zero"
`covar.param`	vector covariate distribution parameters.
`covar.matrix`	matrix with dimensions `c(NK, length(beta))` that contains the desired covariates. If not NULL, this overrides `covar.distr` and `covar.param`.
`censor.distr`	string censoring distribution to use. Followup times are sampled from the censoring distribution to simulate non-informative right censorship. The censoring distribution can be one of: "normal", "lognormal", "uniform", "none".
`censor.param`	vector of censoring distribution parameters. For normal and lognormal censorship, this should be c(mu,sigma) where mu is the mean and sigma is the standard deviation (Note: this is still the mean and standard deviation for lognormal). For uniform censorship, the vector `c(lower, upper)` should specify the lower and upper bounds.
`censor.rate`	numeric value between 0 and 1 to specify the empirical censoring rate. The mean specified in the `censor.param` parameter is adjusted to achieve a desired censoring rate if `censor.rate` is given. Note that the standard deviation (the second parameter in `censor.param`) must still be specified so that the problem is identifiable. For uniform censorship, the interval given by `c(lower, upper)` is adjusted to achieve the desired censorship, while keeping the variance fixed (i.e., upper - lower does not change).
`censor.time`	vector of right-censorship times. This must have length NK and specifies the right-censoring time of each observation. Note that this overrides all other `censor.*` arguments and cannot be used with variable cluster sizes.
`lambda_0`	function baseline hazard. Only one of `lambda_0`, `Lambda_0`, and `Lambda_0_inv` needs to be specified. Passing the baseline hazard (`lambda_0`) is the most computationally expensive since this requires numerical integration inside a root-finding algorithm.
`Lambda_0`	function cumulative baseline hazard. This overrides `lambda_0`.
`Lambda_0_inv`	function inverse cumulative baseline hazard. This overrides both `lambda_0` and `Lambda_0`.
`round.base`	numeric; if specified, round the followup times to the nearest multiple of `round.base`
`control`	control parameters in the form of a `genfrail.control` object
`...`	additional arguments will be passed to `genfrail.control`
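
The relative cost of the three baseline specifications can be seen in a short base R sketch. It only illustrates the general idea of numerical integration inside a root finder; the helper names `Lambda_0_num` and `Lambda_0_inv_num` and the search interval are assumptions, not the package's implementation. The baseline hazard is the one used in the Examples section.

```r
# Baseline hazard from the Examples section
lambda_0 <- function(t, tau = 4.6, C = 0.01) (tau * (C * t)^tau) / t

# Cumulative hazard obtained by numerically integrating lambda_0
Lambda_0_num <- function(t) integrate(lambda_0, lower = 0, upper = t)$value

# Inverse cumulative hazard obtained by root finding on the integral;
# every evaluation of the objective triggers a numerical integration
Lambda_0_inv_num <- function(y, lower = 1e-6, upper = 1000) {
  uniroot(function(t) Lambda_0_num(t) - y,
          lower = lower, upper = upper, tol = 1e-8)$root
}

# Closed forms, which avoid both numerical steps
Lambda_0     <- function(t, tau = 4.6, C = 0.01) (C * t)^tau
Lambda_0_inv <- function(y, tau = 4.6, C = 0.01) (y^(1 / tau)) / C

num   <- Lambda_0_inv_num(0.5)  # numerical inverse
exact <- Lambda_0_inv(0.5)      # closed-form inverse
```

Both routes should agree up to numerical tolerance; the closed-form inverse is a single arithmetic expression, while the numerical route repeats an integration at every root-finding step.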

Value

A data.frame is returned with one observation per row and the following columns:

`family`	the cluster
`rep`	the member within each cluster
`time`	observed followup time
`status`	failure indicator
`Z1...`	covariates, where there are `length(beta)` Z columns
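
As a rough illustration of how these columns can be used, the sketch below computes the empirical censoring rate and the cluster sizes. The data frame is a hand-made toy with the documented column layout, not output of `genfrail`, and the coding of `status` as 1 for an observed failure and 0 for a censored observation is an assumption.

```r
# Toy data frame with the documented columns (values are made up)
dat <- data.frame(
  family = c(1, 1, 2, 2, 3, 3),
  rep    = c(1, 2, 1, 2, 1, 2),
  time   = c(88.1, 130.4, 75.0, 92.3, 121.7, 64.2),
  status = c(1, 0, 1, 1, 0, 1),
  Z1     = c(0.3, -1.2, 0.8, 0.1, -0.5, 1.4)
)

# Empirical censoring rate; assumes status == 1 marks an observed failure
censor_rate <- 1 - mean(dat$status)

# Number of members observed in each cluster
cluster_sizes <- table(dat$family)
```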

Author(s)

John V. Monaco, Malka Gorfine, and Li Hsu.

Examples

# Generate the same dataset 3 different ways

# Using the baseline hazard (least efficient)
set.seed(1234)
dat.1 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2,
                  lambda_0=function(t, tau=4.6, C=0.01) (tau*(C*t)^tau)/t)

# Using the cumulative baseline hazard
set.seed(1234)
dat.2 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  Lambda_0 = function(t, tau=4.6, C=0.01) (C*t)^tau)

# Using the inverse cumulative baseline hazard (most efficient)
set.seed(1234)
dat.3 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  Lambda_0_inv=function(t, tau=4.6, C=0.01) (t^(1/tau))/C)

# Generate data with PVF frailty, truncated Poisson cluster sizes, normal
# covariates, and 0.35 censorship from a lognormal distribution
set.seed(1234)
dat.4 <- genfrail(N = 100, K = "poisson", K.param=c(5, 1), 
                  beta = c(log(2),log(3)),
                  frailty = "pvf", theta = 0.3, 
                  censor.distr = "lognormal", 
                  censor.rate = 0.35) # Use the default baseline hazard

# All clusters have at least 2 members, as summarized by
summary(dat.4)

# An oscillating baseline hazard
set.seed(1234)
dat.5 <- genfrail(lambda_0=function(t, tau=4.6, C=0.01, A=2, f=0.1) 
                              A^sin(f*pi*t) * (tau*(C*t)^tau)/t)

# Uniform censorship with 0.25 censoring rate
set.seed(1234)
dat.6 <- genfrail(N = 300, K = 2, 
                  beta = c(log(2),log(3)),
                  frailty = "gamma", theta = 2, 
                  censor.distr = "uniform", 
                  censor.param = c(50, 150), 
                  censor.rate = 0.25,
                  Lambda_0_inv=function(t, tau=4.6, C=0.01) (t^(1/tau))/C)

[Package frailtySurv version 1.3.8 Index]