R: Bayesian Cure Rate Modeling for Time-to-Event Data

bayesCureRateModel-package {bayesCureRateModel}

R Documentation

Bayesian Cure Rate Modeling for Time-to-Event Data

Description

A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2023) <doi:10.48550/arXiv.2310.06926>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of Gamma distributions. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.

The main function of the package is cure_rate_MC3. See details for a brief description of the model.

Details

Let \boldsymbol{y} = (y_1,\ldots,y_n) denote the observed data, which correspond to time-to-event data or censoring times. Let also \boldsymbol{x}_i = (x_{i1},\ldots,x_{x_{ip}})' denote the covariates for subject i, i=1,\ldots,n.

Assuming that the n observations are independent, the observed likelihood is defined as

L=L({\boldsymbol \theta}; {\boldsymbol y}, {\boldsymbol x})=\prod_{i=1}^{n}f_P(y_i;{\boldsymbol\theta},{\boldsymbol x}_i)^{\delta_i}S_P(y_i;{\boldsymbol \theta},{\boldsymbol x}_i)^{1-\delta_i},

where \delta_i=1 if the i-th observation corresponds to time-to-event while \delta_i=0 indicates censoring time. The parameter vector \boldsymbol\theta is decomposed as

\boldsymbol\theta = (\boldsymbol\alpha', \boldsymbol\beta', \gamma,\lambda)

where

\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d)'\in\mathcal A are the parameters of the promotion time distribution whose cumulative distribution and density functions are denoted as F(\cdot,\boldsymbol\alpha) and f(\cdot,\boldsymbol\alpha), respectively.
\boldsymbol\beta\in\mathbf R^{k} are the regression coefficients with k denoting the number of columns in the design matrix (it may include a constant term or not).
\gamma\in\mathbf R
\lambda > 0.

The population survival and density functions are defined as

S_P(y;\boldsymbol\theta) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}F(y;\boldsymbol\alpha)^\lambda\right)^{-1/\gamma}

whereas,

f_P(y;\boldsymbol\theta)=-\frac{\partial S_P(y;\boldsymbol\theta)}{\partial y}.

Finally, the cure rate is affected through covariates and parameters as follows

p_0(\boldsymbol{x}_i;\boldsymbol{\theta}) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}\right)^{-1/\gamma}

where c = e^{e^{-1}}.

The promotion time distribution can be a member of standard families (currently available are the following: Exponential, Weibull, Gamma, Lomax, Gompertz, log-Logistic) and in this case \alpha = (\alpha_1,\alpha_2)\in (0,\infty)^2. Also considered is the Dagum distribution, which has three parameters (\alpha_1,\alpha_2,\alpha_3)\in(0,\infty)^3. In case that the previous parametric assumptions are not justified, the promotion time can belong to the more flexible family of finite mixtures of Gamma distributions. For example, assume a mixture of two Gamma distributions of the form

f(y;\boldsymbol \alpha) = \alpha_5 f_{\mathcal G}(y;\alpha_1,\alpha_3) + (1-\alpha_5) f_{\mathcal G}(y;\alpha_2,\alpha_4),

where

f_\mathcal{G}(y;\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp\{-\beta y\}, y>0

denotes the density of the Gamma distribution with parameters \alpha > 0 (shape) and \beta > 0 (rate). For the previous model, the parameter vector is

\boldsymbol\alpha = (\alpha_1,\alpha_2,\alpha_3,\alpha_4,\alpha_5)'\in\mathcal A

where \mathcal A = (0,\infty)^4\times (0,1).

More generally, one can fit a mixture of K>2 Gamma distributions. The appropriate model can be selected according to information criteria such as the BIC.

The binary vector \boldsymbol{I} = (I_1,\ldots,I_n) contains the (latent) cure indicators, that is, I_i = 1 if the i-th subject is susceptible and I_i = 0 if the i-th subject is cured. \Delta_0 denotes the subset of \{1,\ldots,n\} containing the censored subjects, whereas \Delta_1 = \Delta_0^c is the (complementary) subset of uncensored subjects. The complete likelihood of the model is

L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\\ \prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i,\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.

f_U and S_U denote the probability density and survival function of the susceptibles, respectively, that is

S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.

Index of help topics:

bayesCureRateModel-package
                        Bayesian Cure Rate Modeling for Time-to-Event
                        Data
complete_log_likelihood_general
                        Logarithm of the complete log-likelihood for
                        the general cure rate model.
cure_rate_MC3           Main function of the package
cure_rate_mcmc          The basic MCMC scheme.
log_dagum               PDF and CDF of the Dagum distribution
log_gamma               PDF and CDF of the Gamma distribution
log_gamma_mixture       PDF and CDF of a Gamma mixture distribution
log_gompertz            PDF and CDF of the Gompertz distribution
log_logLogistic         PDF and CDF of the log-Logistic distribution.
log_lomax               PDF and CDF of the Lomax distribution
log_weibull             PDF and CDF of the Weibull distribution
marriage_dataset        Marriage data
plot.bayesCureModel     Plot method
print.bayesCureModel    Print method
summary.bayesCureModel
                        Summary method.

Author(s)

Panagiotis Papastamoulis and Fotios S. Milienos

Maintainer: Panagiotis Papastamoulis <papapast@yahoo.gr>

References

Papastamoulis and Milienos (2023). Bayesian inference and cure rate modeling for event history data. arXiv:2310.06926

Examples

# TOY EXAMPLE (very small numbers... only for CRAN check purposes)
# simulate toy data 
	set.seed(10)
	n = 4
	stat = rbinom(n, size = 1, prob = 0.5)
	x <- cbind(1, matrix(rnorm(n), n, 1))
	y <- rexp(n)
# run a weibull model with default prior setup
# considering 2 heated chains 
	fit1 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat, 
		promotion_time = list(distribution = 'weibull'),
		nChains = 2, 
		nCores = 1, 
		mcmc_cycles = 3, sweep=2)
#	print method
	fit1	
# 	summary method	
	summary1 <- summary(fit1)
	
# WARNING: the following parameters
#  mcmc_cycles, nChains
#        should take _larger_ values. E.g. a typical implementation consists of:
#        mcmc_cycles = 15000, nChains = 12
	

# run a Gamma mixture model with K = 2 components and default prior setup
	fit2 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat, 
		promotion_time = list(
			distribution = 'gamma_mixture',
		        K = 2),
		nChains = 8, nCores = 2, 
		mcmc_cycles = 10)
	summary2 <- summary(fit2)