bayesCureRateModel-package {bayesCureRateModel} | R Documentation |
Bayesian Cure Rate Modeling for Time-to-Event Data
Description
A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2023) <doi:10.48550/arXiv.2310.06926>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of Gamma distributions. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.
The main function of the package is cure_rate_MC3
. See details for a brief description of the model.
Details
Let \boldsymbol{y} = (y_1,\ldots,y_n)
denote the observed data, which correspond to time-to-event data or censoring times. Let also \boldsymbol{x}_i = (x_{i1},\ldots,x_{x_{ip}})'
denote the covariates for subject i
, i=1,\ldots,n
.
Assuming that the n
observations are independent, the observed likelihood is defined as
L=L({\boldsymbol \theta}; {\boldsymbol y}, {\boldsymbol x})=\prod_{i=1}^{n}f_P(y_i;{\boldsymbol\theta},{\boldsymbol x}_i)^{\delta_i}S_P(y_i;{\boldsymbol \theta},{\boldsymbol x}_i)^{1-\delta_i},
where \delta_i=1
if the i
-th observation corresponds to time-to-event while \delta_i=0
indicates censoring time. The parameter vector \boldsymbol\theta
is decomposed as
\boldsymbol\theta = (\boldsymbol\alpha', \boldsymbol\beta', \gamma,\lambda)
where
-
\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d)'\in\mathcal A
are the parameters of the promotion time distribution whose cumulative distribution and density functions are denoted asF(\cdot,\boldsymbol\alpha)
andf(\cdot,\boldsymbol\alpha)
, respectively. -
\boldsymbol\beta\in\mathbf R^{k}
are the regression coefficients withk
denoting the number of columns in the design matrix (it may include a constant term or not). -
\gamma\in\mathbf R
-
\lambda > 0
.
The population survival and density functions are defined as
S_P(y;\boldsymbol\theta) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}F(y;\boldsymbol\alpha)^\lambda\right)^{-1/\gamma}
whereas,
f_P(y;\boldsymbol\theta)=-\frac{\partial S_P(y;\boldsymbol\theta)}{\partial y}.
Finally, the cure rate is affected through covariates and parameters as follows
p_0(\boldsymbol{x}_i;\boldsymbol{\theta}) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}\right)^{-1/\gamma}
where c = e^{e^{-1}}
.
The promotion time distribution can be a member of standard families (currently available are the following: Exponential, Weibull, Gamma, Lomax, Gompertz, log-Logistic) and in this case \alpha = (\alpha_1,\alpha_2)\in (0,\infty)^2
. Also considered is the Dagum distribution, which has three parameters (\alpha_1,\alpha_2,\alpha_3)\in(0,\infty)^3
. In case that the previous parametric assumptions are not justified, the promotion time can belong to the more flexible family of finite mixtures of Gamma distributions. For example, assume a mixture of two Gamma distributions of the form
f(y;\boldsymbol \alpha) = \alpha_5 f_{\mathcal G}(y;\alpha_1,\alpha_3) + (1-\alpha_5) f_{\mathcal G}(y;\alpha_2,\alpha_4),
where
f_\mathcal{G}(y;\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp\{-\beta y\}, y>0
denotes the density of the Gamma distribution with parameters \alpha > 0
(shape) and \beta > 0
(rate).
For the previous model, the parameter vector is
\boldsymbol\alpha = (\alpha_1,\alpha_2,\alpha_3,\alpha_4,\alpha_5)'\in\mathcal A
where \mathcal A = (0,\infty)^4\times (0,1)
.
More generally, one can fit a mixture of K>2
Gamma distributions. The appropriate model can be selected according to information criteria such as the BIC.
The binary vector \boldsymbol{I} = (I_1,\ldots,I_n)
contains the (latent) cure indicators, that is, I_i = 1
if the i
-th subject is susceptible and I_i = 0
if the i
-th subject is cured. \Delta_0
denotes the subset of \{1,\ldots,n\}
containing the censored subjects, whereas \Delta_1 = \Delta_0^c
is the (complementary) subset of uncensored subjects. The complete likelihood of the model is
L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\\
\prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i,\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.
f_U
and S_U
denote the probability density and survival function of the susceptibles, respectively, that is
S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.
Index of help topics:
bayesCureRateModel-package Bayesian Cure Rate Modeling for Time-to-Event Data complete_log_likelihood_general Logarithm of the complete log-likelihood for the general cure rate model. cure_rate_MC3 Main function of the package cure_rate_mcmc The basic MCMC scheme. log_dagum PDF and CDF of the Dagum distribution log_gamma PDF and CDF of the Gamma distribution log_gamma_mixture PDF and CDF of a Gamma mixture distribution log_gompertz PDF and CDF of the Gompertz distribution log_logLogistic PDF and CDF of the log-Logistic distribution. log_lomax PDF and CDF of the Lomax distribution log_weibull PDF and CDF of the Weibull distribution marriage_dataset Marriage data plot.bayesCureModel Plot method print.bayesCureModel Print method summary.bayesCureModel Summary method.
Author(s)
Panagiotis Papastamoulis and Fotios S. Milienos
Maintainer: Panagiotis Papastamoulis <papapast@yahoo.gr>
References
Papastamoulis and Milienos (2023). Bayesian inference and cure rate modeling for event history data. arXiv:2310.06926
See Also
Examples
# TOY EXAMPLE (very small numbers... only for CRAN check purposes)
# simulate toy data
set.seed(10)
n = 4
stat = rbinom(n, size = 1, prob = 0.5)
x <- cbind(1, matrix(rnorm(n), n, 1))
y <- rexp(n)
# run a weibull model with default prior setup
# considering 2 heated chains
fit1 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat,
promotion_time = list(distribution = 'weibull'),
nChains = 2,
nCores = 1,
mcmc_cycles = 3, sweep=2)
# print method
fit1
# summary method
summary1 <- summary(fit1)
# WARNING: the following parameters
# mcmc_cycles, nChains
# should take _larger_ values. E.g. a typical implementation consists of:
# mcmc_cycles = 15000, nChains = 12
# run a Gamma mixture model with K = 2 components and default prior setup
fit2 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat,
promotion_time = list(
distribution = 'gamma_mixture',
K = 2),
nChains = 8, nCores = 2,
mcmc_cycles = 10)
summary2 <- summary(fit2)