| bayesCureRateModel-package {bayesCureRateModel} | R Documentation |
Bayesian Cure Rate Modeling for Time-to-Event Data
Description
A fully Bayesian approach in order to estimate a general family of cure rate models under the presence of covariates, see Papastamoulis and Milienos (2023) <doi:10.48550/arXiv.2310.06926>. The promotion time can be modelled (a) parametrically using typical distributional assumptions for time to event data (including the Weibull, Exponential, Gompertz, log-Logistic distributions), or (b) semiparametrically using finite mixtures of Gamma distributions. Posterior inference is carried out by constructing a Metropolis-coupled Markov chain Monte Carlo (MCMC) sampler, which combines Gibbs sampling for the latent cure indicators and Metropolis-Hastings steps with Langevin diffusion dynamics for parameter updates. The main MCMC algorithm is embedded within a parallel tempering scheme by considering heated versions of the target posterior distribution.
The main function of the package is cure_rate_MC3. See details for a brief description of the model.
Details
Let \boldsymbol{y} = (y_1,\ldots,y_n) denote the observed data, which correspond to time-to-event data or censoring times. Let also \boldsymbol{x}_i = (x_{i1},\ldots,x_{x_{ip}})' denote the covariates for subject i, i=1,\ldots,n.
Assuming that the n observations are independent, the observed likelihood is defined as
L=L({\boldsymbol \theta}; {\boldsymbol y}, {\boldsymbol x})=\prod_{i=1}^{n}f_P(y_i;{\boldsymbol\theta},{\boldsymbol x}_i)^{\delta_i}S_P(y_i;{\boldsymbol \theta},{\boldsymbol x}_i)^{1-\delta_i},
where \delta_i=1 if the i-th observation corresponds to time-to-event while \delta_i=0 indicates censoring time. The parameter vector \boldsymbol\theta is decomposed as
\boldsymbol\theta = (\boldsymbol\alpha', \boldsymbol\beta', \gamma,\lambda)
where
-
\boldsymbol\alpha = (\alpha_1,\ldots,\alpha_d)'\in\mathcal Aare the parameters of the promotion time distribution whose cumulative distribution and density functions are denoted asF(\cdot,\boldsymbol\alpha)andf(\cdot,\boldsymbol\alpha), respectively. -
\boldsymbol\beta\in\mathbf R^{k}are the regression coefficients withkdenoting the number of columns in the design matrix (it may include a constant term or not). -
\gamma\in\mathbf R -
\lambda > 0.
The population survival and density functions are defined as
S_P(y;\boldsymbol\theta) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}F(y;\boldsymbol\alpha)^\lambda\right)^{-1/\gamma}
whereas,
f_P(y;\boldsymbol\theta)=-\frac{\partial S_P(y;\boldsymbol\theta)}{\partial y}.
Finally, the cure rate is affected through covariates and parameters as follows
p_0(\boldsymbol{x}_i;\boldsymbol{\theta}) = \left(1 + \gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}c^{\gamma\exp\{\boldsymbol{x}_i\boldsymbol{\beta}'\}}\right)^{-1/\gamma}
where c = e^{e^{-1}}.
The promotion time distribution can be a member of standard families (currently available are the following: Exponential, Weibull, Gamma, Lomax, Gompertz, log-Logistic) and in this case \alpha = (\alpha_1,\alpha_2)\in (0,\infty)^2. Also considered is the Dagum distribution, which has three parameters (\alpha_1,\alpha_2,\alpha_3)\in(0,\infty)^3. In case that the previous parametric assumptions are not justified, the promotion time can belong to the more flexible family of finite mixtures of Gamma distributions. For example, assume a mixture of two Gamma distributions of the form
f(y;\boldsymbol \alpha) = \alpha_5 f_{\mathcal G}(y;\alpha_1,\alpha_3) + (1-\alpha_5) f_{\mathcal G}(y;\alpha_2,\alpha_4),
where
f_\mathcal{G}(y;\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}y^{\alpha-1}\exp\{-\beta y\}, y>0
denotes the density of the Gamma distribution with parameters \alpha > 0 (shape) and \beta > 0 (rate).
For the previous model, the parameter vector is
\boldsymbol\alpha = (\alpha_1,\alpha_2,\alpha_3,\alpha_4,\alpha_5)'\in\mathcal A
where \mathcal A = (0,\infty)^4\times (0,1).
More generally, one can fit a mixture of K>2 Gamma distributions. The appropriate model can be selected according to information criteria such as the BIC.
The binary vector \boldsymbol{I} = (I_1,\ldots,I_n) contains the (latent) cure indicators, that is, I_i = 1 if the i-th subject is susceptible and I_i = 0 if the i-th subject is cured. \Delta_0 denotes the subset of \{1,\ldots,n\} containing the censored subjects, whereas \Delta_1 = \Delta_0^c is the (complementary) subset of uncensored subjects. The complete likelihood of the model is
L_c(\boldsymbol{\theta};\boldsymbol{y}, \boldsymbol{I}) = \prod_{i\in\Delta_1}(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))f_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\\
\prod_{i\in\Delta_0}p_0(\boldsymbol{x}_i,\boldsymbol\theta)^{1-I_i}\{(1-p_0(\boldsymbol{x}_i,\boldsymbol\theta))S_U(y_i;\boldsymbol\theta,\boldsymbol{x}_i)\}^{I_i}.
f_U and S_U denote the probability density and survival function of the susceptibles, respectively, that is
S_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{S_P(y_i;\boldsymbol{\theta},{\boldsymbol x}_i)-p_0({\boldsymbol x}_i;\boldsymbol\theta)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}, f_U(y_i;\boldsymbol\theta,{\boldsymbol x}_i)=\frac{f_P(y_i;\boldsymbol\theta,{\boldsymbol x}_i)}{1-p_0({\boldsymbol x}_i;\boldsymbol\theta)}.
Index of help topics:
bayesCureRateModel-package
Bayesian Cure Rate Modeling for Time-to-Event
Data
complete_log_likelihood_general
Logarithm of the complete log-likelihood for
the general cure rate model.
cure_rate_MC3 Main function of the package
cure_rate_mcmc The basic MCMC scheme.
log_dagum PDF and CDF of the Dagum distribution
log_gamma PDF and CDF of the Gamma distribution
log_gamma_mixture PDF and CDF of a Gamma mixture distribution
log_gompertz PDF and CDF of the Gompertz distribution
log_logLogistic PDF and CDF of the log-Logistic distribution.
log_lomax PDF and CDF of the Lomax distribution
log_weibull PDF and CDF of the Weibull distribution
marriage_dataset Marriage data
plot.bayesCureModel Plot method
print.bayesCureModel Print method
summary.bayesCureModel
Summary method.
Author(s)
Panagiotis Papastamoulis and Fotios S. Milienos
Maintainer: Panagiotis Papastamoulis <papapast@yahoo.gr>
References
Papastamoulis and Milienos (2023). Bayesian inference and cure rate modeling for event history data. arXiv:2310.06926
See Also
Examples
# TOY EXAMPLE (very small numbers... only for CRAN check purposes)
# simulate toy data
set.seed(10)
n = 4
stat = rbinom(n, size = 1, prob = 0.5)
x <- cbind(1, matrix(rnorm(n), n, 1))
y <- rexp(n)
# run a weibull model with default prior setup
# considering 2 heated chains
fit1 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat,
promotion_time = list(distribution = 'weibull'),
nChains = 2,
nCores = 1,
mcmc_cycles = 3, sweep=2)
# print method
fit1
# summary method
summary1 <- summary(fit1)
# WARNING: the following parameters
# mcmc_cycles, nChains
# should take _larger_ values. E.g. a typical implementation consists of:
# mcmc_cycles = 15000, nChains = 12
# run a Gamma mixture model with K = 2 components and default prior setup
fit2 <- cure_rate_MC3(y = y, X = x, Censoring_status = stat,
promotion_time = list(
distribution = 'gamma_mixture',
K = 2),
nChains = 8, nCores = 2,
mcmc_cycles = 10)
summary2 <- summary(fit2)