BayesID_AFT {SemiCompRisks} | R Documentation |
The function to implement Bayesian parametric and semi-parametric analyses for semi-competing risks data in the context of accelerated failure time (AFT) models.
Description
Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.
Usage
BayesID_AFT(Formula, data, model = "LN", hyperParams, startValues,
mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)
Arguments
Formula |
a |
data |
a data.frame in which to interpret the variables named in |
model |
The specification of baseline survival distribution: "LN" or "DPM". |
hyperParams |
a list containing lists or vectors for hyperparameter values in hierarchical models. Components include,
|
startValues |
a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function |
mcmcParams |
a list containing variables required for MCMC sampling. Components include,
|
na.action |
how NAs are treated. See |
subset |
a specification of the rows to be used: defaults to all rows. See |
path |
the name of directory where the results are saved. |
Details
We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}
, T_{i2}
denote time to non-terminal and terminal event from subject i=1,...,n
. We propose to directly model the times of the events via the following AFT model specification:
\log(T_{i1}) = x_{i1}^\top\beta_1 + \gamma_i + \epsilon_{i1}, T_{i1} > 0,
\log(T_{i2}) = x_{i2}^\top\beta_2 + \gamma_i + \epsilon_{i2}, T_{i2} > 0,
\log(T_{i2} - T_{i1}) = x_{i3}^\top\beta_3 + \gamma_i + \epsilon_{i3}, T_{i2} > T_{i1},
where x_{ig}
is a vector of transition-specific covariates, \beta_g
is a corresponding vector of transition-specific regression parameters and \epsilon_{ig}
is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}
. \gamma_i
is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function.
Let L_{i}
denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i
was observed at follow-up times \{c_{i1},\ldots, c_{im_i}\}
and let c_i^*
denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th}
study participant satisfy c_{ij}\leq T_{i1}< c_{ij+1}
for some j
and c_{ik}\leq T_{i2}< c_{ik+1}
for some k
, respectively. Then the observed outcomes for the i^{th}
study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}
.
For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each \epsilon_{ig}
. More precisely, \epsilon_{ig}
is taken to be an independent draw from a mixture of M_g
normal distributions with means and variances (\mu_{gr}
, \sigma_{gr}^2
), for r \in \{1,\ldots,M_g\}
. Since the class-specific (\mu_{gr}, \sigma_{gr}^2)
are not known, they are taken to be draws from some common distribution, G_{g0}
, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr}
denote the probability of belonging to the r^{th}
class for transition g
and p_g
= (p_{g1}, \ldots, p_{gM_g})
the collection of such probabilities. Note, p_g
is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n
individuals across the M_g
classes, p_g
is assumed to follow a conjugate symmetric Dirichlet(\tau_g/M_g,\ldots,\tau_g/M_g)
distribution, where \tau_g
is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:
\epsilon_{ig} | r_{i} \sim Normal(\mu_{r_{i}}, \sigma_{r_{i}}^2),
(\mu_{gr}, \sigma_{gr}^2) \sim G_{g0}, ~~for~ r=1,\ldots,M_g,
r_{i}| p_g \sim Discrete(r_{i} | p_{g1},\ldots,p_{gM_g}),
p_g \sim Dirichlet(\tau_g/M_g, \ldots, \tau_g/M_g).
Letting M_g
approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{\tau_g}
, b_{\tau_g}
) hyperprior for \tau_g
. For regression parameters, we adopt non-informative flat priors on the real line. For \gamma
=\{\gamma_1, \ldots, \gamma_n\}
, we assume that each \gamma_i
is an independent random draw from a Normal(0, \theta
) distribution. In the absence of prior knowledge on the variance component \theta
, we adopt a conjugate inverse-Gamma hyperprior, IG(a_\theta
, b_\theta
). Finally, We take the G_{g0}
as a normal distribution centered at \mu_{g0}
with a variance \sigma_{g0}^2
for \mu_{gr}
and an IG(a_{\sigma_g}
, b_{\sigma_g}
) for \sigma_{gr}^2
.
For the Bayesian parametric analysis, we build on the log-Normal formulation and take the \epsilon_{ig}
to follow independent Normal(\mu_g
, \sigma_g^2
) distributions, g
=1,2,3. For location parameters \{\mu_1, \mu_2, \mu_3\}
, we adopt non-informative flat priors on the real line. For \{\sigma_1^2, \sigma_2^2, \sigma_3^2\}
, we adopt independent inverse Gamma distributions, denoted IG(a_{\sigma g}
, b_{\sigma g}
). For \beta_g
, \gamma
, and \theta
, we adopt the same priors as those adopted for the DPM model.
Value
BayesID_AFT
returns an object of class Bayes_AFT
.
Note
The posterior samples of \gamma
are saved separately in working directory/path
.
For a dataset with large n
, nGam_save
should be carefully specified considering the system memory and the storage capacity.
Author(s)
Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>
References
Lee, K. H., Rondeau, V., and Haneuse, S. (2017),
Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.
Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019),
SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.
See Also
initiate.startValues_AFT
, print.Bayes_AFT
, summary.Bayes_AFT
, predict.Bayes_AFT
Examples
## Not run:
# loading a data set
data(scrData)
scrData$y1L <- scrData$y1U <- scrData[,1]
scrData$y1U[which(scrData[,2] == 0)] <- Inf
scrData$y2L <- scrData$y2U <- scrData[,3]
scrData$y2U[which(scrData[,4] == 0)] <- Inf
scrData$LT <- rep(0, dim(scrData)[1])
form <- Formula(LT | y1L + y1U | y2L + y2U ~ x1 + x2 + x3 | x1 + x2 | x1 + x2)
#####################
## Hyperparameters ##
#####################
## Subject-specific random effects variance component
##
theta.ab <- c(0.5, 0.05)
## log-Normal model
##
LN.ab1 <- c(0.3, 0.3)
LN.ab2 <- c(0.3, 0.3)
LN.ab3 <- c(0.3, 0.3)
## DPM model
##
DPM.mu1 <- log(12)
DPM.mu2 <- log(12)
DPM.mu3 <- log(12)
DPM.sigSq1 <- 100
DPM.sigSq2 <- 100
DPM.sigSq3 <- 100
DPM.ab1 <- c(2, 1)
DPM.ab2 <- c(2, 1)
DPM.ab3 <- c(2, 1)
Tau.ab1 <- c(1.5, 0.0125)
Tau.ab2 <- c(1.5, 0.0125)
Tau.ab3 <- c(1.5, 0.0125)
##
hyperParams <- list(theta=theta.ab,
LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3),
DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1,
DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2,
DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3))
###################
## MCMC SETTINGS ##
###################
## Setting for the overall run
##
numReps <- 300
thin <- 3
burninPerc <- 0.5
## Setting for storage
##
nGam_save <- 10
nY1_save <- 10
nY2_save <- 10
nY1.NA_save <- 10
## Tuning parameters for specific updates
##
## - those common to all models
betag.prop.var <- c(0.01,0.01,0.01)
mug.prop.var <- c(0.1,0.1,0.1)
zetag.prop.var <- c(0.1,0.1,0.1)
gamma.prop.var <- 0.01
##
mcmcParams <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save),
tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var,
zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var))
#################################################################
## Analysis of Independent Semi-competing risks data ############
#################################################################
###############
## logNormal ##
###############
##
myModel <- "LN"
myPath <- "Output/01-Results-LN/"
startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)
##
fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)
fit_LN
summ.fit_LN <- summary(fit_LN); names(summ.fit_LN)
summ.fit_LN
pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_LN, plot.est="Haz")
plot(pred_LN, plot.est="Surv")
#########
## DPM ##
#########
##
myModel <- "DPM"
myPath <- "Output/02-Results-DPM/"
startValues <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)
##
fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)
fit_DPM
summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM)
summ.fit_DPM
pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_DPM, plot.est="Haz")
plot(pred_DPM, plot.est="Surv")
## End(Not run)