R: The function to implement Bayesian parametric and...

BayesID_AFT {SemiCompRisks}

R Documentation

The function to implement Bayesian parametric and semi-parametric analyses for semi-competing risks data in the context of accelerated failure time (AFT) models.

Description

Independent semi-competing risks data can be analyzed using AFT models that have a hierarchical structure. The proposed models can accomodate left-truncated and/or interval-censored data. An efficient computational algorithm that gives users the flexibility to adopt either a fully parametric (log-Normal) or a semi-parametric (Dirichlet process mixture) model specification is developed.

Usage

BayesID_AFT(Formula, data, model = "LN", hyperParams, startValues,
mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)

Arguments

`Formula`	a `Formula` object, with the outcomes on the left of a `\sim`, and covariates on the right. It is of the form, left truncation time \| interval- (or right-) censored time for non-terminal event \| interval-(or right-) censored time for terminal event `\sim` covariates for `h_1` \| covariates for `h_2` \| covariates for `h_3`: i.e., `L` \| `y_{1L}`+`y_{1U}` \| `y_{2L}`+`y_{2U}` ~ `x_1` \| `x_2` \| `x_3`.
`data`	a data.frame in which to interpret the variables named in `Formula`.
`model`	The specification of baseline survival distribution: "LN" or "DPM".
`hyperParams`	a list containing lists or vectors for hyperparameter values in hierarchical models. Components include, `theta` (a numeric vector for hyperparameter in the prior of subject-specific frailty variance component), `LN` (a list containing numeric vectors for log-Normal hyperparameters: `LN.ab1`, `LN.ab2`, `LN.ab3`), `DPM` (a list containing numeric vectors for DPM hyperparameters: `DPM.mu1`, `DPM.mu2`, `DPM.mu3`, `DPM.sigSq1`, `DPM.sigSq2`, `DPM.sigSq3`, `DPM.ab1`, `DPM.ab2`, `DPM.ab3`, `Tau.ab1`, `Tau.ab2`, `Tau.ab3`). See Details and Examples below.
`startValues`	a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function `initiate.startValues_AFT`.
`mcmcParams`	a list containing variables required for MCMC sampling. Components include, `run` (a list containing numeric values for setting for the overall run: `numReps`, total number of scans; `thin`, extent of thinning; `burninPerc`, the proportion of burn-in). `storage` (a list containing numeric values for storing posterior samples for subject- and cluster-specific random effects: `nGam_save`, the number of `\gamma` to be stored; `nY1_save`, the number of `y1` to be stored; `nY2_save`, the number of `y2` to be stored; `nY1.NA_save`, the number of `y1.NA` to be stored). `tuning` (a list containing numeric values relevant to tuning parameters for specific updates in Metropolis-Hastings (MH) algorithm: `betag.prop.var`, the variance of proposal density for `\beta_g`; `mug.prop.var`, the variance of proposal density for `\mu_{g}`; `zetag.prop.var`, the variance of proposal density for `1/\sigma_g^2`; `gamma.prop.var`, the variance of proposal density for `\gamma`). See Details and Examples below.
`na.action`	how NAs are treated. See `model.frame`.
`subset`	a specification of the rows to be used: defaults to all rows. See `model.frame`.
`path`	the name of directory where the results are saved.

Details

We view the semi-competing risks data as arising from an underlying illness-death model system in which individuals may undergo one or more of three transitions: 1) from some initial condition to non-terminal event, 2) from some initial condition to terminal event, 3) from non-terminal event to terminal event. Let T_{i1}, T_{i2} denote time to non-terminal and terminal event from subject i=1,...,n. We propose to directly model the times of the events via the following AFT model specification:

\log(T_{i1}) = x_{i1}^\top\beta_1 + \gamma_i + \epsilon_{i1}, T_{i1} > 0,

\log(T_{i2}) = x_{i2}^\top\beta_2 + \gamma_i + \epsilon_{i2}, T_{i2} > 0,

\log(T_{i2} - T_{i1}) = x_{i3}^\top\beta_3 + \gamma_i + \epsilon_{i3}, T_{i2} > T_{i1},

where x_{ig} is a vector of transition-specific covariates, \beta_g is a corresponding vector of transition-specific regression parameters and \epsilon_{ig} is a transition-specific random variable whose distribution determines that of the corresponding transition time, g \in \{1,2,3\}. \gamma_i is a study participant-specific random effect that induces positive dependence between the two event times, thereby performing a role analogous to that performed by frailties in models for the hazard function. Let L_{i} denote the time at study entry (i.e. the left-truncation time). Furthermore, suppose that study participant i was observed at follow-up times \{c_{i1},\ldots, c_{im_i}\} and let c_i^* denote the time to the end of study or to administrative right-censoring. Considering interval-censoring for both events, the times to non-terminal and terminal event for the i^{th} study participant satisfy c_{ij}\leq T_{i1}< c_{ij+1} for some j and c_{ik}\leq T_{i2}< c_{ik+1} for some k, respectively. Then the observed outcomes for the i^{th} study participant can be succinctly denoted by \{c_{ij}, c_{ij+1}, c_{ik}, c_{ik+1}, L_{i}\}.

For the Bayesian semi-parametric analysis, we proceed by adopting independent DPM of normal distributions for each \epsilon_{ig}. More precisely, \epsilon_{ig} is taken to be an independent draw from a mixture of M_g normal distributions with means and variances (\mu_{gr}, \sigma_{gr}^2), for r \in \{1,\ldots,M_g\}. Since the class-specific (\mu_{gr}, \sigma_{gr}^2) are not known, they are taken to be draws from some common distribution, G_{g0}, often referred to as the centering distribution. Furthermore, since the ‘true’ class membership for any given study participant is not known, we let p_{gr} denote the probability of belonging to the r^{th} class for transition g and p_g = (p_{g1}, \ldots, p_{gM_g}) the collection of such probabilities. Note, p_g is defined at the level of the population (i.e. is not study participant-specific) and its components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the n individuals across the M_g classes, p_g is assumed to follow a conjugate symmetric Dirichlet(\tau_g/M_g,\ldots,\tau_g/M_g) distribution, where \tau_g is referred to as the precision parameter. The finite mixture distribution can then be succinctly represented as:

\epsilon_{ig} | r_{i} \sim Normal(\mu_{r_{i}}, \sigma_{r_{i}}^2),

(\mu_{gr}, \sigma_{gr}^2) \sim G_{g0}, ~~for~ r=1,\ldots,M_g,

r_{i}| p_g \sim Discrete(r_{i} | p_{g1},\ldots,p_{gM_g}),

p_g \sim Dirichlet(\tau_g/M_g, \ldots, \tau_g/M_g).

Letting M_g approach infinity, this specification is referred to as a DPM of normal distributions. In our proposed framework, we specify a Gamma(a_{\tau_g}, b_{\tau_g}) hyperprior for \tau_g. For regression parameters, we adopt non-informative flat priors on the real line. For \gamma=\{\gamma_1, \ldots, \gamma_n\}, we assume that each \gamma_i is an independent random draw from a Normal(0, \theta) distribution. In the absence of prior knowledge on the variance component \theta, we adopt a conjugate inverse-Gamma hyperprior, IG(a_\theta, b_\theta). Finally, We take the G_{g0} as a normal distribution centered at \mu_{g0} with a variance \sigma_{g0}^2 for \mu_{gr} and an IG(a_{\sigma_g}, b_{\sigma_g}) for \sigma_{gr}^2.

For the Bayesian parametric analysis, we build on the log-Normal formulation and take the \epsilon_{ig} to follow independent Normal(\mu_g, \sigma_g^2) distributions, g=1,2,3. For location parameters \{\mu_1, \mu_2, \mu_3\}, we adopt non-informative flat priors on the real line. For \{\sigma_1^2, \sigma_2^2, \sigma_3^2\}, we adopt independent inverse Gamma distributions, denoted IG(a_{\sigma g}, b_{\sigma g}). For \beta_g, \gamma, and \theta, we adopt the same priors as those adopted for the DPM model.

Value

BayesID_AFT returns an object of class Bayes_AFT.

Note

The posterior samples of \gamma are saved separately in working directory/path. For a dataset with large n, nGam_save should be carefully specified considering the system memory and the storage capacity.

Author(s)

Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>

References

Lee, K. H., Rondeau, V., and Haneuse, S. (2017), Accelerated failure time models for semicompeting risks data in the presence of complex censoring, Biometrics, 73, 4, 1401-1412.

Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019), SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.

Examples


## Not run: 
# loading a data set
data(scrData)
scrData$y1L <- scrData$y1U <- scrData[,1]
scrData$y1U[which(scrData[,2] == 0)] <- Inf
scrData$y2L <- scrData$y2U <- scrData[,3]
scrData$y2U[which(scrData[,4] == 0)] <- Inf
scrData$LT <- rep(0, dim(scrData)[1])

form <- Formula(LT | y1L + y1U | y2L + y2U  ~ x1 + x2 + x3 | x1 + x2 | x1 + x2)

#####################
## Hyperparameters ##
#####################

## Subject-specific random effects variance component
##
theta.ab <- c(0.5, 0.05)

## log-Normal model
##
LN.ab1 <- c(0.3, 0.3)
LN.ab2 <- c(0.3, 0.3)
LN.ab3 <- c(0.3, 0.3)

## DPM model
##
DPM.mu1 <- log(12)
DPM.mu2 <- log(12)
DPM.mu3 <- log(12)

DPM.sigSq1 <- 100
DPM.sigSq2 <- 100
DPM.sigSq3 <- 100

DPM.ab1 <-  c(2, 1)
DPM.ab2 <-  c(2, 1)
DPM.ab3 <-  c(2, 1)

Tau.ab1 <- c(1.5, 0.0125)
Tau.ab2 <- c(1.5, 0.0125)
Tau.ab3 <- c(1.5, 0.0125)

##
hyperParams <- list(theta=theta.ab,
LN=list(LN.ab1=LN.ab1, LN.ab2=LN.ab2, LN.ab3=LN.ab3),
DPM=list(DPM.mu1=DPM.mu1, DPM.mu2=DPM.mu2, DPM.mu3=DPM.mu3, DPM.sigSq1=DPM.sigSq1,
DPM.sigSq2=DPM.sigSq2, DPM.sigSq3=DPM.sigSq3, DPM.ab1=DPM.ab1, DPM.ab2=DPM.ab2,
DPM.ab3=DPM.ab3, Tau.ab1=Tau.ab1, Tau.ab2=Tau.ab2, Tau.ab3=Tau.ab3))

###################
## MCMC SETTINGS ##
###################

## Setting for the overall run
##
numReps    <- 300
thin       <- 3
burninPerc <- 0.5

## Setting for storage
##
nGam_save <- 10
nY1_save <- 10
nY2_save <- 10
nY1.NA_save <- 10

## Tuning parameters for specific updates
##
##  - those common to all models
betag.prop.var	<- c(0.01,0.01,0.01)
mug.prop.var	<- c(0.1,0.1,0.1)
zetag.prop.var	<- c(0.1,0.1,0.1)
gamma.prop.var	<- 0.01

##
mcmcParams	<- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
storage=list(nGam_save=nGam_save, nY1_save=nY1_save, nY2_save=nY2_save, nY1.NA_save=nY1.NA_save),
tuning=list(betag.prop.var=betag.prop.var, mug.prop.var=mug.prop.var,
zetag.prop.var=zetag.prop.var, gamma.prop.var=gamma.prop.var))

#################################################################
## Analysis of Independent Semi-competing risks data ############
#################################################################

###############
## logNormal ##
###############

##
myModel <- "LN"
myPath  <- "Output/01-Results-LN/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_LN <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_LN
summ.fit_LN <- summary(fit_LN); names(summ.fit_LN)
summ.fit_LN
pred_LN <- predict(fit_LN, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_LN, plot.est="Haz")
plot(pred_LN, plot.est="Surv")

#########
## DPM ##
#########

##
myModel <- "DPM"
myPath  <- "Output/02-Results-DPM/"

startValues      <- initiate.startValues_AFT(form, scrData, model=myModel, nChain=2)

##
fit_DPM <- BayesID_AFT(form, scrData, model=myModel, hyperParams,
startValues, mcmcParams, path=myPath)

fit_DPM
summ.fit_DPM <- summary(fit_DPM); names(summ.fit_DPM)
summ.fit_DPM
pred_DPM <- predict(fit_DPM, time = seq(0, 35, 1), tseq=seq(from=0, to=30, by=5))
plot(pred_DPM, plot.est="Haz")
plot(pred_DPM, plot.est="Surv")

## End(Not run)

[Package SemiCompRisks version 3.4 Index]