BayesSurv_HReg {SemiCompRisks}R Documentation

The function to implement Bayesian parametric and semi-parametric regression analyses for univariate time-to-event data in the context of hazard regression (HReg) models.

Description

Independent/cluster-correlated univariate right-censored survival data can be analyzed using hierarchical models. The prior for the baseline hazard function can be specified by either parametric (Weibull) model or non-parametric mixture of piecewise exponential models (PEM).

Usage

BayesSurv_HReg(Formula, data, id=NULL, model="Weibull", hyperParams,
        startValues, mcmcParams, na.action = "na.fail", subset=NULL, path=NULL)

Arguments

Formula

a Formula object, with the outcome on the left of a \sim, and covariates on the right. It is of the form, time to event + censoring indicator \sim covariates: i.e., yy+δ\delta ~ xx.

data

a data.frame in which to interpret the variables named in Formula.

id

a vector of cluster information for n subjects. The cluster membership must be consecutive positive integers, 1:J1:J.

model

a character vector that specifies the type of components in a model. The first element is for the specification of baseline hazard functions: "Weibull" or "PEM". The second element needs to be set only for clustered data and is for the specification of cluster-specific random effects distribution: "Normal" or "DPM".

hyperParams

a list containing lists or vectors for hyperparameter values in hierarchical models. Components include, WB (a list containing a numeric vector for Weibull hyperparameters: WB.ab, WB.cd), PEM (a list containing numeric vectors for PEM hyperparameters: PEM.ab, PEM.alpha). Models for clustered data require additional components, Normal (a list containing a numeric vector for hyperparameters in a Normal prior: Normal.ab), DPM (a list containing numeric vectors for DPM hyperparameters: DPM.ab, aTau, bTau). See Details and Examples below.

startValues

a list containing vectors of starting values for model parameters. It can be specified as the object returned by the function initiate.startValues_HReg.

mcmcParams

a list containing variables required for MCMC sampling. Components include, run (a list containing numeric values for setting for the overall run: numReps, total number of scans; thin, extent of thinning; burninPerc, the proportion of burn-in). storage (a list containing numeric values for storing posterior samples for cluster-specific random effects: storeV, a logical value to determine whether all the posterior samples of VV are to be stored.) tuning (a list containing numeric values relevant to tuning parameters for specific updates in Metropolis-Hastings-Green (MHG) algorithm: mhProp_V_var, the variance of proposal density for VV in DPM models; mhProp_alpha_var, the variance of proposal density for α\alpha in Weibull models; C, a numeric value for the proportion that determines the sum of probabilities of choosing the birth and the death moves in PEM models. The value should not exceed 0.8; delPert, the perturbation parameter in the birth update in PEM models. The values must be between 0 and 0.5; If rj.scheme=1, the birth update will draw the proposal time split from 1:smax1:s_{max}. If rj.scheme=2, the birth update will draw the proposal time split from uniquely ordered failure times in the data. Only required for PEM models; K_max, the maximum number of splits allowed at each iteration in MHG algorithm for PEM models; time_lambda - time points at which the log-hazard function is calculated for predict.Bayes_HReg, Only required for PEM models). See Details and Examples below.

na.action

how NAs are treated. See model.frame.

subset

a specification of the rows to be used: defaults to all rows. See model.frame.

path

the name of directory where the results are saved.

Details

The function BayesSurv_HReg implements Bayesian semi-parametric (piecewise exponential mixture) and parametric (Weibull) models to univariate time-to-event data. Let tjit_{ji} denote time to event of interest from subject i=1,...,nji=1,...,n_j in cluster j=1,...,Jj=1,...,J. The covariates xjix_{ji} are incorporated via Cox proportional hazards model:

h(tjixji)=h0(tji)exp(xjiβ+Vj),tji>0,h(t_{ji} | x_{ji}) = h_{0}(t_{ji})\exp(x_{ji}^{\top}\beta + V_{j}), t_{ji}>0,

where h0h_0 is an unspecified baseline hazard function and β\beta is a vector of pp log-hazard ratio regression parameters. VjV_j's are cluster-specific random effects. For parametric Normal prior specification for a vector of cluster-specific random effects, we assume VV arise as i.i.d. draws from a mean 0 Normal distribution with variance σ2\sigma^2. Specifically, the priors can be written as follows:

VjNormal(0,σ2),V_j \sim Normal(0, \sigma^2),

ζ=1/σ2Gamma(aN,bN).\zeta=1/\sigma^2 \sim Gamma(a_{N}, b_{N}).

For DPM prior specification for VjV_j, we consider non-parametric Dirichlet process mixture of Normal distributions: the VjV_j's' are draws from a finite mixture of M Normal distributions, each with their own mean and variance, (μm\mu_m, σm2\sigma_m^2) for m=1,...,Mm=1,...,M. Let mj{1,...,M}m_j\in\{1,...,M\} denote the specific component to which the jjth cluster belongs. Since the class-specific (μm\mu_m, σm2\sigma_m^2) are not known they are taken to be draws from some distribution, G0G_0, often referred to as the centering distribution. Furthermore, since the true class memberships are unknown, we denote the probability that the jjth cluster belongs to any given class by the vector p=(p1,...,pM)p=(p_1,..., p_M) whose components add up to 1.0. In the absence of prior knowledge regarding the distribution of class memberships for the JJ clusters across the MM classes, a natural prior for pp is the conjugate symmetric Dirichlet(τ/M,...,τ/M)Dirichlet(\tau/M,...,\tau/M) distribution; the hyperparameter, τ\tau, is often referred to as a the precision parameter. The prior can be represented as follows (MM goes to infinity):

VjmjNormal(μmj,σmj2),V_j | m_j \sim Normal(\mu_{m_j}, \sigma_{m_j}^2),

(μm,σm2)G0,  for m=1,...,M,(\mu_m, \sigma_m^2) \sim G_{0},~~ for ~m=1,...,M,

mjpDiscrete(mjp1,...,pM),m_j | p \sim Discrete(m_j| p_1,...,p_M),

pDirichlet(τ/M,...,τ/M),p \sim Dirichlet(\tau/M,...,\tau/M),

where G0G_0 is taken to be a multivariate Normal/inverse-Gamma (NIG) distribution for which the probability density function is the following product:

fNIG(μ,σ2μ0,ζ0,a0,b0)=fNormal(μ0,1/ζ02)×fGamma(ζ=1/σ2a0,b0).f_{NIG}(\mu, \sigma^2 | \mu_0, \zeta_0, a_0, b_0) = f_{Normal}(\mu | 0, 1/\zeta_0^2) \times f_{Gamma}(\zeta=1/\sigma^2 | a_0, b_0).

In addition, we use Gamma(aτ,bτ)Gamma(a_{\tau}, b_{\tau}) as the hyperprior for τ\tau.

For non-parametric prior specification (PEM) for baseline hazard function, let smaxs_{\max} denote the largest observed event time. Then, consider the finite partition of the relevant time axis into K+1K + 1 disjoint intervals: 0<s1<s2<...<sK+1=smax0<s_1<s_2<...<s_{K+1} = s_{\max}. For notational convenience, let Ik=(sk1,sk]I_k=(s_{k-1}, s_k] denote the kthk^{th} partition. For given a partition, s=(s1,,sK+1)s = (s_1, \dots, s_{K + 1}), we assume the log-baseline hazard functions is piecewise constant:

λ0(t)=logh0(t)=k=1K+1λkI(tIk),\lambda_{0}(t)=\log h_{0}(t) = \sum_{k=1}^{K + 1} \lambda_{k} I(t\in I_{k}),

where I()I(\cdot) is the indicator function and s00s_0 \equiv 0. In our proposed Bayesian framework, our prior choices are:

π(β)1,\pi(\beta) \propto 1,

λK,μλ,σλ2MVNK+1(μλ1,σλ2Σλ),\lambda | K, \mu_{\lambda}, \sigma_{\lambda}^2 \sim MVN_{K+1}(\mu_{\lambda}1, \sigma_{\lambda}^2\Sigma_{\lambda}),

KPoisson(α),K \sim Poisson(\alpha),

π(sK)(2K+1)!k=1K+1(sksk1)(sK+1)(2K+1),\pi(s | K) \propto \frac{(2K+1)! \prod_{k=1}^{K+1}(s_k-s_{k-1})}{(s_{K+1})^{(2K+1)}},

π(μλ)1,\pi(\mu_{\lambda}) \propto 1,

σλ2Gamma(a,b).\sigma_{\lambda}^{-2} \sim Gamma(a, b).

Note that KK and ss are treated as random and the priors for KK and ss jointly form a time-homogeneous Poisson process prior for the partition. The number of time splits and their positions are therefore updated within our computational scheme using reversible jump MCMC.

For parametric Weibull prior specification for baseline hazard function, h0(t)=ακtα1h_{0}(t) = \alpha \kappa t^{\alpha-1}. In our Bayesian framework, our prior choices are:

π(β)1,\pi(\beta) \propto 1,

π(α)Gamma(a,b),\pi(\alpha) \sim Gamma(a, b),

π(κ)Gamma(c,d).\pi(\kappa) \sim Gamma(c, d).

We provide a detailed description of the hierarchical models for cluster-correlated univariate survival data. The models for independent data can be obtained by removing cluster-specific random effects, VjV_j, and its corresponding prior specification from the description given above.

Value

BayesSurv_HReg returns an object of class Bayes_HReg.

Note

The posterior samples of VgV_g are saved separately in working directory/path.

Author(s)

Kyu Ha Lee and Sebastien Haneuse
Maintainer: Kyu Ha Lee <klee15239@gmail.com>

References

Lee, K. H., Haneuse, S., Schrag, D., and Dominici, F. (2015), Bayesian semiparametric analysis of semicompeting risks data: investigating hospital readmission after a pancreatic cancer diagnosis, Journal of the Royal Statistical Society: Series C, 64, 2, 253-273.

Lee, K. H., Dominici, F., Schrag, D., and Haneuse, S. (2016), Hierarchical models for semicompeting risks data with application to quality of end-of-life care for pancreatic cancer, Journal of the American Statistical Association, 111, 515, 1075-1095.

Alvares, D., Haneuse, S., Lee, C., Lee, K. H. (2019), SemiCompRisks: An R package for the analysis of independent and cluster-correlated semi-competing risks data, The R Journal, 11, 1, 376-400.

See Also

initiate.startValues_HReg, print.Bayes_HReg, summary.Bayes_HReg, predict.Bayes_HReg

Examples

	
## Not run: 		
# loading a data set	
data(survData)
id=survData$cluster

form <- Formula(time + event ~ cov1 + cov2)

#####################
## Hyperparameters ##
#####################

## Weibull baseline hazard function: alpha1, kappa1
##
WB.ab <- c(0.5, 0.01) # prior parameters for alpha
##
WB.cd <- c(0.5, 0.05) # prior parameters for kappa

## PEM baseline hazard function: 
##
PEM.ab <- c(0.7, 0.7) # prior parameters for 1/sigma^2
##
PEM.alpha <- 10 # prior parameters for K

## Normal cluster-specific random effects
##
Normal.ab 	<- c(0.5, 0.01) 		# prior for zeta

## DPM cluster-specific random effects
##
DPM.ab <- c(0.5, 0.01)
aTau  <- 1.5
bTau  <- 0.0125

##
hyperParams <- list(WB=list(WB.ab=WB.ab, WB.cd=WB.cd),
                    PEM=list(PEM.ab=PEM.ab, PEM.alpha=PEM.alpha),
                    Normal=list(Normal.ab=Normal.ab),
                    DPM=list(DPM.ab=DPM.ab, aTau=aTau, bTau=bTau))
                    
###################
## MCMC SETTINGS ##
###################

## Setting for the overall run
##
numReps    <- 2000
thin       <- 10
burninPerc <- 0.5

## Settings for storage
##
storeV    <- TRUE

## Tuning parameters for specific updates
##
##  - those common to all models
mhProp_V_var     <- 0.05
##
## - those specific to the Weibull specification of the baseline hazard functions
mhProp_alpha_var <- 0.01
##
## - those specific to the PEM specification of the baseline hazard functions
C        <- 0.2
delPert  <- 0.5
rj.scheme <- 1
K_max    <- 50
s_max    <- max(survData$time[survData$event == 1])
time_lambda <- seq(1, s_max, 1)

##
mcmc.WB  <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
                    storage=list(storeV=storeV),
                    tuning=list(mhProp_alpha_var=mhProp_alpha_var, mhProp_V_var=mhProp_V_var))
##
mcmc.PEM <- list(run=list(numReps=numReps, thin=thin, burninPerc=burninPerc),
                    storage=list(storeV=storeV),
                    tuning=list(mhProp_V_var=mhProp_V_var, C=C, delPert=delPert,
                    rj.scheme=rj.scheme, K_max=K_max, time_lambda=time_lambda))

################################################################
## Analysis of Independent Univariate Survival Data ############
################################################################

#############
## WEIBULL ##
#############

##
myModel <- "Weibull"
myPath  <- "Output/01-Results-WB/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, nChain=2)

##
fit_WB <- BayesSurv_HReg(form, survData, id=NULL, model=myModel, 
                  hyperParams, startValues, mcmc.WB, path=myPath)
                  
fit_WB
summ.fit_WB <- summary(fit_WB); names(summ.fit_WB)
summ.fit_WB
pred_WB <- predict(fit_WB, tseq=seq(from=0, to=30, by=5))
plot(pred_WB, plot.est="Haz")
plot(pred_WB, plot.est="Surv")

#########
## PEM ##
#########
                
##
myModel <- "PEM"
myPath  <- "Output/02-Results-PEM/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, nChain=2)

##
fit_PEM <- BayesSurv_HReg(form, survData, id=NULL, model=myModel,
                   hyperParams, startValues, mcmc.PEM, path=myPath)
                   
fit_PEM
summ.fit_PEM <- summary(fit_PEM); names(summ.fit_PEM)
summ.fit_PEM
pred_PEM <- predict(fit_PEM)
plot(pred_PEM, plot.est="Haz")
plot(pred_PEM, plot.est="Surv")

###############################################################
## Analysis of Correlated Univariate Survival Data ############
###############################################################

####################
## WEIBULL-NORMAL ##
####################

##
myModel <- c("Weibull", "Normal")
myPath  <- "Output/03-Results-WB_Normal/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, id, nChain=2)

##
fit_WB_N <- BayesSurv_HReg(form, survData, id, model=myModel,
                        hyperParams, startValues, mcmc.WB, path=myPath)
                        
fit_WB_N
summ.fit_WB_N <- summary(fit_WB_N); names(summ.fit_WB_N)
summ.fit_WB_N
pred_WB_N <- predict(fit_WB_N, tseq=seq(from=0, to=30, by=5))
plot(pred_WB_N, plot.est="Haz")
plot(pred_WB_N, plot.est="Surv")

#################
## WEIBULL-DPM ##
#################

##
myModel <- c("Weibull", "DPM")
myPath  <- "Output/04-Results-WB_DPM/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, id, nChain=2)

##
fit_WB_DPM <- BayesSurv_HReg(form, survData, id, model=myModel,
                        hyperParams, startValues, mcmc.WB, path=myPath)

fit_WB_DPM
summ.fit_WB_DPM <- summary(fit_WB_DPM); names(summ.fit_WB_DPM)
summ.fit_WB_DPM
pred_WB_DPM <- predict(fit_WB_DPM, tseq=seq(from=0, to=30, by=5))
plot(pred_WB_DPM, plot.est="Haz")
plot(pred_WB_DPM, plot.est="Surv")

################
## PEM-NORMAL ##
################

##
myModel <- c("PEM", "Normal")
myPath  <- "Output/05-Results-PEM_Normal/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, id, nChain=2)

##
fit_PEM_N <- BayesSurv_HReg(form, survData, id, model=myModel,
                            hyperParams, startValues, mcmc.PEM, path=myPath)

fit_PEM_N
summ.fit_PEM_N <- summary(fit_PEM_N); names(summ.fit_PEM_N)
summ.fit_PEM_N
pred_PEM_N <- predict(fit_PEM_N)
plot(pred_PEM_N, plot.est="Haz")
plot(pred_PEM_N, plot.est="Surv")

#############
## PEM-DPM ##
#############

##
myModel <- c("PEM", "DPM")
myPath  <- "Output/06-Results-PEM_DPM/"

startValues      <- initiate.startValues_HReg(form, survData, model=myModel, id, nChain=2)

##
fit_PEM_DPM <- BayesSurv_HReg(form, survData, id, model=myModel,
                        hyperParams, startValues, mcmc.PEM, path=myPath)
                        
fit_PEM_DPM
summ.fit_PEM_DPM <- summary(fit_PEM_DPM); names(summ.fit_PEM_DPM)
summ.fit_PEM_DPM
pred_PEM_DPM <- predict(fit_PEM_DPM)
plot(pred_PEM_DPM, plot.est="Haz")
plot(pred_PEM_DPM, plot.est="Surv")

## End(Not run)

[Package SemiCompRisks version 3.4 Index]