R: Bayesian Semiparametric Survival Models

survregbayes {spBayesSurv}

R Documentation

Bayesian Semiparametric Survival Models

Description

This function fits semiparametric proportional hazards (PH), proportional odds (PO), accelerated failture time (AFT) and accelerated hazards (AH) models. Both georeferenced (location observed exactly) and areally observed (location known up to a geographic unit such as a county) spatial locations can be handled. Georeferenced data are modeled with Gaussian random field (GRF) frailties whereas areal data are modeled with a conditional autoregressive (CAR) prior on frailties. For non-spatial clustered data, an IID Gaussian frailties are assumed. Variable selection is also incorporated. The function can fit both Case I and Case II interval censored data, as well as standard right-censored, uncensored, and mixtures of these. The transformed Bernstein Polynomial (TBP) prior is used for fitting the baseline survival function. The full scale approximation (FSA) (Sang and Huang, 2012) could be used to inverse the spatial correlation matrix for georeferenced data. The function also fits all these models without frailties. The logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC) are provided for model comparison.

Usage

survregbayes(formula, data, na.action, survmodel="PH", dist="loglogistic", 
             mcmc=list(nburn=3000, nsave=2000, nskip=0, ndisplay=500), 
             prior=NULL, state=NULL, selection=FALSE, Proximity=NULL, 
             truncation_time=NULL, subject.num=NULL, Knots=NULL,
             Coordinates=NULL, DIST=NULL, InitParamMCMC=TRUE, 
             scale.designX=TRUE)

Arguments

`formula`	a formula expression with the response returned by the `Surv` function in the `survival` package. It supports right-censoring, left-censoring, interval-censoring, and mixtures of them. To include CAR frailties, add `frailtyprior("car",ID)` to the formula, where `ID` is an n dimensional vector of cluster ID numbers. Furthermore, use `frailtyprior("iid",ID)` for Gaussian exchangeable frailties, use `frailtyprior("grf",ID)` for Gaussian random fields (GRF) frailties, and exclude the term `frailtyprior()` for non-frailty models. Note: the data need to be sorted by ID.
`data`	a data frame in which to interpret the variables named in the `formula` argument.
`na.action`	a missing-data filter function, applied to the `model.frame`.
`survmodel`	a character string for the assumed survival model. The options include `"PH"` for proportional hazards, `"PO"` for proportional odds, and `"AFT"` for accelerated failture time.
`dist`	centering distribution for TBP. Choices include `"loglogistic"`, `"lognormal"`, and `"weibull"`.
`mcmc`	a list giving the MCMC parameters. The list must include the following elements: `nburn` an integer giving the number of burn-in scans, `nskip` an integer giving the thinning interval, `nsave` an integer giving the total number of scans to be saved, `ndisplay` an integer giving the number of saved scans to be displayed on screen (the function reports on the screen when every `ndisplay` iterations have been carried out).
`prior`	a list giving the prior information. The function itself provides all default priors. Note if FSA is used, the number of knots `knots` and the number of blocks `nblock` are specified here; see examples below. See Zhou, Hanson and Zhang (2018) for more detailed hyperprior specifications.
`state`	a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis.
`selection`	flag to indicate whether variable selection is performed, where `FALSE` indicates that no variable selection will be performed.
`Proximity`	an m by m symetric adjacency matrix, where m is the number of clusters/regions. If CAR frailty model is specified in the formula, `Proximity` is required; otherwise it is ignored. Note: this matrix should be specified according to the data that have been sorted by ID.
`truncation_time`	a vector of left-trucation times with length n.
`subject.num`	a vector of subject id numbers when time dependent covariates are considered. For example, for subject 1 time dependent covariates are recorded over [0,1), [1,3), and for subject 2 covariates are recorded over [0,2), [2,3), [3,4). Suppose we only have two subjects, i.e. n=2. In this case, we save the data in the long format, set truncation_time=c(0,1,0,2,3) and subject.num=c(1,1,2,2,2).
`Knots`	an nknots by d matrix, where nknots is the number of selected knots for FSA, and d is the dimension of each location. If `Knots` is not specified, the space-filling algorithm will be used to find the knots.
`Coordinates`	an m by d coordinates matrix, where m is the number of clusters/regions, d is the dimension of coordiantes. If GRF frailty model is specified in the formula, `Coordinates` is required; otherwise it is ignored. Note: this matrix should be specified according to the data that have been sorted by ID.
`DIST`	This is a function argument, used to calculate the distance. The default is Euclidean distance (`fields::rdist`). This function should have two arguments (X1,X2), where X1 and X2 are matrices with coordinates as the rows. The returned value of this function should be the pairwise distance matrix. If nrow(X1)=m and nrow(X2)=n then the function should return an m by n matrix of all distances between these two sets of points.
`InitParamMCMC`	flag to indicate whether an initial MCMC will be run based on the centering parametric model, where `TRUE` indicates yes.
`scale.designX`	flag to indicate whether the design matrix X will be centered by column means and scaled by column standard deviations, where `TRUE` indicates yes. The default is `TRUE` for improving numerical stability. Even when it is scaled, the reported regression coefficients are in original scales. Note if we want to specify informative priors for regression coefficients, these priors should correspond to scaled predictors when `scale.designX=TRUE`.

Value

This class of objects is returned by the survregbayes function to represent a fitted Bayesian semiparametric survival model. Objects of this class have methods for the functions print, summary and plot.

The survregbayes object is a list containing at least the following components:

`modelname`	the name of the fitted model
`terms`	the `terms` object used
`dist`	the centering distribution used in the TBP prior on baseline survival function
`survmodel`	the model fitted
`coefficients`	a named vector of coefficients. The last two elements are the estimates of theta1 and theta2 involved in the centering baseline survival function.
`call`	the matched call
`prior`	the list of hyperparameters used in all priors.
`mcmc`	the list of MCMC parameters used
`n`	the number of row observations used in fitting the model
`p`	the number of columns in the model matrix
`nsubject`	the number of subjects/individuals, which is equal to n in the absence of time-dependent covariates
`subject.num`	the vector of subject id numbers when time dependent covariates are considered
`truncation_time`	the vector of left-trucation times
`Surv`	the `Surv` object used
`X.scaled`	the n by p scaled design matrix
`X`	the n by p orginal design matrix
`beta`	the p by nsave matrix of posterior samples for the coefficients in the `linear.predictors`
`theta.scaled`	the 2 by nsave matrix of posterior samples for theta1 and theta2 involved in the centering baseline survival function. Note that these posterior samples are based scaled design matrix.
`beta.scaled`	the p by nsave matrix of posterior samples for the coefficients in the `linear.predictors`. Note that these posterior samples are based scaled design matrix.
`alpha`	the vector of posterior samples for the precision parameter alpha in the TBP prior.
`maxL`	the truncation level used in the TBP prior.
`weight`	the maxL by nsave matrix of posterior samples for the weights in the TBP prior.
`cpo`	the length n vector of the stabilized estiamte of CPO; used for calculating LPML
`pD`	the effective number of parameters involved in DIC
`DIC`	the deviance information criterion (DIC)
`pW`	the effective number of parameters involved in WAIC
`WAIC`	the Watanabe-Akaike information criterion (WAIC)
`Surv.cox.snell`	the `Surv` object used for Cox-Snell residual plot. This is not recommended for frailty models, for which please use the function `cox.snell.survregbayes`.
`ratetheta`	the acceptance rate in the posterior sampling of theta vector involved in the centering baseline survival function
`ratebeta`	the acceptance rate in the posterior sampling of beta coefficient vector
`rateYs`	the acceptance rate in the posterior sampling of weight vector involved in the TBP prior
`ratec`	the acceptance rate in the posterior sampling of precision parameter alpha involved in the TBP prior
`frail.prior`	the frailty prior used in `frailtyprior`
`selection`	whether the variable selection was performed
`initial.values`	the list of initial values used for the parameters
`BF.baseline`	the Bayes factor for comparing the parametric baseline vs. the TBP baseline
`BF.bs`	Bayes factors for testing linearity when `bspline` is added to the `linear.predictors`

The object will also have the following components when frailty models are fit:

`v`	the nID by nsave matrix of posterior samples for frailties, where nID is the number of clusters considered.
`ratev`	the vector of acceptance rates in the posterior sampling of frailties
`tau2`	the vector of posterior samples for tau2 involved in the IID, GRF or CAR frailty prior.
`ID`	the cluster ID used in `frailtyprior`

If GRF frailties are used, the object will also have:

`Coordinates`	the `Coordinates` matrix used in `survregbayes`
`ratephi`	the acceptance rates in the posterior sampling of phi involved in the GRF prior
`phi`	the vector of posterior samples for phi involved in the GRF prior
`Knots`	the `Knots` matrix used in `survregbayes`

If the variable selection is performed, the object will also include:

gamma

the p by nsave matrix of posterior samples for gamma involved in the variable selection

Author(s)

Haiming Zhou and Timothy Hanson

References

Zhou, H., Hanson, T., and Zhang, J. (2020). spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. Journal of Statistical Software, 92(9): 1-33.

Zhou, H. and Hanson, T. (2018). A unified framework for fitting Bayesian semiparametric models to arbitrarily censored survival data, including spatially-referenced data. Journal of the American Statistical Association, 113(522): 571-581.

Sang, H. and Huang, J. Z. (2012). A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 111-132.

Examples

rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
library(MASS)
library(fields)

## True coeffs
betaT = c(1,1); 
## Baseline Survival
f0oft = function(t) 0.5*dlnorm(t, -1, 0.5)+0.5*dlnorm(t,1,0.5);
S0oft = function(t) (0.5*plnorm(t, -1, 0.5, lower.tail=FALSE)+
                       0.5*plnorm(t, 1, 0.5, lower.tail=FALSE))
## The Survival function:
Sioft = function(t,x,v=0)  exp( log(S0oft(t))*exp(sum(x*betaT)+v) ) ;
Fioft = function(t,x,v=0) 1-Sioft(t,x,v);
## The inverse for Fioft
Finv = function(u, x,v=0) uniroot(function (t) Fioft(t,x,v)-u, lower=1e-100, 
                                  upper=1e100, extendInt ="yes", tol=1e-6)$root
## correlation function
rho_Exp = function(dis, phi) exp(-(dis*phi));

###############################################################################
########################### Start to simulation ###############################
###############################################################################
phiT=1; sill=0.9999; ## phiT is the range parameter phi. 
tau2T = 1; ## true frailty variance; 
m = 50; mi=2
id=rep(1:m, each=mi)
mseq = rep(mi, m);
n = sum(mseq);
s1 = runif(m, 0, 10); s2 = runif(m, 0, 10);
locs = cbind(s1, s2); 
ss = cbind(rep(locs[,1],each=mi), rep(locs[,2],each=mi)); ### the locations. 
Dmm = .Call("DistMat", t(locs), t(locs), PACKAGE = "spBayesSurv");
Rmm = sill*rho_Exp(Dmm, phiT)+diag(1-sill, m, m);
v = mvrnorm(1, mu=rep(0,m), Sigma=tau2T*Rmm); 
vn = rep(v, each=mi)
## generate x 
x1 = rbinom(n, 1, 0.5); x2 = rnorm(n, 0, 1); X = cbind(x1, x2);
## generate survival times
u = runif(n);
tT = rep(0, n);
for (i in 1:n){
  tT[i] = Finv(u[i], X[i,], vn[i]);
}

### ----------- right censored -------------###
t1=tT;t2=tT;
## right censored
Centime = runif(n, 2,6);
delta = (tT<=Centime) +0 ; length(which(delta==0))/n;
rcen = which(delta==0);
t1[rcen] = Centime[rcen];
t2[rcen] = NA;
## make a data frame
## Method 1: in the interval-censoring notation: 
## t1 is the left endpoint and t2 is the right endpoint.
## This way we could use Surv(t1, t2, type="interval2")
## Method 2: Because we have right-censored data, 
## we could use t1 as the observed survival times and delta as the indicator. 
## This way we could use Surv(t1, delta). This is the same as above. 
## (s1, s2) are the locations. 
d = data.frame(t1=t1, t2=t2, x1=x1, x2=x2, delta=delta, 
               s1=ss[,1], s2=ss[,2], id=id); 
table(d$delta)/n;

##-------------spBayesSurv-------------------##
# MCMC parameters
nburn=200; nsave=200; nskip=0;
# Note larger nburn, nsave and nskip should be used in practice.
mcmc=list(nburn=nburn, nsave=nsave, nskip=nskip, ndisplay=1000);
prior = list(maxL=15, a0=1, b0=1, nknots=m, nblock=m, nu=1);
# here if nknots<m, FSA will be used with nblock=m.
cor.dist = function(x1, x2) rdist(x1,x2)
ptm<-proc.time()
res1 = survregbayes(formula = Surv(t1, delta)~x1+x2+
                       frailtyprior("grf", id), data=d, InitParamMCMC=FALSE,
                     survmodel="PH", prior=prior, mcmc=mcmc, DIST=cor.dist,
                     dist="loglogistic", Coordinates = locs);
## Or equivalently formula = Surv(t1, t2, type="interval2") can also be used.
## Note InitParamMCMC=FALSE is used for speeding,
## InitParamMCMC=TRUE is recommended in general. 
sfit=summary(res1); sfit
systime1=proc.time()-ptm; systime1;

############################################
## Results
############################################
## acceptance rate of frailties
res1$ratev[1]
## traceplots;
par(mfrow=c(2,3));
traceplot(mcmc(res1$beta[1,]), main="beta1");
traceplot(mcmc(res1$beta[2,]), main="beta2");
traceplot(mcmc(res1$v[1,]), main="frailty");
traceplot(mcmc(res1$v[2,]), main="frailty");
traceplot(mcmc(res1$v[3,]), main="frailty");
#traceplot(mcmc(res1$v[4,]), main="frailty");
traceplot(mcmc(res1$phi), main="phi");

############################################
## Curves
############################################
par(mfrow=c(1,1));
wide=0.2;
tgrid = seq(1e-10,4,wide);
ngrid = length(tgrid);
p = length(betaT); # number of covariates
newdata = data.frame(x1=c(0,0), x2=c(0,1))
plot(res1, xnewdata=newdata, tgrid=tgrid, PLOT=TRUE);

## Cox-Snell plot
set.seed(1)
cox.snell.survregbayes(res1, ncurves=2, PLOT=TRUE);

[Package spBayesSurv version 1.1.8 Index]