survregbayes {spBayesSurv} | R Documentation |
Bayesian Semiparametric Survival Models
Description
This function fits semiparametric proportional hazards (PH), proportional odds (PO), accelerated failture time (AFT) and accelerated hazards (AH) models. Both georeferenced (location observed exactly) and areally observed (location known up to a geographic unit such as a county) spatial locations can be handled. Georeferenced data are modeled with Gaussian random field (GRF) frailties whereas areal data are modeled with a conditional autoregressive (CAR) prior on frailties. For non-spatial clustered data, an IID Gaussian frailties are assumed. Variable selection is also incorporated. The function can fit both Case I and Case II interval censored data, as well as standard right-censored, uncensored, and mixtures of these. The transformed Bernstein Polynomial (TBP) prior is used for fitting the baseline survival function. The full scale approximation (FSA) (Sang and Huang, 2012) could be used to inverse the spatial correlation matrix for georeferenced data. The function also fits all these models without frailties. The logarithm of the pseudo marginal likelihood (LPML), the deviance information criterion (DIC), and the Watanabe-Akaike information criterion (WAIC) are provided for model comparison.
Usage
survregbayes(formula, data, na.action, survmodel="PH", dist="loglogistic",
mcmc=list(nburn=3000, nsave=2000, nskip=0, ndisplay=500),
prior=NULL, state=NULL, selection=FALSE, Proximity=NULL,
truncation_time=NULL, subject.num=NULL, Knots=NULL,
Coordinates=NULL, DIST=NULL, InitParamMCMC=TRUE,
scale.designX=TRUE)
Arguments
formula |
a formula expression with the response returned by the |
data |
a data frame in which to interpret the variables named in the |
na.action |
a missing-data filter function, applied to the |
survmodel |
a character string for the assumed survival model. The options include |
dist |
centering distribution for TBP. Choices include |
mcmc |
a list giving the MCMC parameters. The list must include the following elements: |
prior |
a list giving the prior information. The function itself provides all default priors. Note if FSA is used, the number of knots |
state |
a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis. |
selection |
flag to indicate whether variable selection is performed, where |
Proximity |
an m by m symetric adjacency matrix, where m is the number of clusters/regions. If CAR frailty model is specified in the formula, |
truncation_time |
a vector of left-trucation times with length n. |
subject.num |
a vector of subject id numbers when time dependent covariates are considered. For example, for subject 1 time dependent covariates are recorded over [0,1), [1,3), and for subject 2 covariates are recorded over [0,2), [2,3), [3,4). Suppose we only have two subjects, i.e. n=2. In this case, we save the data in the long format, set truncation_time=c(0,1,0,2,3) and subject.num=c(1,1,2,2,2). |
Knots |
an nknots by d matrix, where nknots is the number of selected knots for FSA, and d is the dimension of each location. If |
Coordinates |
an m by d coordinates matrix, where m is the number of clusters/regions, d is the dimension of coordiantes. If GRF frailty model is specified in the formula, |
DIST |
This is a function argument, used to calculate the distance. The default is Euclidean distance ( |
InitParamMCMC |
flag to indicate whether an initial MCMC will be run based on the centering parametric model, where |
scale.designX |
flag to indicate whether the design matrix X will be centered by column means and scaled by column standard deviations, where |
Value
This class of objects is returned by the survregbayes
function to represent a fitted Bayesian semiparametric survival model. Objects of this class have methods for the functions print
, summary
and plot
.
The survregbayes
object is a list containing at least the following components:
modelname |
the name of the fitted model |
terms |
the |
dist |
the centering distribution used in the TBP prior on baseline survival function |
survmodel |
the model fitted |
coefficients |
a named vector of coefficients. The last two elements are the estimates of theta1 and theta2 involved in the centering baseline survival function. |
call |
the matched call |
prior |
the list of hyperparameters used in all priors. |
mcmc |
the list of MCMC parameters used |
n |
the number of row observations used in fitting the model |
p |
the number of columns in the model matrix |
nsubject |
the number of subjects/individuals, which is equal to n in the absence of time-dependent covariates |
subject.num |
the vector of subject id numbers when time dependent covariates are considered |
truncation_time |
the vector of left-trucation times |
Surv |
the |
X.scaled |
the n by p scaled design matrix |
X |
the n by p orginal design matrix |
beta |
the p by nsave matrix of posterior samples for the coefficients in the |
theta.scaled |
the 2 by nsave matrix of posterior samples for theta1 and theta2 involved in the centering baseline survival function. Note that these posterior samples are based scaled design matrix. |
beta.scaled |
the p by nsave matrix of posterior samples for the coefficients in the |
alpha |
the vector of posterior samples for the precision parameter alpha in the TBP prior. |
maxL |
the truncation level used in the TBP prior. |
weight |
the maxL by nsave matrix of posterior samples for the weights in the TBP prior. |
cpo |
the length n vector of the stabilized estiamte of CPO; used for calculating LPML |
pD |
the effective number of parameters involved in DIC |
DIC |
the deviance information criterion (DIC) |
pW |
the effective number of parameters involved in WAIC |
WAIC |
the Watanabe-Akaike information criterion (WAIC) |
Surv.cox.snell |
the |
ratetheta |
the acceptance rate in the posterior sampling of theta vector involved in the centering baseline survival function |
ratebeta |
the acceptance rate in the posterior sampling of beta coefficient vector |
rateYs |
the acceptance rate in the posterior sampling of weight vector involved in the TBP prior |
ratec |
the acceptance rate in the posterior sampling of precision parameter alpha involved in the TBP prior |
frail.prior |
the frailty prior used in |
selection |
whether the variable selection was performed |
initial.values |
the list of initial values used for the parameters |
BF.baseline |
the Bayes factor for comparing the parametric baseline vs. the TBP baseline |
BF.bs |
Bayes factors for testing linearity when |
The object will also have the following components when frailty models are fit:
v |
the nID by nsave matrix of posterior samples for frailties, where nID is the number of clusters considered. |
ratev |
the vector of acceptance rates in the posterior sampling of frailties |
tau2 |
the vector of posterior samples for tau2 involved in the IID, GRF or CAR frailty prior. |
ID |
the cluster ID used in |
If GRF frailties are used, the object will also have:
Coordinates |
the |
ratephi |
the acceptance rates in the posterior sampling of phi involved in the GRF prior |
phi |
the vector of posterior samples for phi involved in the GRF prior |
Knots |
the |
If the variable selection is performed, the object will also include:
gamma |
the p by nsave matrix of posterior samples for gamma involved in the variable selection |
Author(s)
Haiming Zhou and Timothy Hanson
References
Zhou, H., Hanson, T., and Zhang, J. (2020). spBayesSurv: Fitting Bayesian Spatial Survival Models Using R. Journal of Statistical Software, 92(9): 1-33.
Zhou, H. and Hanson, T. (2018). A unified framework for fitting Bayesian semiparametric models to arbitrarily censored survival data, including spatially-referenced data. Journal of the American Statistical Association, 113(522): 571-581.
Sang, H. and Huang, J. Z. (2012). A full scale approximation of covariance functions for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(1), 111-132.
See Also
frailtyprior
, cox.snell.survregbayes
, rdist
, rdist.earth
Examples
rm(list=ls())
library(survival)
library(spBayesSurv)
library(coda)
library(MASS)
library(fields)
## True coeffs
betaT = c(1,1);
## Baseline Survival
f0oft = function(t) 0.5*dlnorm(t, -1, 0.5)+0.5*dlnorm(t,1,0.5);
S0oft = function(t) (0.5*plnorm(t, -1, 0.5, lower.tail=FALSE)+
0.5*plnorm(t, 1, 0.5, lower.tail=FALSE))
## The Survival function:
Sioft = function(t,x,v=0) exp( log(S0oft(t))*exp(sum(x*betaT)+v) ) ;
Fioft = function(t,x,v=0) 1-Sioft(t,x,v);
## The inverse for Fioft
Finv = function(u, x,v=0) uniroot(function (t) Fioft(t,x,v)-u, lower=1e-100,
upper=1e100, extendInt ="yes", tol=1e-6)$root
## correlation function
rho_Exp = function(dis, phi) exp(-(dis*phi));
###############################################################################
########################### Start to simulation ###############################
###############################################################################
phiT=1; sill=0.9999; ## phiT is the range parameter phi.
tau2T = 1; ## true frailty variance;
m = 50; mi=2
id=rep(1:m, each=mi)
mseq = rep(mi, m);
n = sum(mseq);
s1 = runif(m, 0, 10); s2 = runif(m, 0, 10);
locs = cbind(s1, s2);
ss = cbind(rep(locs[,1],each=mi), rep(locs[,2],each=mi)); ### the locations.
Dmm = .Call("DistMat", t(locs), t(locs), PACKAGE = "spBayesSurv");
Rmm = sill*rho_Exp(Dmm, phiT)+diag(1-sill, m, m);
v = mvrnorm(1, mu=rep(0,m), Sigma=tau2T*Rmm);
vn = rep(v, each=mi)
## generate x
x1 = rbinom(n, 1, 0.5); x2 = rnorm(n, 0, 1); X = cbind(x1, x2);
## generate survival times
u = runif(n);
tT = rep(0, n);
for (i in 1:n){
tT[i] = Finv(u[i], X[i,], vn[i]);
}
### ----------- right censored -------------###
t1=tT;t2=tT;
## right censored
Centime = runif(n, 2,6);
delta = (tT<=Centime) +0 ; length(which(delta==0))/n;
rcen = which(delta==0);
t1[rcen] = Centime[rcen];
t2[rcen] = NA;
## make a data frame
## Method 1: in the interval-censoring notation:
## t1 is the left endpoint and t2 is the right endpoint.
## This way we could use Surv(t1, t2, type="interval2")
## Method 2: Because we have right-censored data,
## we could use t1 as the observed survival times and delta as the indicator.
## This way we could use Surv(t1, delta). This is the same as above.
## (s1, s2) are the locations.
d = data.frame(t1=t1, t2=t2, x1=x1, x2=x2, delta=delta,
s1=ss[,1], s2=ss[,2], id=id);
table(d$delta)/n;
##-------------spBayesSurv-------------------##
# MCMC parameters
nburn=200; nsave=200; nskip=0;
# Note larger nburn, nsave and nskip should be used in practice.
mcmc=list(nburn=nburn, nsave=nsave, nskip=nskip, ndisplay=1000);
prior = list(maxL=15, a0=1, b0=1, nknots=m, nblock=m, nu=1);
# here if nknots<m, FSA will be used with nblock=m.
cor.dist = function(x1, x2) rdist(x1,x2)
ptm<-proc.time()
res1 = survregbayes(formula = Surv(t1, delta)~x1+x2+
frailtyprior("grf", id), data=d, InitParamMCMC=FALSE,
survmodel="PH", prior=prior, mcmc=mcmc, DIST=cor.dist,
dist="loglogistic", Coordinates = locs);
## Or equivalently formula = Surv(t1, t2, type="interval2") can also be used.
## Note InitParamMCMC=FALSE is used for speeding,
## InitParamMCMC=TRUE is recommended in general.
sfit=summary(res1); sfit
systime1=proc.time()-ptm; systime1;
############################################
## Results
############################################
## acceptance rate of frailties
res1$ratev[1]
## traceplots;
par(mfrow=c(2,3));
traceplot(mcmc(res1$beta[1,]), main="beta1");
traceplot(mcmc(res1$beta[2,]), main="beta2");
traceplot(mcmc(res1$v[1,]), main="frailty");
traceplot(mcmc(res1$v[2,]), main="frailty");
traceplot(mcmc(res1$v[3,]), main="frailty");
#traceplot(mcmc(res1$v[4,]), main="frailty");
traceplot(mcmc(res1$phi), main="phi");
############################################
## Curves
############################################
par(mfrow=c(1,1));
wide=0.2;
tgrid = seq(1e-10,4,wide);
ngrid = length(tgrid);
p = length(betaT); # number of covariates
newdata = data.frame(x1=c(0,0), x2=c(0,1))
plot(res1, xnewdata=newdata, tgrid=tgrid, PLOT=TRUE);
## Cox-Snell plot
set.seed(1)
cox.snell.survregbayes(res1, ncurves=2, PLOT=TRUE);