bayessurvreg3 {bayesSurv}  R Documentation 
Clusterspecific accelerated failure time model for multivariate, possibly doublyintervalcensored data with flexibly specified random effects and/or error distribution.
Description
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doublyinterval censored) data. In the case of doublyinterval censoring, different regression models can be specified for the onset and event times.
A univariate random effect (random intercept) with the distribution expressed as a penalized normal mixture can be included in the model to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian Gsplines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2008).
SUPPLEMENTED IN 06/2013: Intervalcensored times might be subject to misclassification. In case of doublyintervalcensored data, the event time might be subject to misclassification. For details, see GarcíaZattera, Jara and Komárek (2016).
We explain first in more detail a model without doubly censoring.
Let T_{i,l},\; i=1,\dots, N,\; l=1,\dots, n_i
be event times for i
th cluster and the units within that cluster
The following regression model is assumed:
\log(T_{i,l}) = \beta'x_{i,l} + b_i + \varepsilon_{i,l},\quad i=1,\dots, N,\;l=1,\dots, n_i
where \beta
is unknown regression parameter vector,
x_{i,l}
is a vector of covariates.
b_i
is a clusterspecific random effect (random intercept).
The random effects b_i,\;i=1,\dots, N
are assumed to be i.i.d. with a univariate density g_{b}(b)
.
The error terms
\varepsilon_{i,l},\;i=1,\dots, N, l=1,\dots, n_i
are assumed to be i.i.d. with a univariate density
g_{\varepsilon}(e)
.
Densities g_{b}
and g_{\varepsilon}
are
both expressed as
a mixture of Bayesian Gsplines (normal densities with equidistant
means and constant variances). We distinguish two,
theoretically equivalent, specifications.
In the following, the density for \varepsilon
is explicitely described. The density for b
is obtained in
an analogous manner.
 Specification 1

\varepsilon \sim \sum_{j=K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\sigma^2
is the unknown basis variance and\mu_{j},\;j=K,\dots, K
is an equidistant grid of knots symmetric around the unknown point\gamma
and related to the unknown basis variance through the relationship\mu_{j} = \gamma + j\delta\sigma,\quad j=K,\dots,K,
where
\delta
is fixed constants, e.g.\delta=2/3
(which has a justification of being close to cubic Bsplines).  Specification 2

\varepsilon \sim \alpha + \tau\,V
where
\alpha
is an unknown intercept term and\tau
is an unknown scale parameter.V
is then standardized error term which is distributed according to the univariate normal mixture, i.e.V\sim \sum_{j=K}^{K} w_{j} N(\mu_{j},\,\sigma^2)
where
\mu_{j},\;j=K,\dots, K
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point\gamma=0
and\sigma^2
is fixed basis variance. Reasonable values for the numbers of grid pointsK
isK=15
with the distance between the two knots equal to\delta=0.3
and for the basis variance\sigma^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2008) only Specification 2 is described.
The mixture weights
w_{j},\;j=K,\dots, K
are
not estimated directly. To avoid the constraints
0 < w_{j} < 1
and
\sum_{j=K}^{K}\,w_j = 1
transformed weights a_{j},\;j=K,\dots, K
related to the original weights by the logistic transformation:
a_{j} = \frac{\exp(w_{j})}{\sum_{m}\exp(w_{m})}
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2008).
If there are doublycensored data the model of the same type as above can be specified for both the onset time and the timetoevent.
In the case one wishes to link the random intercept of the onset model and the random intercept of the timetoevent model, there are the following possibilities.
Bivariate normal distribution
It is assumed that the pair of random intercepts from the onset and
timetoevent part of the model are normally distributed with zero
mean and an unknown covariance matrix D
.
A priori, the inverse covariance matrix D^{1}
is
addumed to follow a Wishart distribution.
Unknown correlation between the basis Gsplines
Each pair of basis Gsplines describing the distribution of the random
intercept in the onset part and the timetoevent part of the model is
assumed to be correlated with an unknown correlation coefficient
\varrho
. Note that this is just an experiment and is no
more further supported.
Prior distribution on \varrho
is assumed to be
uniform. In the MCMC, the Fisher Z transform of the \varrho
given by
Z =
\frac{1}{2}\log\Bigl(\frac{1\varrho}{1+\varrho}\Bigr)=\mbox{atanh}(\varrho)
is sampled. Its prior is derived from the uniform prior
\mbox{Unif}(1,\;1)
put on \varrho.
The Fisher Z transform is updated using the MetropolisHastings alhorithm. The proposal distribution is given either by a normal approximation obtained using the Taylor expansion of the full conditional distribution or by a Langevin proposal (see Robert and Casella, 2004, p. 318).
Usage
bayessurvreg3(formula, random, formula2, random2,
data = parent.frame(),
classification,
classParam = list(Model = c("Examiner", "Factor:Examiner"),
a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
init.sens = NULL, init.spec = NULL),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
priorinit.Nb,
rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE),
dir)
bayessurvreg3Para(formula, random, formula2, random2,
data = parent.frame(),
classification,
classParam = list(Model = c("Examiner", "Factor:Examiner"),
a.sens = 1, b.sens = 1, a.spec = 1, b.spec = 1,
init.sens = NULL, init.spec = NULL),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1,
type.update.a.b = "slice", k.overrelax.a.b = 1,
k.overrelax.sigma.b = 1, k.overrelax.scale.b = 1),
priorinit.Nb,
rho = list(type.update = "fixed.zero", init=0, sigmaL=0.1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE,
a.b = FALSE, a.b2 = FALSE, r.b = FALSE, r.b2 = FALSE),
dir)
Arguments
formula 
model formula for the regression. In the case of doublycensored data, this is the model formula for the onset time. The lefthand side of the Intercept is implicitely included in the model by the
estimation of the error distribution. As a consequence If
 
random 
formula for the ‘random’ part of the model.
In the case of doublycensored data, this is the
is allowed. If omitted, no random part is included in the model.  
formula2 
model formula for the regression of the timetoevent in
the case of doublycensored data. Ignored otherwise. The same structure as
for  
random2 
specification of the ‘random’ part of the model for
timetoevent in the case of doublycensored data. Ignored
otherwise. The same structure as for  
data 
optional data frame in which to interpret the variables
occuring in the  
classification 
Possible additional columns are ignored. If missing, no misclassification is considered.  
classParam 
a The following components of the
 
na.action 
the user is discouraged from changing the default
value  
onlyX 
if  
nsimul 
a list giving the number of iterations of the MCMC and other parameters of the simulation.
 
prior 
a list specifying the prior distribution of the Gspline
defining the distribution of the error term in the regression model
given by The item  
prior.b 
a list specifying the prior distribution of the
Gspline defining the distribution of the random intercept in the
regression model given by It is ignored if the argument The item  
prior.beta 
prior specification for the regression parameters,
in the case of doublycensored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg3 first with its argument  
init 
an optional list with initial values for the MCMC related
to the model given by
 
mcmc.par 
a list specifying how some of the Gspline parameters
related to the distribution of the error term and of the random
intercept from Compared to the mcmc.par argument of the function
In contrast to  
prior2 
a list specifying the prior distribution of the Gspline
defining the distribution of the error term in the regression model
given by  
prior.b2 
prior specification for the parameters related to the
random effects from It is ignored if the argument  
prior.beta2 
prior specification for the regression parameters
of timetoevent in the case of doubly censored data (related to
 
init2 
an optional list with initial values for the MCMC related
to the model given by  
mcmc.par2 
a list specifying how some of the Gspline parameters
related to  
priorinit.Nb 
a list specifying the prior of the random intercepts in the case of the AFT model with doublyintervalcensored data and onset, timetoevent random intercepts following bivariate normal distribution. The list should have the following components.
 
rho 
a list specifying possible correlation between the onset
random intercept and the timetoevent random intercept in the
experimental version of the model. If not given correlation is fixed
to It is ignored if the argument The list can have the following components.
 
store 
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
 
dir 
a string that specifies a directory where all sampled values are to be stored. 
Value
A list of class bayessurvreg3
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burnin, only
every nsimul$nwrite
value is written. After the burnin, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
 iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond. mixmoment.sim
this file is related to the density of the error term from the model given by
formula
.Columns labeled
k
,Mean.1
,D.1.1
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(\varepsilon_{i,l})
;D.1.1 =
\mbox{var}(\varepsilon_{i,l})
. mixmoment_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mixmoment.sim
. mixmoment_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mixmoment.sim
. mixmoment_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mixmoment.sim
. mweight.sim
this file is related to the density of the error term from the model given by
formula
.Sampled mixture weights
w_{k}
of mixture components that had probabilities numerically higher than zero. mweight_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mweight.sim
. mweight_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mweight.sim
. mweight_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mweight.sim
. mmean.sim
this file is related to the density of the error term from the model given by
formula
.Indeces
k,
k \in\{K, \dots, K\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
. mmean_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
mmean.sim
. mmean_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
mmean.sim
. mmean_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
mmean.sim
. gspline.sim
this file is related to the density of the error term from the model given by
formula
.Characteristics of the sampled Gspline. This file together with
mixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the Gspline in each MCMC iteration.The file has columns labeled
gamma1
,sigma1
,delta1
,intercept1
,scale1
, The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma
If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma
of the Gspline. This column contains a fixed value if ‘Specification’ is 2;delta1 = distance
delta
between the two knots of the Gspline. This column contains a fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha
of the Gspline. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau
of the Gspline. If ‘Specification’ is 1, this column usually contains ones; gspline_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
gspline.sim
. gspline_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
gspline.sim
. gspline_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
gspline.sim
. mlogweight.sim
this file is related to the density of the error term from the model given by
formula
.Fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k},
k=K,\dots,K
of all mixture components, i.e. also of components that had numerically zero probabilities. mlogweight_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.Fully created only if
store$a.b = TRUE
.The same structure as
mlogweight.sim
. mlogweight_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.Fully created only if
store$a2 = TRUE
.The same structure as
mlogweight.sim
. mlogweight_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.Fully created only if
store$a.b2 = TRUE
.The same structure as
mlogweight.sim
. r.sim
this file is related to the density of the error term from the model given by
formula
.Fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indeces on the scale\{K,\dots, K\}
values from 1 to(2\,K+1)
are stored here. Functionvecr2matr
can be used to transform it back to indices fromK
toK
. r_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.Fully created only if
store$r.b = TRUE
.The same structure as
r.sim
. r_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.Fully created only if
store$r2 = TRUE
.The same structure as
r.sim
. r_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.Fully created only if
store$r.b2 = TRUE
.The same structure as
r.sim
. lambda.sim
this file is related to the density of the error term from the model given by
formula
.One column labeled
lambda
. These are the values of the smoothing parameter\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k}
). lambda_b.sim
this file is related to the density of the random intercept from the model given by
formula
andrandom
.The same structure as
lambda.sim
. lambda_2.sim
in the case of doublycensored data. This file is related to the density of the error term from the model given by
formula2
.The same structure as
lambda.sim
. lambda_b2.sim
in the case of doublycensored data. This file is related to the density of the random intercept from the model given by
formula2
andrandom2
.The same structure as
lambda.sim
. beta.sim
this file is related to the model given by
formula
.Sampled values of the regression parameters
\beta
.The columns are labeled according to the
colnames
of the design matrix. beta_2.sim
in the case of doublycensored data, the same structure as
beta.sim
, however related to the model given byformula2
. b.sim
this file is related to the model given by
formula
andrandom
.Fully created only if
store$b = TRUE
. It contains sampled values of random intercepts for all clusters in the data set. The file hasN
columns. b_2.sim
fully created only if
store$b2 = TRUE
and in the case of doublycensored data, the same structure asb.sim
, however related to the model given byformula2
andrandom2
. Y.sim
this file is related to the model given by
formula
.Fully created only if
store$y = TRUE
. It contains sampled (augmented) logevent times for all observations in the data set. Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doublycensored data, the same structure asY.sim
, however related to the model given byformula2
. logposter.sim

This file is related to the residuals of the model given by
formula
.Columns labeled
loglik
,penalty
, andlogprw
. The columns have the following meaning.loglik
=
%  (\sum_{i=1}^N\,n_i)\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\} 0.5\sum_{i=1}^N\sum_{l=1}^{n_i} \Bigl\{ (\sigma^2\,\tau^2)^{1}\; (y_{i,l}  x_{i,l}'\beta  b_i  \alpha  \tau\mu_{r_{i,l}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true logevent time.In other words,
loglik
is equal to the conditional logdensity\sum_{i=1}^N \sum_{l=1}^{n_i}\,\log\Bigl\{p\bigl(y_{i,l}\;\big\;r_{i,l},\,\beta,\,b_i,\,\mbox{errorGspline}\bigr)\Bigr\};
penalty: the penalty term
\frac{1}{2}\sum_{k}\Bigl(\Delta\, a_k\Bigr)^2
(not multiplied by
\lambda
);logprw
=
2\,(\sum_i n_i)\,\log\bigl\{\sum_{k}a_{k}\bigr\} + \sum_{k}N_{k}\,a_{k},
whereN_{k}
is the number of residuals assigned intrinsincally to thek
th mixture component.In other words,
logprw
is equal to the conditional logdensity\sum_{i=1}^N\sum_{l=1}^{n_i} \log\bigl\{p(r_{i,l}\;\;\mbox{errorGspline weights})\bigr\}.
 logposter_b.sim
This file is related to the random intercepts from the model given by
formula
andrandom
.Columns labeled
loglik
,penalty
, andlogprw
. The columns have the following meaning.loglik
=
%  N\,\Bigl\{\log(\sqrt{2\pi}) + \log(\sigma) \Bigr\} 0.5\sum_{i=1}^N \Bigl\{ (\sigma^2\,\tau^2)^{1}\; (b_i  \alpha  \tau\mu_{r_{i}})^2 \Bigr\}
where
b_{i}
denotes (augmented) ith random intercept.In other words,
loglik
is equal to the conditional logdensity\sum_{i=1}^N \,\log\Bigl\{p\bigl(b_{i}\;\big\;r_{i},\,\mbox{bGspline}\bigr)\Bigr\};
The columns
penalty
andlogprw
have the analogous meaning as in the case of logposter.sim file. logposter_2.sim
in the case of doublycensored data, the same structure as
logposter.sim
, however related to the model given byformula2
. logposter_b2.sim
in the case of doublycensored data, the same structure as
logposter_b.sim
, however related to the model given byformula2
andrandom2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
GarcíaZattera, M. J., Jara, A., and Komárek, A. (2016). A flexible AFT model for misclassified clustered intervalcensored data. Biometrics, 72, 473  483.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate IntervalCensored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2008). Bayesian accelerated failure time model with multivariate doublyintervalcensored data and flexible distributional assumptions. Journal of the American Statistical Association, 103, 523  533.
Robert C. P. and Casella, G. (2004). Monte Carlo Statistical Methods, Second Edition. New York: Springer Science+Business Media.
Examples
## See the description of R commands for
## the cluster specific AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2007).
##
## R commands available in the documentation
## directory of this package
##  see extandmobCS.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/extandmobCS.pdf
##