bayesBisurvreg {bayesSurv}  R Documentation 
Populationaveraged accelerated failure time model for bivariate, possibly doublyintervalcensored data. The error distribution is expressed as a penalized bivariate normal mixture with high number of components (bivariate Gspline).
Description
A function to estimate a regression model with bivariate (possibly right, left, interval or doublyintervalcensored) data. In the case of doubly interval censoring, different regression models can be specified for the onset and event times.
The error density of the regression model is specified as a mixture of Bayesian Gsplines (normal densities with equidistant means and constant variance matrices). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006) and Komárek and Lesaffre (2006).
We explain first in more detail a model without doubly censoring.
Let T_{i,l},\; i=1,\dots, N,\; l=1, 2
be event times for i
th cluster and the first and the second
unit. The following regression model is assumed:
\log(T_{i,l}) = \beta'x_{i,l} + \varepsilon_{i,l},\quad i=1,\dots,N,\;l=1,2
where \beta
is unknown regression parameter vector and
x_{i,l}
is a vector of covariates. The bivariate error terms
\varepsilon_i=(\varepsilon_{i,1},\,\varepsilon_{i,2})',\;i=1,\dots,N
are assumed to be i.i.d. with a bivariate density
g_{\varepsilon}(e_1,\,e_2)
. This density is expressed as
a mixture of Bayesian Gsplines (normal densities with equidistant
means and constant variance matrices). We distinguish two,
theoretically equivalent, specifications.
 Specification 1

(\varepsilon_1,\,\varepsilon_2)' \sim \sum_{j_1=K_1}^{K_1}\sum_{j_2=K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2,\,\sigma_2^2))
where
\sigma_1^2,\,\sigma_2^2
are unknown basis variances and\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an equidistant grid of knots symmetric around the unknown point(\gamma_1,\,\gamma_2)'
and related to the unknown basis variances through the relationship\mu_{1,j_1} = \gamma_1 + j_1\delta_1\sigma_1,\quad j_1=K_1,\dots,K_1,
\mu_{2,j_2} = \gamma_2 + j_2\delta_2\sigma_2,\quad j_2=K_2,\dots,K_2,
where
\delta_1,\,\delta_2
are fixed constants, e.g.\delta_1=\delta_2=2/3
(which has a justification of being close to cubic Bsplines).  Specification 2

(\varepsilon_1,\,\varepsilon_2)' \sim (\alpha_1,\,\alpha_2)'+ \bold{S}\,(V_1,\,V_2)'
where
(\alpha_1,\,\alpha_2)'
is an unknown intercept term and\bold{S} \mbox{ is a diagonal matrix with } \tau_1 \mbox{ and }\tau_2 \mbox{ on a diagonal,}
i.e.\tau_1,\,\tau_2
are unknown scale parameters.(V_1,\,V_2)')
is then standardized bivariate error term which is distributed according to the bivariate normal mixture, i.e.(V_1,\,V_2)'\sim \sum_{j_1=K_1}^{K_1}\sum_{j_2=K_2}^{K_2} w_{j_1,j_2} N_2(\mu_{(j_1,j_2)},\,\mbox{diag}(\sigma_1^2, \sigma_2^2))
where
\mu_{(j_1,j_2)} = (\mu_{1,j_1},\,\mu_{2,j_2})'
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point(\gamma_1,\,\gamma_2)'=(0, 0)'
and\sigma_1^2,\,\sigma_2^2
are fixed basis variances. Reasonable values for the numbers of grid pointsK_1
andK_2
areK_1=K_2=15
with the distance between the two knots equal to\delta=0.3
and for the basis variances\sigma_1^2\sigma_2^2=0.2^2.
Personally, I found Specification 2 performing better. In the paper Komárek and Lesaffre (2006) only Specification 2 is described.
The mixture weights
w_{j_1,j_2},\;j_1=K_1,\dots, K_1,\;j_2=K_2,\dots,
K_2
are
not estimated directly. To avoid the constraints
0 < w_{j_1,j_2} < 1
and
\sum_{j_1=K_1}^{K_1}\sum_{j_2=K_2}^{K_2}w_{j_1,j_2} =
1
transformed weights a_{j_1,j_2},\;j_1=K_1,\dots, K_1,\;j_2=K_2,\dots,
K_2
related to the original weights by the logistic transformation:
a_{j_1,j_2} =
\frac{\exp(w_{j_1,j_2})}{\sum_{m_1}\sum_{m_2}\exp(w_{m_1,m_2})}
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek and Lesaffre (2006) and to Komárek (2006).
If there are doublycensored data the model of the same type as above can be specified for both the onset time and the timetoevent.
Usage
bayesBisurvreg(formula, formula2, data = parent.frame(),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
prior2, prior.beta2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE),
dir)
Arguments
formula 
model formula for the regression. In the case of
doublycensored data, this is the model formula for the onset
time. Data are assumed to be sorted according to subjects and within
subjects according to the types of the events that determine the
bivariate survival distribution, i.e. the response vector must be
The lefthand side of the formula must be an object created using

formula2 
model formula for the regression of the timetoevent in
the case of doublycensored data. Ignored otherwise. The same remark as
for 
data 
optional data frame in which to interpret the variables occuring in the formulas. 
na.action 
the user is discouraged from changing the default
value 
onlyX 
if 
nsimul 
a list giving the number of iterations of the MCMC and other parameters of the simulation.

prior 
a list specifying the prior distribution of the Gspline
defining the distribution of the error term in the regression model
given by 
prior.beta 
prior specification for the regression parameters,
in the case of doubly censored data for the regression parameters of
the onset time. I.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayesBisurvreg first with its argument 
init 
an optional list with initial values for the MCMC related
to the model given by
and

mcmc.par 
a list specifying how some of the Gspline parameters
related to

prior2 
a list specifying the prior distribution of the Gspline
defining the distribution of the error term in the regression model
given by 
prior.beta2 
prior specification for the regression parameters
of timetoevent in the case of doubly censored data (related to

init2 
an optional list with initial values for the MCMC related
to the model given by 
mcmc.par2 
a list specifying how some of the Gspline parameters
related to 
store 
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.

dir 
a string that specifies a directory where all sampled values are to be stored. 
Value
A list of class bayesBisurvreg
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burnin, only
every nsimul$nwrite
value is written. After the burnin, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
 iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond. mixmoment.sim
columns labeled
k
,Mean.1
,Mean.2
,D.1.1
,D.2.1
,D.2.2
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
\mbox{E}(\varepsilon_{i,1})
;Mean.2 =
\mbox{E}(\varepsilon_{i,2})
;D.1.1 =
\mbox{var}(\varepsilon_{i,1})
;D.2.1 =
\mbox{cov}(\varepsilon_{i,1},\,\varepsilon_{i,2})
;D.2.2 =
\mbox{var}(\varepsilon_{i,2})
;all related to the distribution of the error term from the model given by
formula
. mixmoment_2.sim
in the case of doublycensored data, the same structure as
mixmoment.sim
, however related to the model given byformula2
. mweight.sim
sampled mixture weights
w_{k_1,\,k_2}
of mixture components that had probabilities numerically higher than zero. Related to the model given byformula
. mweight_2.sim
in the case of doublycensored data, the same structure as
mweight.sim
, however related to the model given byformula2
. mmean.sim
indeces
k_1,\;k_2,
k_1 \in\{K_1, \dots, K_1\},
k_2 \in\{K_2, \dots, K_2\}
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights inmweight.sim
. Related to the model given byformula
. mmean_2.sim
in the case of doublycensored data, the same structure as
mmean.sim
, however related to the model given byformula2
. gspline.sim
characteristics of the sampled Gspline (distribution of
(\varepsilon_{i,1},\,\varepsilon_{i,2})'
) related to the model given byformula
. This file together withmixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the Gspline in each MCMC iteration.The file has columns labeled
gamma1
,gamma2
,sigma1
,sigma2
,delta1
,delta2
,intercept1
,intercept2
,scale1
,scale2
. The meaning of the values in these columns is the following:gamma1 = the middle knot
\gamma_1
in the first dimension. If ‘Specification’ is 2, this column usually contains zeros;gamma2 = the middle knot
\gamma_2
in the second dimension. If ‘Specification’ is 2, this column usually contains zeros;sigma1 = basis standard deviation
\sigma_1
of the Gspline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;sigma2 = basis standard deviation
\sigma_2
of the Gspline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;delta1 = distance
delta_1
between the two knots of the Gspline in the first dimension. This column contains a fixed value if ‘Specification’ is 2;delta2 = distance
\delta_2
between the two knots of the Gspline in the second dimension. This column contains a fixed value if ‘Specification’ is 2;intercept1 = the intercept term
\alpha_1
of the Gspline in the first dimension. If ‘Specification’ is 1, this column usually contains zeros;intercept2 = the intercept term
\alpha_2
of the Gspline in the second dimension. If ‘Specification’ is 1, this column usually contains zeros;scale1 = the scale parameter
\tau_1
of the Gspline in the first dimension. If ‘Specification’ is 1, this column usually contains ones;scale2 = the scale parameter
\tau_2
of the Gspline in the second dimension. ‘Specification’ is 1, this column usually contains ones. gspline_2.sim
in the case of doublycensored data, the same structure as
gspline.sim
, however related to the model given byformula2
. mlogweight.sim
fully created only if
store$a = TRUE
. The file contains the transformed weightsa_{k_1,\,k_2},
k_1=K_1,\dots,K_1,
k_2=K_2,\dots,K_2
of all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the model given byformula
. mlogweight_2.sim
fully created only if
store$a2 = TRUE
and in the case of doublycensored data, the same structure asmlogweight.sim
, however related to the model given byformula2
. r.sim
fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of double indeces(k_1,\,k_2)
, values from 1 to(2\,K_1+1)\times (2\,K_2+1)
are stored here. Functionvecr2matr
can be used to transform it back to double indeces. r_2.sim
fully created only if
store$r2 = TRUE
and in the case of doublycensored data, the same structure asr.sim
, however related to the model given byformula2
. lambda.sim
either one column labeled
lambda
or two columns labeledlambda1
andlambda2
. These are the values of the smoothing parameter(s)\lambda
(hyperparameters of the prior distribution of the transformed mixture weightsa_{k_1,\,k_2}
). This file is related to the model given byformula
. lambda_2.sim
in the case of doublycensored data, the same structure as
lambda.sim
, however related to the model given byformula2
. beta.sim
sampled values of the regression parameters
\beta
related to the model given byformula
. The columns are labeled according to thecolnames
of the design matrix. beta_2.sim
in the case of doublycensored data, the same structure as
beta.sim
, however related to the model given byformula2
. Y.sim
fully created only if
store$y = TRUE
. It contains sampled (augmented) logevent times for all observations in the data set. Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doublycensored data, the same structure asY.sim
, however related to the model given byformula2
. logposter.sim
columns labeled
loglik
,penalty
orpenalty1
andpenalty2
,logprw
. This file is related to the model given byformula
. The columns have the following meaning.loglik
=
% N\Bigl\{\log(2\pi) + \log(\sigma_1) + \log(\sigma_2)\Bigr\} 0.5\sum_{i=1}^N\Bigl\{ (\sigma_1^2\,\tau_1^2)^{1}\; (y_{i,1}  x_{i,1}'\beta  \alpha_1  \tau_1\mu_{1,\,r_{i,1}})^2 + (\sigma_2^2\,\tau_2^2)^{1}\; (y_{i,2}  x_{i,2}'\beta  \alpha_2  \tau_2\mu_{2,\,r_{i,2}})^2 \Bigr\}
where
y_{i,l}
denotes (augmented) (i,l)th true logevent time. In other words,loglik
is equal to the conditional logdensity\sum_{i=1}^N\,\log\Bigl\{p\bigl((y_{i,1},\,y_{i,2})\;\big\;r_{i},\,\beta,\,\mbox{Gspline}\bigr)\Bigr\};
penalty1: If
prior$neighbor.system
="uniCAR"
: the penalty term for the first dimension not multiplied bylambda1
;penalty2: If
prior$neighbor.system
="uniCAR"
: the penalty term for the second dimension not multiplied bylambda2
;penalty: If
prior$neighbor.system
is different from"uniCAR"
: the penalty term not multiplied bylambda
;logprw
=
2\,N\,\log\bigl\{\sum_{k_1}\sum_{k_2}a_{k_1,\,k_2}\bigr\} + \sum_{k_1}\sum_{k_2}N_{k_1,\,k_2}\,a_{k_1,\,k_2},
whereN_{k_1,\,k_2}
is the number of residuals assigned intrinsincally to the(k_1,\,k_2)
th mixture component.In other words,
logprw
is equal to the conditional logdensity\sum_{i=1}^N \log\bigl\{p(r_i\;\;\mbox{Gspline weights})\bigr\}.
 logposter_2.sim
in the case of doublycensored data, the same structure as
lambda.sim
, however related to the model given byformula2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gilks, W. R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41, 337  348.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate IntervalCensored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A. and Lesaffre, E. (2006). Bayesian semiparametric accelerated failure time model for paired doubly intervalcensored data. Statistical Modelling, 6, 3  22.
Neal, R. M. (2003). Slice sampling (with Discussion). The Annals of Statistics, 31, 705  767.
Examples
## See the description of R commands for
## the population averaged AFT model
## with the Signal Tandmobiel data,
## analysis described in Komarek and Lesaffre (2006),
##
## R commands available in the documentation
## directory of this package as
##  see extandmobPA.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/extandmobPA.pdf
##