bayessurvreg2 {bayesSurv} | R Documentation |
Cluster-specific accelerated failure time model for multivariate, possibly doubly-interval-censored data. The error distribution is expressed as a penalized univariate normal mixture with high number of components (G-spline). The distribution of the vector of random effects is multivariate normal.
Description
A function to estimate a regression model with possibly clustered (possibly right, left, interval or doubly-interval censored) data. In the case of doubly-interval censoring, different regression models can be specified for the onset and event times.
(Multivariate) random effects, normally distributed and acting as in the linear mixed model, normally distributed, can be included to adjust for clusters.
The error density of the regression model is specified as a mixture of Bayesian G-splines (normal densities with equidistant means and constant variances). This function performs an MCMC sampling from the posterior distribution of unknown quantities.
For details, see Komárek (2006), and Komárek, Lesaffre and Legrand (2007).
We explain first in more detail a model without doubly censoring.
Let
be event times for
th cluster and the units within that cluster
The following regression model is assumed:
where is unknown regression parameter vector,
is a vector of covariates.
is a (multivariate) cluster-specific random effect
vector and
is a vector of covariates for random
effects.
The random effect vectors
are assumed to be i.i.d. with a (multivariate) normal distribution
with the mean
and a covariance matrix
. Hierarchical centring (see Gelfand, Sahu, Carlin, 1995) is
used. I.e.
expresses the average effect of the
covariates included in
. Note that covariates
included in
may not be included in the covariate
vector
. The covariance matrix
is
assigned an inverse Wishart prior distribution in the next level of hierarchy.
The error terms
are assumed to be i.i.d. with a univariate density
. This density is expressed as
a mixture of Bayesian G-splines (normal densities with equidistant
means and constant variances). We distinguish two,
theoretically equivalent, specifications.
- Specification 1
-
where
is the unknown basis variance and
is an equidistant grid of knots symmetric around the unknown point
and related to the unknown basis variance through the relationship
where
is fixed constants, e.g.
(which has a justification of being close to cubic B-splines).
- Specification 2
-
where
is an unknown intercept term and
is an unknown scale parameter.
is then standardized error term which is distributed according to the univariate normal mixture, i.e.
where
is an equidistant grid of fixed knots (means), usually symmetric about the fixed point
and
is fixed basis variance. Reasonable values for the numbers of grid points
is
with the distance between the two knots equal to
and for the basis variance
Personally, I found Specification 2 performing better. In the paper Komárek, Lesaffre and Legrand (2007) only Specification 2 is described.
The mixture weights
are
not estimated directly. To avoid the constraints
and
transformed weights
related to the original weights by the logistic transformation:
are estimated instead.
A Bayesian model is set up for all unknown parameters. For more details I refer to Komárek (2006) and to Komárek, Lesafre, and Legrand (2007).
If there are doubly-censored data the model of the same type as above can be specified for both the onset time and the time-to-event.
Usage
bayessurvreg2(formula, random, formula2, random2,
data = parent.frame(),
na.action = na.fail, onlyX = FALSE,
nsimul = list(niter = 10, nthin = 1, nburn = 0, nwrite = 10),
prior, prior.beta, prior.b, init = list(iter = 0),
mcmc.par = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
prior2, prior.beta2, prior.b2, init2,
mcmc.par2 = list(type.update.a = "slice", k.overrelax.a = 1,
k.overrelax.sigma = 1, k.overrelax.scale = 1),
store = list(a = FALSE, a2 = FALSE, y = FALSE, y2 = FALSE,
r = FALSE, r2 = FALSE, b = FALSE, b2 = FALSE),
dir)
Arguments
formula |
model formula for the regression. In the case of doubly-censored data, this is the model formula for the onset time. The left-hand side of the In the formula all covariates appearing both in the vector
If
| |
random |
formula for the ‘random’ part of the model, i.e. the
part that specifies the covariates If omitted, no random part is included in the model. E.g. to specify the model with a
random intercept, say When some random effects are included the random intercept is added
by default. It can be removed using e.g. | |
formula2 |
model formula for the regression of the time-to-event in
the case of doubly-censored data. Ignored otherwise. The same structure as
for | |
random2 |
specification of the ‘random’ part of the model for
time-to-event in the case of doubly-censored data. Ignored
otherwise. The same structure as for | |
data |
optional data frame in which to interpret the variables
occuring in the | |
na.action |
the user is discouraged from changing the default
value | |
onlyX |
if | |
nsimul |
a list giving the number of iterations of the MCMC and other parameters of the simulation.
| |
prior |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by The item | |
prior.b |
a list defining the way in which the random effects
involved in
| |
prior.beta |
prior specification for the regression parameters,
in the case of doubly-censored data for the regression parameters of
the onset time, i.e. it is related to This should be a list with the following components:
It is recommended to run the function
bayessurvreg2 first with its argument | |
init |
an optional list with initial values for the MCMC related
to the model given by
| |
mcmc.par |
a list specifying how some of the G-spline parameters
related to the distribution of the error term from In contrast to | |
prior2 |
a list specifying the prior distribution of the G-spline
defining the distribution of the error term in the regression model
given by | |
prior.b2 |
prior specification for the parameters related to the
random effects from | |
prior.beta2 |
prior specification for the regression parameters
of time-to-event in the case of doubly censored data (related to
| |
init2 |
an optional list with initial values for the MCMC related
to the model given by | |
mcmc.par2 |
a list specifying how some of the G-spline parameters
related to | |
store |
a list of logical values specifying which chains that are not stored by default are to be stored. The list can have the following components.
| |
dir |
a string that specifies a directory where all sampled values are to be stored. |
Value
A list of class bayessurvreg2
containing an information
concerning the initial values and prior choices.
Files created
Additionally, the following files with sampled values
are stored in a directory specified by dir
argument of this
function (some of them are created only on request, see store
parameter of this function).
Headers are written to all files created by default and to files asked
by the user via the argument store
. During the burn-in, only
every nsimul$nwrite
value is written. After the burn-in, all
sampled values are written in files created by default and to files
asked by the user via the argument store
. In the files for
which the corresponding store
component is FALSE
, every
nsimul$nwrite
value is written during the whole MCMC (this
might be useful to restart the MCMC from some specific point).
The following files are created:
- iteration.sim
one column labeled
iteration
with indeces of MCMC iterations to which the stored sampled values correspond.- mixmoment.sim
columns labeled
k
,Mean.1
,D.1.1
, wherek = number of mixture components that had probability numerically higher than zero;
Mean.1 =
;
D.1.1 =
;
all related to the distribution of the error term from the model given by
formula
.- mixmoment_2.sim
in the case of doubly-censored data, the same structure as
mixmoment.sim
, however related to the model given byformula2
.- mweight.sim
sampled mixture weights
of mixture components that had probabilities numerically higher than zero. Related to the model given by
formula
.- mweight_2.sim
in the case of doubly-censored data, the same structure as
mweight.sim
, however related to the model given byformula2
.- mmean.sim
indeces
of mixture components that had probabilities numerically higher than zero. It corresponds to the weights in
mweight.sim
. Related to the model given byformula
.- mmean_2.sim
in the case of doubly-censored data, the same structure as
mmean.sim
, however related to the model given byformula2
.- gspline.sim
characteristics of the sampled G-spline (distribution of
) related to the model given by
formula
. This file together withmixmoment.sim
,mweight.sim
andmmean.sim
can be used to reconstruct the G-spline in each MCMC iteration.The file has columns labeled
gamma1
,sigma1
,delta1
,intercept1
,scale1
, The meaning of the values in these columns is the following:gamma1 = the middle knot
If ‘Specification’ is 2, this column usually contains zeros;
sigma1 = basis standard deviation
of the G-spline. This column contains a fixed value if ‘Specification’ is 2;
delta1 = distance
between the two knots of the G-spline. This column contains a fixed value if ‘Specification’ is 2;
intercept1 = the intercept term
of the G-spline. If ‘Specification’ is 1, this column usually contains zeros;
scale1 = the scale parameter
of the G-spline. If ‘Specification’ is 1, this column usually contains ones;
- gspline_2.sim
in the case of doubly-censored data, the same structure as
gspline.sim
, however related to the model given byformula2
.- mlogweight.sim
fully created only if
store$a = TRUE
. The file contains the transformed weightsof all mixture components, i.e. also of components that had numerically zero probabilities. This file is related to the error distribution of the model given by
formula
.- mlogweight_2.sim
fully created only if
store$a2 = TRUE
and in the case of doubly-censored data, the same structure asmlogweight.sim
, however related to the error distribution of the model given byformula2
.- r.sim
fully created only if
store$r = TRUE
. The file contains the labels of the mixture components into which the residuals are intrinsically assigned. Instead of indeces on the scalevalues from 1 to
are stored here. Function
vecr2matr
can be used to transform it back to indices fromto
.
- r_2.sim
fully created only if
store$r2 = TRUE
and in the case of doubly-censored data, the same structure asr.sim
, however related to the model given byformula2
.- lambda.sim
one column labeled
lambda
. These are the values of the smoothing parameter(hyperparameters of the prior distribution of the transformed mixture weights
). This file is related to the model given by
formula
.- lambda_2.sim
in the case of doubly-censored data, the same structure as
lambda.sim
, however related to the model given byformula2
.- beta.sim
sampled values of the regression parameters, both the fixed effects
and means of the random effects
(except the random intercept which has always the mean equal to zero). This file is related to the model given by
formula
. The columns are labeled according to thecolnames
of the design matrix.- beta_2.sim
in the case of doubly-censored data, the same structure as
beta.sim
, however related to the model given byformula2
.- D.sim
sampled values of the covariance matrix
of the random effects. The file has
columns (
is the dimension of the random effect vector
). The first column labeled
det
contains the determinant of the sampled matrix, additional columns labeledD.1.1
,D.2.1
, ...,D.q.1
, ...D.q.q
contain the lower triangle of the sampled matrix. This file is related to the model specified byformula
andrandom
.- D_2.sim
in the case of doubly-censored data, the same structure as
D.sim
, however related to the model given byformula2
andrandom2
.- b.sim
fully created only if
store$b = TRUE
. It contains sampled values of random effects for all clusters in the data set. The file hascolumns sorted as
. This file is related to the model given by
formula
andrandom
.- b_2.sim
fully created only if
store$b2 = TRUE
and in the case of doubly-censored data, the same structure asb.sim
, however related to the model given byformula2
andrandom2
.- Y.sim
fully created only if
store$y = TRUE
. It contains sampled (augmented) log-event times for all observations in the data set.- Y_2.sim
fully created only if
store$y2 = TRUE
and in the case of doubly-censored data, the same structure asY.sim
, however related to the model given byformula2
.- logposter.sim
columns labeled
loglik
,penalty
, andlogprw
. This file is related to the model given byformula
. The columns have the following meaning.loglik
where
denotes (augmented) (i,l)th true log-event time.
In other words,
loglik
is equal to the conditional log-densitypenalty: the penalty term
(not multiplied by
);
logprw
where
is the number of residuals assigned intrinsincally to the
th mixture component.
In other words,
logprw
is equal to the conditional log-density- logposter_2.sim
in the case of doubly-censored data, the same structure as
logposter.sim
, however related to the model given byformula2
.
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Gelfand, A. E., Sahu, S. K., and Carlin, B. P. (1995). Efficient parametrisations for normal linear mixed models. Biometrika, 82, 479-488.
Komárek, A. (2006). Accelerated Failure Time Models for Multivariate Interval-Censored Data with Flexible Distributional Assumptions. PhD. Thesis, Katholieke Universiteit Leuven, Faculteit Wetenschappen.
Komárek, A., Lesaffre, E., and Legrand, C. (2007). Baseline and treatment effect heterogeneity for survival times between centers using a random effects accelerated failure time model with flexible error distribution. Statistics in Medicine, 26, 5457-5472.
Examples
## See the description of R commands for
## the model with EORTC data,
## analysis described in Komarek, Lesaffre and Legrand (2007).
##
## R commands available in the documentation
## directory of this package
## as ex-eortc.R and
## https://www2.karlin.mff.cuni.cz/ komarek/software/bayesSurv/ex-eortc.pdf
##