GLMM_MCMC {mixAK} | R Documentation |
MCMC estimation of a (multivariate) generalized linear mixed model with a normal mixture in the distribution of random effects
Description
This function runs MCMC for a generalized linear mixed model with possibly several response variables and possibly normal mixtures in the distributions of random effects.
Usage
GLMM_MCMC(y, dist = "gaussian", id, x, z, random.intercept,
prior.alpha, init.alpha, init2.alpha,
scale.b, prior.b, init.b, init2.b,
prior.eps, init.eps, init2.eps,
nMCMC = c(burn = 10, keep = 10, thin = 1, info = 10),
tuneMCMC = list(alpha = 1, b = 1),
store = c(b = FALSE), PED = TRUE, keep.chains = TRUE,
dens.zero = 1e-300, parallel = FALSE, cltype, silent = FALSE)
## S3 method for class 'GLMM_MCMC'
print(x, ...)
## S3 method for class 'GLMM_MCMClist'
print(x, ...)
Arguments
y |
vector, matrix or data frame with responses. If If there are several responses specified then continuous responses must be put in the first columns and discrete responses in the subsequent columns. |
dist |
character (vector) which determines distribution (and a link function) for each response variable. Possible values are: “gaussian” for gaussian (normal) distribution (with identity link), “binomial(logit)” for binomial (0/1) distribution with a logit link. “poisson(log)” for Poisson distribution with a log link. Single value is recycled if necessary. |
id |
vector which determines longitudinally or otherwise dependent observations. If not given then it is assumed that there are no clusters and all observations of one response are independent. |
x |
matrix or a list of matrices with covariates (intercept not included) for fixed effects.
If there is more than one response, this must always be a list. Note that intercept in included
in all models. Use a character value “empty” as a component of the list |
z |
matrix or a list of matrices with covariates (intercept not included) for random effects.
If there is more than one response, this must always be a list. Note that random intercept
is specified using the argument REMARK: For a particular response, matrices |
random.intercept |
logical (vector) which determines for which responses random intercept should be included. |
prior.alpha |
a list which specifies prior distribution for fixed
effects (not the means of random effects). The prior distribution is
normal and the user can specify the mean and variances.
The list
|
init.alpha |
a numeric vector with initial values of fixed effects
(not the means of random effects) for the first chain. A sensible value is determined using the
maximum-likelihood fits (using |
init2.alpha |
a numeric vector with initial values of fixed effects for the second chain. |
scale.b |
a list specifying how to scale the random effects during
the MCMC. A sensible value is determined using the
maximum-likelihood fits (using If the user wishes to influence the shift and scale constants, these
are given as components of the list
|
prior.b |
a list which specifies prior distribution for (shifted and scaled) random effects. The prior is in principle a normal mixture (being a simple normal distribution if we restrict the number of mixture components to be equal to one). The list
|
init.b |
a list with initial values of the first chain for parameters related to the distribution of random effects and random effects themselves. Sensible initial values are determined by the function itself and do not have to be given by the user.
|
init2.b |
a list with initial values of the second chain for parameters related to the distribution of random effects and random effects themselves. |
prior.eps |
a list specifying prior distributions for
error terms for continuous responses. The list
|
init.eps |
a list with initial values of the first chain for parameters related to the
distribution of error terms of continuous responses. The list
|
init2.eps |
a list with initial values of the second chain for parameters related to the distribution of error terms of continuous responses. |
nMCMC |
numeric vector of length 4 giving parameters of the MCMC simulation. Its components may be named (ordering is then unimportant) as:
In total |
tuneMCMC |
a list with tuning scale parameters for proposal distribution of fixed and random effects. It is used only when there are some discrete response profiles. The components of the list have the following meaning:
|
store |
logical vector indicating whether the chains of parameters should be stored. Its components may be named (ordering is then unimportant) as:
|
PED |
a logical value which indicates whether the penalized expected deviance (see Plummer, 2008 for more details) is to be computed (which requires two parallel chains). |
keep.chains |
logical. If |
dens.zero |
a small value used instead of zero when computing deviance related quantities. |
parallel |
a logical value which indicates whether parallel
computation (based on a package |
cltype |
optional argument applicable if |
silent |
a logical value indicating whether the information on the MCMC progress is to be supressed. |
... |
additional arguments passed to the default |
Details
See accompanying papers (Komárek et al., 2010, Komárek and Komárková, 2013).
Value
An object of class GLMM_MCMClist
(if PED
argument is
TRUE
) or GLMM_MCMC
(if PED
argument is
FALSE
).
Object of class GLMM_MCMC
Object of class GLMM_MCMC
can have the following
components (some of them may be missing according to the context
of the model):
- iter
index of the last iteration performed.
- nMCMC
used value of the argument
nMCMC
.- dist
a character vector of length R corresponding to the
dist
argument.- R
a two component vector giving the number of continuous responses and the number of discrete responses.
- p
a numeric vector of length R giving the number of non-intercept alpha parameters for each response.
- q
a numeric vector of length R giving the number of non-intercept random effects for each response.
- fixed.intercept
a logical vector of length R which indicates inclusion of fixed intercept for each response.
- random.intercept
a logical vector of length R which indicates inclusion of random intercept for each response.
- lalpha
length of the vector of fixed effects.
- dimb
dimension of the distribution of random effects.
- prior.alpha
a list containing the used value of the argument
prior.alpha
.- prior.b
a list containing the used value of the argument
prior.b
.- prior.eps
a list containing the used value of the argument
prior.eps
.- init.alpha
a numeric vector with the used value of the argument
init.alpha
.- init.b
a list containing the used value of the argument
init.b
.- init.eps
a list containing the used value of the argument
init.eps
.- state.first.alpha
a numeric vector with the first stored (after burn-in) value of fixed effects
\alpha
.- state.last.alpha
a numeric vector with the last sampled value of fixed effects
\alpha
. It can be used as argumentinit.alpha
to restart MCMC.- state.first.b
a list with the first stored (after burn-in) values of parameters related to the distribution of random effects. It has components named
b
,K
,w
,mu
,Sigma
,Li
,Q
,gammaInv
,r
.- state.last.b
a list with the last sampled values of parameters related to the distribution of random effects. It has components named
b
,K
,w
,mu
,Sigma
,Li
,Q
,gammaInv
,r
. It can be used as argumentinit.b
to restart MCMC.- state.first.eps
a list with the first stored (after burn-in) values of parameters related to the distribution of residuals of continuous responses. It has components named
sigma
,gammaInv
.- state.last.eps
a list with the last sampled values of parameters related to the distribution of residuals of continuous responses. It has components named
sigma
,gammaInv
. It can be used as argumentinit.eps
to restart MCMC.- prop.accept.alpha
acceptance proportion from the Metropolis-Hastings algorithm for fixed effects (separately for each response type). Note that the acceptance proportion is equal to one for continuous responses since the Gibbs algorithm is used there.
- prop.accept.b
acceptance proportion from the Metropolis-Hastings algorithm for random effects (separately for each cluster). Note that the acceptance proportion is equal to one for models with continuous responses only since the Gibbs algorithm is used there.
- scale.b
a list containing the used value of the argument
scale.b
.- summ.Deviance
a
data.frame
with posterior summary statistics for the deviance (approximated using the Laplacian approximation) and conditional (given random effects) devience.- summ.alpha
a
data.frame
with posterior summary statistics for fixed effects.- summ.b.Mean
a matrix with posterior summary statistics for means of random effects.
- summ.b.SDCorr
a matrix with posterior summary statistics for standard deviations of random effects and correlations of each pair of random effects.
- summ.sigma_eps
a matrix with posterior summary statistics for standard deviations of the error terms in the (mixed) models of continuous responses.
- poster.comp.prob_u
a matrix which is present in the output object if the number of mixture components in the distribution of random effects is fixed and equal to
K
. In that case,poster.comp.prob_u
is a matrix withK
columns andI
rows (I
is the number of subjects defining the longitudinal profiles or correlated observations) with estimated posterior component probabilities – posterior means of the components of the underlying 0/1 allocation vector.WARNING: By default, the labels of components are based on artificial identifiability constraints based on ordering of the mixture means in the first margin. Very often, such identifiability constraint is not satisfactory!
- poster.comp.prob_b
a matrix which is present in the output object if the number of mixture components in the distribution of random effects is fixed and equal to
K
. In that case,poster.comp.prob_b
is a matrix withK
columns andI
rows (I
is the number of subjects defining the longitudinal profiles or correlated observations) with estimated posterior component probabilities – posterior mean over model parameters including random effects.WARNING: By default, the labels of components are based on artificial identifiability constraints based on ordering of the mixture means in the first margin. Very often, such identifiability constraint is not satisfactory!
- freqK_b
frequency table for the MCMC sample of the number of mixture components in the distribution of the random effects.
- propK_b
posterior probabilities for the numbers of mixture components in the distribution of random effects.
- poster.mean.y
a list with
data.frame
s, onedata.frame
per response profile. Eachdata.frame
with columns labeledid
,observed
,fitted
,stres
,eta.fixed
andeta.random
holding identifier for clusters of grouped observations, observed values and posterior means for fitted values (response expectation given fixed and random effects), standardized residuals (derived from fitted values), fixed effect part of the linear predictor and the random effect part of the linear predictor. In each column, there are first all values for the first response, then all values for the second response etc.- poster.mean.profile
a
data.frame
with columns labeledb1
, ...,bq
,Logpb
,Cond.Deviance
,Deviance
with posterior means of random effects for each cluster, posterior means of\log\bigl\{p(\boldsymbol{b})\bigr\}
, conditional deviances, i.e., minus twice the conditional (given random effects) log-likelihood for each cluster and GLMM deviances, i.e., minus twice the marginal (random effects integrated out) log-likelihoods for each cluster. The value of the marginal (random effects integrated out) log-likelihood at each MCMC iteration is obtained using the Laplacian approximation.- poster.mean.w_b
a numeric vector with posterior means of mixture weights after re-labeling. It is computed only if
K_b
is fixed and even then I am not convinced that these are useful posterior summary statistics (see label switching problem mentioned above). In any case, they should be used with care.- poster.mean.mu_b
a matrix with posterior means of mixture means after re-labeling. It is computed only if
K_b
is fixed and even then I am not convinced that these are useful posterior summary statistics (see label switching problem mentioned above). In any case, they should be used with care.- poster.mean.Q_b
a list with posterior means of mixture inverse variances after re-labeling. It is computed only if
K_b
is fixed and even then I am not convinced that these are useful posterior summary statistics (see label switching problem mentioned above). In any case, they should be used with care.- poster.mean.Sigma_b
a list with posterior means of mixture variances after re-labeling. It is computed only if
K_b
is fixed and even then I am not convinced that these are useful posterior summary statistics (see label switching problem mentioned above). In any case, they should be used with care.- poster.mean.Li_b
a list with posterior means of Cholesky decompositions of mixture inverse variances after re-labeling. It is computed only if
K_b
is fixed and even then I am not convinced that these are useful posterior summary statistics (see label switching problem mentioned above). In any case, they should be used with care.- Deviance
numeric vector with a chain for the GLMM deviances, i.e., twice the marginal (random effects integrated out) log-likelihoods of the GLMM. The marginal log-likelihood is obtained using the Laplacian approximation at each iteration of MCMC.
- Cond.Deviance
numeric vector with a chain for the conditional deviances, i.e., twice the conditional (given random effects) log-likelihoods.
- K_b
numeric vector with a chain for
K_b
(number of mixture components in the distribution of random effects).- w_b
numeric vector or matrix with a chain for
w_b
(mixture weights for the distribution of random effects). It is a matrix withK_b
columns whenK_b
is fixed. Otherwise, it is a vector with weights put sequentially after each other.- mu_b
numeric vector or matrix with a chain for
\mu_b
(mixture means for the distribution of random effects). It is a matrix withdimb\cdot K_b
columns whenK_b
is fixed. Otherwise, it is a vector with means put sequentially after each other.- Q_b
numeric vector or matrix with a chain for lower triangles of
\boldsymbol{Q}_b
(mixture inverse variances for the distribution of random effects). It is a matrix with\frac{dimb(dimb+1)}{2}\cdot K_b
columns whenK_b
is fixed. Otherwise, it is a vector with lower triangles of\boldsymbol{Q}_b
matrices put sequentially after each other.- Sigma_b
numeric vector or matrix with a chain for lower triangles of
\Sigma_b
(mixture variances for the distribution of random effects). It is a matrix with\frac{dimb(dimb+1)}{2}\cdot K_b
columns whenK_b
is fixed. Otherwise, it is a vector with lower triangles of\Sigma_b
matrices put sequentially after each other.- Li_b
numeric vector or matrix with a chain for lower triangles of Cholesky decompositions of
\boldsymbol{Q}_b
matrices. It is a matrix with\frac{dimb(dimb+1)}{2}\cdot K_b
columns whenK_b
is fixed. Otherwise, it is a vector with lower triangles put sequentially after each other.- gammaInv_b
matrix with
dimb
columns with a chain for inverses of the hyperparameter\boldsymbol{\gamma}_b
.- order_b
numeric vector or matrix with order indeces of mixture components in the distribution of random effects related to artificial identifiability constraint defined by ordering of the first component of the mixture means.
It is a matrix with
K_b
columns whenK_b
is fixed. Otherwise it is a vector with orders put sequentially after each other.- rank_b
numeric vector or matrix with rank indeces of mixture components in the distribution of random effects related to artificial identifiability constraint defined by ordering of the first component of the mixture means.
It is a matrix with
K_b
columns whenK_b
is fixed. Otherwise it is a vector with ranks put sequentially after each other.- mixture_b
data.frame
with columns labeledb.Mean.*
,b.SD.*
,b.Corr.*.*
containing the chains for the means, standard deviations and correlations of the distribution of the random effects based on a normal mixture at each iteration.- b
a matrix with the MCMC chains for random effects. It is included only if
store[b]
isTRUE
.- alpha
numeric vector or matrix with the MCMC chain(s) for fixed effects.
- sigma_eps
numeric vector or matrix with the MCMC chain(s) for standard deviations of the error terms in the (mixed) models for continuous responses.
- gammaInv_eps
matrix with
dimb
columns with MCMC chain(s) for inverses of the hyperparameter\boldsymbol{\gamma}_b
.- relabel_b
a list which specifies the algorithm used to re-label the MCMC output to compute
order_b
,rank_b
,poster.comp.prob_u
,poster.comp.prob_b
,poster.mean.w_b
,poster.mean.mu_b
,poster.mean.Q_b
,poster.mean.Sigma_b
,poster.mean.Li_b
.- Cpar
a list with components useful to call underlying C++ functions (not interesting for ordinary users).
Object of class GLMM_MCMClist
Object of class NMixMCMClist
is the list having two components
of class NMixMCMC
representing two parallel chains and
additionally the following components:
- PED
values of penalized expected deviance and related quantities. It is a vector with five components:
D.expect
=
estimated expected deviance, where the estimate is based on two parallel chains;popt
=
estimated penalty, where the estimate is based on simple MCMC average based on two parallel chains;PED
=
estimated penalized expected deviance=
D.expect
+
popt
;wpopt
=
estimated penalty, where the estimate is based on weighted MCMC average (through importance sampling) based on two parallel chains;wPED
=
estimated penalized expected deviance=
D.expect
+
wpopt
.- D
posterior mean of the deviance for each subject.
- popt
contributions to the unweighted penalty from each subject.
- wpopt
contributions to the weighted penalty from each subject.
- inv.D
for each subject, number of iterations (in both chains), where the deviance was in fact equal to infinity (when the corresponding density was lower than
dens.zero
) and was not taken into account when computingD.expect
.- inv.popt
for each subject, number of iterations, where the penalty was in fact equal to infinity and was not taken into account when computing
popt
.- inv.wpopt
for each subject, number of iterations, where the importance sampling weight was in fact equal to infinity and was not taken into account when computing
wpopt
.- sumISw
for each subject, sum of importance sampling weights.
- Deviance1
sampled value of the observed data deviance from chain 1
- Deviance2
sampled values of the obserbed data deviance from chain 2
- Deviance_repl1_ch1
sampled values of the deviance of data replicated according to the chain 1 evaluated under the parameters from chain 1
- Deviance_repl1_ch2
sampled values of the deviance of data replicated according to the chain 1 evaluated under the parameters from chain 2
- Deviance_repl2_ch1
sampled values of the deviance of data replicated according to the chain 2 evaluated under the parameters from chain 1
- Deviance_repl2_ch2
sampled values of the deviance of data replicated according to the chain 2 evaluated under the parameters from chain 2
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Komárek, A. and Komárková, L. (2013). Clustering for multivariate continuous and discrete longitudinal data. The Annals of Applied Statistics, 7(1), 177–200.
Komárek, A. and Komárková, L. (2014). Capabilities of R package mixAK for clustering based on multivariate continuous and discrete longitudinal data. Journal of Statistical Software, 59(12), 1–38. doi:10.18637/jss.v059.i12.
Komárek, A., Hansen, B. E., Kuiper, E. M. M., van Buuren, H. R., and Lesaffre, E. (2010). Discriminant analysis using a multivariate linear mixed model with a normal mixture in the random effects distribution. Statistics in Medicine, 29(30), 3267–3283.
Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 9(3), 523–539.
See Also
Examples
## See also additional material available in
## YOUR_R_DIR/library/mixAK/doc/
## or YOUR_R_DIR/site-library/mixAK/doc/
## - files http://www.karlin.mff.cuni.cz/~komarek/software/mixAK/PBCseq.pdf,
## PBCseq.R
## ==============================================