NMixMCMC {mixAK}    R Documentation
MCMC estimation of (multivariate) normal mixtures with possibly censored data.
Description
This function runs MCMC for a model in which the unknown density is specified as a normal mixture with either a known or an unknown number of components. With a pre-specified number of components, MCMC is implemented through Gibbs sampling (see Diebolt and Robert, 1994) and the dimension of the data can be arbitrary. With an unknown number of components, currently only the univariate case is implemented, using reversible jump MCMC (Richardson and Green, 1997).
Further, the data are allowed to be censored, in which case an additional Gibbs step is used within the MCMC algorithm.
Usage
NMixMCMC(y0, y1, censor, x_w, scale, prior,
init, init2, RJMCMC,
nMCMC = c(burn = 10, keep = 10, thin = 1, info = 10),
PED, keep.chains = TRUE, onlyInit = FALSE, dens.zero = 1e-300,
parallel = FALSE, cltype)
## S3 method for class 'NMixMCMC'
print(x, dic, ...)
## S3 method for class 'NMixMCMClist'
print(x, ped, dic, ...)
Arguments
y0: numeric vector of length n or an n x p matrix with the observed data: exactly observed values, censoring limits for right- and left-censored observations, and lower limits of the intervals for interval-censored observations.
y1: numeric vector of length n or an n x p matrix with the upper limits of the intervals for interval-censored observations. It does not have to be supplied if there are no interval-censored data.
censor: numeric vector of length n or an n x p matrix with censoring indicators distinguishing exactly observed, right-, left- and interval-censored observations (see the sketch after this argument list). If it is not supplied, it is assumed that all values are exactly observed.
x_w: optional vector providing a categorical covariate that may influence the mixture weights. Internally, it is converted into a factor. Added in version 4.0 (03/2015).
scale: a list specifying how to scale the data before running MCMC. It should have two components: shift (a vector of length 1 or p) and scale (a vector of length 1 or p). If not given, the sample means and sample standard deviations are typically used (see Details). If you do not wish to scale the data before running MCMC, specify shift = 0 and scale = 1.
prior: a list with the parameters of the prior distribution. For some of its components the program can assign default values, so the user does not have to specify them if the defaults are to be used; the individual components are described in detail in Komárek (2009).
init: a list with the initial values for the MCMC. All initial values can be determined by the program if they are not specified.
init2: a list with the initial values for the second chain needed to estimate the penalized expected deviance of Plummer (2008). The list has the same structure as init. Ignored when PED is FALSE.
RJMCMC: a list with the parameters needed to run the reversible jump MCMC for mixtures with a varying number of components. It does not have to be specified if the number of components is fixed; most of its components can be determined by the program if they are not specified.
nMCMC: numeric vector of length 4 giving the parameters of the MCMC simulation. Its components may be named (the ordering is then unimportant) burn (length of the burn-in), keep (number of iterations kept for inference), thin (thinning interval) and info (interval at which progress information is printed). In total, (burn + keep) * thin MCMC scans are performed.
PED: a logical value which indicates whether the penalized expected deviance (see Plummer, 2008 for more details) is to be computed, which requires running two parallel chains. If not specified, PED is set to TRUE for models with a fixed number of mixture components and to FALSE otherwise.
keep.chains: logical. If TRUE (the default), the sampled chains are kept in the resulting object; if FALSE, only posterior summary statistics are stored, which saves memory.
onlyInit: logical. If TRUE, the function only determines the prior parameters, the initial values, the data scaling and the RJMCMC parameters, and returns without running the MCMC.
dens.zero: a small value used instead of zero when computing deviance-related quantities.
x: an object of class NMixMCMC or NMixMCMClist, respectively.
dic: logical indicating whether the DIC should be printed. By default, the DIC is printed only for models with a fixed number of mixture components.
ped: logical indicating whether the PED should be printed. By default, the PED is printed only for models with a fixed number of mixture components.
parallel: a logical value which indicates whether parallel computation (based on the parallel package) is to be used when running the two chains needed to compute the penalized expected deviance.
cltype: optional argument applicable when parallel is TRUE; if given, it is passed as the type argument to the makeCluster function of the parallel package.
...: additional arguments passed to the default print method.
Details
See accompanying paper (Komárek, 2009).
In the rest of the help file the same notation is used as in the paper. Namely, n denotes the number of observations, p is the dimension of the data, K is the number of mixture components, w_1,\dots,w_K are the mixture weights, \boldsymbol{\mu}_1,\dots,\boldsymbol{\mu}_K are the mixture means, \boldsymbol{\Sigma}_1,\dots,\boldsymbol{\Sigma}_K are the mixture variance-covariance matrices, and \boldsymbol{Q}_1,\dots,\boldsymbol{Q}_K are their inverses.

For the data \boldsymbol{y}_1,\dots,\boldsymbol{y}_n the following density g_y(\boldsymbol{y}) is assumed:

g_y(\boldsymbol{y}) = |\boldsymbol{S}|^{-1} \sum_{j=1}^K w_j\,\varphi\bigl(\boldsymbol{S}^{-1}(\boldsymbol{y} - \boldsymbol{m})\,\big|\,\boldsymbol{\mu}_j,\,\boldsymbol{\Sigma}_j\bigr),

where \varphi(\cdot\,|\,\boldsymbol{\mu},\,\boldsymbol{\Sigma}) denotes the density of the (multivariate) normal distribution with mean \boldsymbol{\mu} and variance-covariance matrix \boldsymbol{\Sigma}. Finally, \boldsymbol{S} is a pre-specified diagonal scale matrix and \boldsymbol{m} is a pre-specified shift vector. Sometimes, by setting \boldsymbol{m} to the sample means of the components of \boldsymbol{y} and the diagonal of \boldsymbol{S} to the sample standard deviations of \boldsymbol{y}, a (considerable) improvement of the MCMC algorithm is achieved.
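As a quick numerical check of this formula, the following hedged sketch evaluates g_y for a univariate (p = 1) two-component mixture with a given shift m and scale S; all parameter values are arbitrary and used only for illustration.

## Hedged sketch: evaluate g_y(y) for p = 1, K = 2 with shift m and scale S.
## All parameter values are illustrative, not taken from any fit.
m  <- 5              # shift (a sample-mean-like value)
S  <- 2              # scale (a sample-sd-like value)
w  <- c(0.4, 0.6)    # mixture weights
mu <- c(-1, 1)       # mixture means on the shifted/scaled z-scale
sd <- c(0.5, 0.8)    # mixture standard deviations on the z-scale
gy <- function(y){
  z <- (y - m) / S                             # z = S^{-1} (y - m)
  (1 / S) * (w[1] * dnorm(z, mu[1], sd[1]) +   # |S|^{-1} sum_j w_j phi(z | mu_j, Sigma_j)
             w[2] * dnorm(z, mu[2], sd[2]))
}
gy(c(2, 5, 8))               # density of the original data at a few points
integrate(gy, -Inf, Inf)     # integrates to (approximately) 1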
Value
An object of class NMixMCMC or of class NMixMCMClist. An object of class NMixMCMC is returned if PED is FALSE; an object of class NMixMCMClist is returned if PED is TRUE.
Object of class NMixMCMC
Objects of class NMixMCMC have the following components:
- iter: index of the last iteration performed.
- nMCMC: used value of the argument nMCMC.
- dim: dimension p of the distribution of the data.
- nx_w: number of levels of the factor covariate on the mixture weights (equal to 1 if there were no covariates on the mixture weights).
- prior: a list containing the used value of the argument prior.
- init: a list containing the used initial values for the MCMC (the first iteration of the burn-in).
- state.first: a list with components labeled y, K, w, mu, Li, Q, Sigma, gammaInv, r containing the values of the generic parameters at the first stored (after burn-in) iteration of the MCMC.
- state.last: a list with components labeled y, K, w, mu, Li, Q, Sigma, gammaInv, r containing the last sampled values of the generic parameters.
- RJMCMC: a list containing the used value of the argument RJMCMC.
- scale: a list containing the used value of the argument scale.
- freqK: frequency table of K based on the sampled chain.
- propK: posterior distribution of K based on the sampled chain.
- DIC: a data.frame with columns labeled DIC, pD, D.bar, D.in.bar containing the values used to compute the deviance information criterion (DIC). Currently only DIC_3 of Celeux et al. (2006) is implemented.
- moves: a data.frame which summarizes the acceptance probabilities of the different move types of the sampler.
- K: numeric vector with the chain for K (the number of mixture components).
- w: numeric vector or matrix with the chain for w (the mixture weights). It is a matrix with K columns when K is fixed; otherwise, it is a vector with the weights put sequentially one after another.
- mu: numeric vector or matrix with the chain for \boldsymbol{\mu} (the mixture means). It is a matrix with p\cdot K columns when K is fixed; otherwise, it is a vector with the means put sequentially one after another.
- Q: numeric vector or matrix with the chain for the lower triangles of \boldsymbol{Q} (the mixture inverse variances). It is a matrix with \frac{p(p+1)}{2}\cdot K columns when K is fixed; otherwise, it is a vector with the lower triangles of the \boldsymbol{Q} matrices put sequentially one after another.
- Sigma: numeric vector or matrix with the chain for the lower triangles of \boldsymbol{\Sigma} (the mixture variances). It is a matrix with \frac{p(p+1)}{2}\cdot K columns when K is fixed; otherwise, it is a vector with the lower triangles of the \boldsymbol{\Sigma} matrices put sequentially one after another.
- Li: numeric vector or matrix with the chain for the lower triangles of the Cholesky decompositions of the \boldsymbol{Q} matrices. It is a matrix with \frac{p(p+1)}{2}\cdot K columns when K is fixed; otherwise, it is a vector with the lower triangles put sequentially one after another.
- gammaInv: matrix with p columns with the chain for the inverses of the hyperparameter \boldsymbol{\gamma}.
- order: numeric vector or matrix with the order indices of the mixture components related to the artificial identifiability constraint defined by a suitable re-labeling algorithm (by default, simple ordering of the first component of the mixture means is used). It is a matrix with K columns when K is fixed; otherwise, it is a vector with the orders put sequentially one after another.
- rank: numeric vector or matrix with the rank indices of the mixture components related to the artificial identifiability constraint defined by a suitable re-labeling algorithm (by default, simple ordering of the first component of the mixture means is used). It is a matrix with K columns when K is fixed; otherwise, it is a vector with the ranks put sequentially one after another.
- mixture: data.frame with columns labeled y.Mean.*, y.SD.*, y.Corr.*.*, z.Mean.*, z.SD.*, z.Corr.*.* containing the chains for the means, standard deviations and correlations of the distribution of the original (y) and scaled (z) data based on the normal mixture at each iteration.
- deviance: data.frame with columns labeled LogL0, LogL1, dev.complete, dev.observed containing the chains of the quantities needed to compute the DIC.
- pm.y: a data.frame with p columns with the posterior means of the (latent) values of the observed data (useful when there is censoring).
- pm.z: a data.frame with p columns with the posterior means of the (latent) values of the scaled observed data (useful when there is censoring).
- pm.indDev: a data.frame with columns labeled LogL0, LogL1, dev.complete, dev.observed, pred.dens containing the posterior means of the individual contributions to the deviance.
- pred.dens: a numeric vector with the predictive density of the data based on the MCMC sample, evaluated at the data points. Note that when there is censoring, this is not exactly the predictive density, as it is computed as the average over iterations of the densities evaluated at the sampled values of the latent observations.
- poster.comp.prob_u: a matrix which is present in the output object if the number of mixture components is fixed (and equal to K). In that case, poster.comp.prob_u is a matrix with K columns and n rows with the estimated posterior component probabilities: the posterior means of the components of the underlying 0/1 allocation vector. WARNING: By default, the labels of the components are based on the artificial identifiability constraint given by the ordering of the mixture means in the first margin. Very often, such an identifiability constraint is not satisfactory!
- poster.comp.prob_b: a matrix which is present in the output object if the number of mixture components is fixed (and equal to K). In that case, poster.comp.prob_b is a matrix with K columns and n rows with the estimated posterior component probabilities: posterior means over the model parameters. WARNING: By default, the labels of the components are based on the artificial identifiability constraint given by the ordering of the mixture means in the first margin. Very often, such an identifiability constraint is not satisfactory!
- summ.y.Mean: posterior summary statistics based on the chains stored in the y.Mean.* columns of the data.frame mixture.
- summ.y.SDCorr: posterior summary statistics based on the chains stored in the y.SD.* and y.Corr.*.* columns of the data.frame mixture.
- summ.z.Mean: posterior summary statistics based on the chains stored in the z.Mean.* columns of the data.frame mixture.
- summ.z.SDCorr: posterior summary statistics based on the chains stored in the z.SD.* and z.Corr.*.* columns of the data.frame mixture.
- poster.mean.w: a numeric vector with the posterior means of the mixture weights after re-labeling. It is computed only if K is fixed, and even then I am not convinced that these are useful posterior summary statistics (see the label switching problem mentioned above). In any case, they should be used with care.
- poster.mean.mu: a matrix with the posterior means of the mixture means after re-labeling. It is computed only if K is fixed, and even then I am not convinced that these are useful posterior summary statistics (see the label switching problem mentioned above). In any case, they should be used with care.
- poster.mean.Q: a list with the posterior means of the mixture inverse variances after re-labeling. It is computed only if K is fixed, and even then I am not convinced that these are useful posterior summary statistics (see the label switching problem mentioned above). In any case, they should be used with care.
- poster.mean.Sigma: a list with the posterior means of the mixture variances after re-labeling. It is computed only if K is fixed, and even then I am not convinced that these are useful posterior summary statistics (see the label switching problem mentioned above). In any case, they should be used with care.
- poster.mean.Li: a list with the posterior means of the Cholesky decompositions of the mixture inverse variances after re-labeling. It is computed only if K is fixed, and even then I am not convinced that these are useful posterior summary statistics (see the label switching problem mentioned above). In any case, they should be used with care.
- relabel: a list which specifies the algorithm used to re-label the MCMC output and to compute order, rank, poster.comp.prob_u, poster.comp.prob_b, poster.mean.w, poster.mean.mu, poster.mean.Q, poster.mean.Sigma, poster.mean.Li.
- Cpar: a list with components useful for calling the underlying C++ functions (not interesting for ordinary users).
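A hedged sketch of how these components are typically accessed is given below, using fit[[1]] from the iris example in the Examples section (there, PED defaults to TRUE, so fit is of class NMixMCMClist and fit[[1]] of class NMixMCMC).

## Hedged sketch: inspecting components of a fitted NMixMCMC object
## (fit[[1]] is taken from the iris example in the Examples section).
fit1 <- fit[[1]]
fit1$freqK                               # frequency table of K (trivial when K is fixed)
fit1$DIC                                 # DIC, pD, D.bar, D.in.bar
head(fit1$w)                             # chain for the mixture weights (K columns)
fit1$summ.y.Mean                         # posterior summary of the means of y
colMeans(fit1$poster.comp.prob_u > 0.5)  # fraction of observations assigned to each
                                         # component with posterior probability > 0.5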
Object of class NMixMCMClist
An object of class NMixMCMClist is a list with two components of class NMixMCMC representing the two parallel chains, and additionally the following components:
- PED: values of the penalized expected deviance and related quantities. It is a vector with five components: D.expect = estimated expected deviance, where the estimate is based on the two parallel chains; popt = estimated penalty, where the estimate is based on a simple MCMC average over the two parallel chains; PED = estimated penalized expected deviance = D.expect + popt; wpopt = estimated penalty, where the estimate is based on a weighted MCMC average (through importance sampling) over the two parallel chains; wPED = estimated penalized expected deviance = D.expect + wpopt.
- popt: contributions to the unweighted penalty from each observation.
- wpopt: contributions to the weighted penalty from each observation.
- inv.D: for each observation, the number of iterations (in both chains) in which the deviance was in fact equal to infinity (i.e., the corresponding density was lower than dens.zero) and was therefore not taken into account when computing D.expect.
- inv.popt: for each observation, the number of iterations in which the penalty was in fact equal to infinity and was therefore not taken into account when computing popt.
- inv.wpopt: for each observation, the number of iterations in which the importance sampling weight was in fact equal to infinity and was therefore not taken into account when computing wpopt.
- sumISw: for each observation, the sum of the importance sampling weights.
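As a hedged illustration, the PED component can be used for model comparison as sketched below; the simulated data and the settings are invented for illustration only, and a lower PED (or wPED) indicates the better model under this criterion.

## Hedged sketch: comparing two fixed-K models via penalized expected deviance.
## Simulated data and short chains, purely for illustration.
set.seed(1)
y <- c(rnorm(60, -2, 0.7), rnorm(40, 2, 1))
nM <- c(burn = 500, keep = 1000, thin = 1, info = 500)
fitK2 <- NMixMCMC(y0 = y, prior = list(priorK = "fixed", Kmax = 2),
                  nMCMC = nM, PED = TRUE)
fitK3 <- NMixMCMC(y0 = y, prior = list(priorK = "fixed", Kmax = 3),
                  nMCMC = nM, PED = TRUE)
rbind(K2 = fitK2$PED, K3 = fitK3$PED)    # compare PED and wPED across the models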
Author(s)
Arnošt Komárek arnost.komarek@mff.cuni.cz
References
Celeux, G., Forbes, F., Robert, C. P., and Titterington, D. M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1(4), 651–674.
Cappé, O., Robert, C. P., and Rydén, T. (2003). Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers. Journal of the Royal Statistical Society, Series B, 65(3), 679–700.
Diebolt, J. and Robert, C. P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B, 56(2), 363–375.
Jasra, A., Holmes, C. C., and Stephens, D. A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modelling. Statistical Science, 20(1), 50–67.
Komárek, A. (2009). A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data. Computational Statistics and Data Analysis, 53(12), 3932–3947.
Plummer, M. (2008). Penalized loss functions for Bayesian model comparison. Biostatistics, 9(3), 523–539.
Richardson, S. and Green, P. J. (1997). On Bayesian analysis of mixtures with unknown number of components (with Discussion). Journal of the Royal Statistical Society, Series B, 59(4), 731–792.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit (with Discussion). Journal of the Royal Statistical Society, Series B, 64(4), 583–639.
See Also
NMixPredDensMarg, NMixPredDensJoint2.
Examples
## Not run:
## See also additional material available in
## YOUR_R_DIR/library/mixAK/doc/
## or YOUR_R_DIR/site-library/mixAK/doc/
## - files Galaxy.R, Faithful.R, Tandmob.R and
## https://www2.karlin.mff.cuni.cz/~komarek/software/mixAK/Galaxy.pdf
## https://www2.karlin.mff.cuni.cz/~komarek/software/mixAK/Faithful.pdf
## https://www2.karlin.mff.cuni.cz/~komarek/software/mixAK/Tandmob.pdf
##
## ==============================================
## Simple analysis of Anderson's iris data
## ==============================================
library("colorspace")
data(iris, package="datasets")
summary(iris)
VARS <- names(iris)[1:4]
#COLS <- rainbow_hcl(3, start = 60, end = 240)
COLS <- c("red", "darkblue", "darkgreen")
names(COLS) <- levels(iris[, "Species"])
### Prior distribution and the length of MCMC
Prior <- list(priorK = "fixed", Kmax = 3)
nMCMC <- c(burn=5000, keep=10000, thin=5, info=1000)
### Run MCMC
set.seed(20091230)
fit <- NMixMCMC(y0 = iris[, VARS], prior = Prior, nMCMC = nMCMC)
### Basic posterior summary
print(fit)
### Univariate marginal posterior predictive densities
### based on chain #1
pdens1 <- NMixPredDensMarg(fit[[1]], lgrid=150)
plot(pdens1)
plot(pdens1, main=VARS, xlab=VARS)
### Bivariate (for each pair of margins) predictive densities
### based on chain #1
pdens2a <- NMixPredDensJoint2(fit[[1]])
plot(pdens2a)
plot(pdens2a, xylab=VARS)
plot(pdens2a, xylab=VARS, contour=TRUE)
### Determine the grid to compute bivariate densities
grid <- list(Sepal.Length=seq(3.5, 8.5, length=75),
Sepal.Width=seq(1.8, 4.5, length=75),
Petal.Length=seq(0, 7, length=75),
Petal.Width=seq(-0.2, 3, length=75))
pdens2b <- NMixPredDensJoint2(fit[[1]], grid=grid)
plot(pdens2b, xylab=VARS)
### Plot with contours
ICOL <- rev(heat_hcl(20, c=c(80, 30), l=c(30, 90), power=c(1/5, 2)))
oldPar <- par(mfrow=c(2, 3), bty="n")
for (i in 1:3){
for (j in (i+1):4){
NAME <- paste(i, "-", j, sep="")
MAIN <- paste(VARS[i], "x", VARS[j])
image(pdens2b$x[[i]], pdens2b$x[[j]], pdens2b$dens[[NAME]], col=ICOL,
xlab=VARS[i], ylab=VARS[j], main=MAIN)
contour(pdens2b$x[[i]], pdens2b$x[[j]], pdens2b$dens[[NAME]], add=TRUE, col="brown4")
}
}
### Plot with data
for (i in 1:3){
for (j in (i+1):4){
NAME <- paste(i, "-", j, sep="")
MAIN <- paste(VARS[i], "x", VARS[j])
image(pdens2b$x[[i]], pdens2b$x[[j]], pdens2b$dens[[NAME]], col=ICOL,
xlab=VARS[i], ylab=VARS[j], main=MAIN)
for (spec in levels(iris[, "Species"])){
Data <- subset(iris, Species==spec)
points(Data[,i], Data[,j], pch=16, col=COLS[spec])
}
}
}
### Set the graphical parameters back to their original values
par(oldPar)
### Clustering based on posterior summary statistics of component allocations
### or on the posterior distribution of component allocations
### (these are two equivalent estimators of the probabilities of belonging
### to each mixture component, for each observation)
p1 <- fit[[1]]$poster.comp.prob_u
p2 <- fit[[1]]$poster.comp.prob_b
### Clustering based on posterior summary statistics of the mixture weights, means and variances
p3 <- NMixPlugDA(fit[[1]], iris[, VARS])
p3 <- p3[, paste("prob", 1:3, sep="")]
### Observations from "setosa" species (all would be allocated in component 1)
apply(p1[1:50,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p2[1:50,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p3[1:50,], 2, quantile, prob=seq(0, 1, by=0.1))
### Observations from "versicolor" species (almost all would be allocated in component 2)
apply(p1[51:100,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p2[51:100,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p3[51:100,], 2, quantile, prob=seq(0, 1, by=0.1))
### Observations from "virginica" species (all would be allocated in component 3)
apply(p1[101:150,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p2[101:150,], 2, quantile, prob=seq(0, 1, by=0.1))
apply(p3[101:150,], 2, quantile, prob=seq(0, 1, by=0.1))
## End(Not run)