dppmix_mvnorm {dppmix} | R Documentation
Fit a determinantal point process multivariate normal mixture model.
Description
Discover clusters in multidimensional data using a multivariate normal mixture model with a determinantal point process prior.
Usage
dppmix_mvnorm(
X,
hparams = NULL,
store = NULL,
control = NULL,
fixed = NULL,
verbose = TRUE
)
Arguments
X |
|
hparams |
a list of hyperparameter values:
|
store |
a vector of character strings specifying additional variables of
interest; a value of |
control |
a list of control parameters:
|
fixed |
a list of fixed parameter values |
verbose |
whether to emit verbose messages |
Details
A determinantal point process (DPP) prior is a repulsive prior. Compared to mixture models that use independent priors, a DPP mixture model will often discover a more parsimonious set of mixture components (clusters).
Model fitting is done by sampling parameters from the posterior distribution using a reversible jump Markov chain Monte Carlo sampling approach.
Given X = [x_i], where each x_i is a D-dimensional real vector, we seek the posterior distribution of the latent variable z = [z_i], where each z_i is an integer representing cluster membership.
x_i \mid z_i = k \sim Normal(\mu_k, \Sigma_k)
z_i \sim Categorical(w)
w \sim Dirichlet([\delta, \ldots, \delta])
\mu_k \sim DPP(C)
where C is the covariance function that evaluates the distances among the data points:
C(x_1, x_2) = \exp( - \sum_d \frac{ (x_{1d} - x_{2d})^2 }{ \theta^2 } )
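The kernel above can be sketched in base R to show why it induces repulsion. This is a minimal illustration, not the dppmix implementation: `dpp_kernel` is a hypothetical helper, and the value of `theta` is an arbitrary assumption. Because a DPP favors configurations whose kernel matrix has a large determinant, nearly coincident points score much lower than well-separated ones.

```r
# Squared-exponential kernel from the formula above (base R only).
# `dpp_kernel` is a hypothetical helper, not a dppmix function.
dpp_kernel <- function(mu, theta) {
  # mu: K x D matrix of points; returns the K x K matrix [C(mu_j, mu_k)]
  sq_dist <- as.matrix(dist(mu))^2   # sum over d of squared coordinate differences
  exp(-sq_dist / theta^2)
}

theta <- 2                               # assumed value, for illustration only
mu_far  <- rbind(c(-6, -3), c(0, 4))     # well-separated points
mu_near <- rbind(c(0, 0), c(0.1, 0.1))   # nearly coincident points

# The determinant is close to 1 for separated points and close to 0 for
# nearly coincident ones, which is what makes the prior repulsive.
det_far  <- det(dpp_kernel(mu_far, theta))
det_near <- det(dpp_kernel(mu_near, theta))
```

Comparing `det_far` with `det_near` shows the penalty on placing two cluster means close together; a larger `theta` widens the range over which points repel each other.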
We also define \Sigma_k = E_k \Lambda_k E_k^\top, where E_k is an orthonormal matrix whose columns are eigenvectors. We further assume that E_k = E is fixed across all cluster components, so that E can be estimated as the eigenvectors of the covariance matrix of the data matrix X. Finally, we put a prior on the diagonal entries of the diagonal matrix \Lambda_k:
\lambda_{kd}^{-1} \sim Gamma( a_0, b_0 )
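To make the generative structure concrete, the following base R sketch simulates data from the model with the cluster means held fixed (drawing the means from the DPP itself is not shown). All numeric values are assumptions chosen for illustration and are not dppmix defaults; the shape/rate reading of Gamma(a_0, b_0) is also an assumption.

```r
set.seed(1)

# Assumed illustrative values; not dppmix defaults.
N <- 200; D <- 2; K <- 2
delta <- 1; a0 <- 2; b0 <- 2
mu <- rbind(c(-6, -3), c(0, 4))   # means held fixed instead of drawn from the DPP

# w ~ Dirichlet(delta, ..., delta), via normalized Gamma draws
g <- rgamma(K, shape = delta, rate = 1)
w <- g / sum(g)

# z_i ~ Categorical(w)
z <- sample.int(K, N, replace = TRUE, prob = w)

# Shared orthonormal eigenvector matrix E (here a rotation by 30 degrees)
a <- pi / 6
E <- matrix(c(cos(a), sin(a), -sin(a), cos(a)), nrow = D)

# lambda_kd^{-1} ~ Gamma(a0, b0), assuming a shape/rate parameterization
lambda <- 1 / matrix(rgamma(K * D, shape = a0, rate = b0), nrow = K)

# Sigma_k = E Lambda_k E^T, stored as a D x D x K array
Sigma <- array(NA_real_, dim = c(D, D, K))
for (k in seq_len(K)) {
  Sigma[, , k] <- E %*% diag(lambda[k, ], D) %*% t(E)
}

# x_i | z_i = k ~ Normal(mu_k, Sigma_k), using x = mu_k + E sqrt(Lambda_k) e
X <- matrix(NA_real_, nrow = N, ncol = D)
for (i in seq_len(N)) {
  k <- z[i]
  X[i, ] <- mu[k, ] + drop(E %*% (sqrt(lambda[k, ]) * rnorm(D)))
}

# Estimate a shared E from the data, as described above
E_hat <- eigen(cov(X))$vectors
```

Note that `E_hat` is computed from the pooled data, so it reflects the between-cluster spread as well as the within-cluster covariance; it need not match the `E` used to simulate.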
Hence, the hyperparameters of the model include delta, a0, b0, and theta, as well as the sampling hyperparameter sigma_pro_mu, which controls the spread of the Gaussian proposal distribution for the random-walk Metropolis-Hastings update of the \mu parameter.
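The role of the proposal spread can be illustrated with a generic random-walk Metropolis-Hastings sampler on a toy one-dimensional target. This is a sketch of the general technique only, not the sampler used inside dppmix: `rw_mh` and `log_target` are hypothetical names, the target is a simple normal-mean posterior with a flat prior, and treating `sigma_pro_mu` as a standard deviation is an assumption.

```r
set.seed(1)

# Toy target: posterior of a 1-D mean with known unit variance and a flat prior.
y <- rnorm(50, mean = 3, sd = 1)
log_target <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

# Hypothetical helper: random-walk Metropolis-Hastings with a Gaussian proposal.
rw_mh <- function(n_iter, mu_init, sigma_pro_mu) {
  mu <- numeric(n_iter)
  current <- mu_init
  accepted <- 0
  for (t in seq_len(n_iter)) {
    # Symmetric proposal centered at the current value
    proposal <- rnorm(1, mean = current, sd = sigma_pro_mu)
    # The proposal is symmetric, so the acceptance ratio is the target ratio
    log_ratio <- log_target(proposal) - log_target(current)
    if (log(runif(1)) < log_ratio) {
      current <- proposal
      accepted <- accepted + 1
    }
    mu[t] <- current
  }
  list(mu = mu, accept_rate = accepted / n_iter)
}

small <- rw_mh(2000, mu_init = 0, sigma_pro_mu = 0.05)  # small steps
large <- rw_mh(2000, mu_init = 0, sigma_pro_mu = 5)     # large steps
```

A small spread gives a high acceptance rate but slow movement through the parameter space, while a large spread proposes bold moves that are mostly rejected; tuning the spread trades off between the two.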
The parameters (and their dimensions) in the model include: K, z (N x 1), w (K x 1), lambda (K x J), mu (K x J), Sigma (J x J x K). Here N is the number of observations and J is the data dimension (D above).
If any parameter is fixed, then K must be fixed as well.
Value
a dppmix_mcmc object containing posterior samples of the parameters
References
Yanxun Xu, Peter Mueller, Donatello Telesca. Bayesian Inference for Latent Biologic Structure with Determinantal Point Processes. Biometrics. 2016;72(3):955-64.
Examples
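set.seed(1)
# simulate two small clusters (3 points each) with well-separated means
ns <- c(3, 3)
means <- list(c(-6, -3), c(0, 4))
d <- rmvnorm_clusters(ns, means)
# fit the DPP mixture model by MCMC
mcmc <- dppmix_mvnorm(d$X, verbose=FALSE)
# obtain parameter estimates from the posterior samples
res <- estimate(mcmc)
# cross-tabulate the simulated cluster labels against the estimated memberships
table(d$cl, res$z)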