comix {COMIX} | R Documentation |
This function generates a sample from the posterior of COMIX.
Description
This function generates a sample from the posterior of COMIX.
Usage
comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)
Arguments
Y |
Matrix of the data. Each row represents an observation. |
C |
Vector of the group label of each observation. Labels must be integers starting from 1. |
prior |
A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters:
|
pmc |
A list giving the Population Monte Carlo (PMC) parameters:
|
state |
A list giving the initial cluster labels:
|
ncores |
The number of CPU cores to utilize in parallel. Defaults to 2. |
Value
An object of class COMIX, a list of 4:
chain
, a named list:
-
t
: annsave
\times
nrow(Y)
matrix with estimated cluster labels for each saved step of the chain and each observation in the dataY
. -
z
: ansave
\times
nrow(Y)
matrix with estimated values of the latentz_{i,j}
variable for the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain and each observation in the dataY
. -
W
: anlength(unique(C))
\times
K
\times
-
nsave
: array storing the estimated sample- and cluster-specific weights for each saved step of the chain. -
xi
: anlength(unique(C))
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated sample- and cluster-specific multivariate skew normal location parameters of the kernel for each saved step of the chain. -
xi0
: anncol(Y)
\times
K
\times
-
nsave
: array storing the estimated cluster-specific group location parameters for each saved step of the chain. -
psi
: anncol(Y)
\times
K
\times
nsave
array storing the estimated cluster-specific skew parameters of the kernels in the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain. -
G
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel used in the sampler for each saved step of the chain. -
E
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated covariance matrix (in row format) of the multivariate normal distribution from which the sample- and cluster-specific location parameters are drawn for each saved step of the chain. -
eta
: ansave
\times
1
matrix storing the estimated Dirichlet Process Mixture concentration parameter for each saved step of the chain. -
Sigma
: anncol(Y)
\times
(ncol(Y) x K)
\times
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel for each saved step of the chain. -
alpha
: anncol(Y)
\times
K
\times
nsave
array storing the estimated cluster-specific skew parameters of the kernel's multivariate skew normal distribution for each saved step of the chain.
-
data
, a named list that includes the matrix of the dataY
andC
the vector of the group label of each observation. -
prior
andpmc
, the lists, as above, that were provided as inputs to the function.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)