comix {COMIX} | R Documentation |
This function generates a sample from the posterior of COMIX.
Description
This function generates a sample from the posterior of COMIX.
Usage
comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)
Arguments
Y |
Matrix of the data. Each row represents an observation. |
C |
Vector of the group label of each observation. Labels must be integers starting from 1. |
prior |
A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters:
|
pmc |
A list giving the Population Monte Carlo (PMC) parameters:
|
state |
A list giving the initial cluster labels:
|
ncores |
The number of CPU cores to utilize in parallel. Defaults to 2. |
Value
An object of class COMIX, a list of 4:
chain
, a named list:
-
t
: annsave
nrow(Y)
matrix with estimated cluster labels for each saved step of the chain and each observation in the dataY
. -
z
: ansave
nrow(Y)
matrix with estimated values of the latentvariable for the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain and each observation in the data
Y
. -
W
: anlength(unique(C))
K
-
nsave
: array storing the estimated sample- and cluster-specific weights for each saved step of the chain. -
xi
: anlength(unique(C))
(ncol(Y) x K)
nsave
array storing the estimated sample- and cluster-specific multivariate skew normal location parameters of the kernel for each saved step of the chain. -
xi0
: anncol(Y)
K
-
nsave
: array storing the estimated cluster-specific group location parameters for each saved step of the chain. -
psi
: anncol(Y)
K
nsave
array storing the estimated cluster-specific skew parameters of the kernels in the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain. -
G
: anncol(Y)
(ncol(Y) x K)
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel used in the sampler for each saved step of the chain. -
E
: anncol(Y)
(ncol(Y) x K)
nsave
array storing the estimated covariance matrix (in row format) of the multivariate normal distribution from which the sample- and cluster-specific location parameters are drawn for each saved step of the chain. -
eta
: ansave
1
matrix storing the estimated Dirichlet Process Mixture concentration parameter for each saved step of the chain. -
Sigma
: anncol(Y)
(ncol(Y) x K)
nsave
array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel for each saved step of the chain. -
alpha
: anncol(Y)
K
nsave
array storing the estimated cluster-specific skew parameters of the kernel's multivariate skew normal distribution for each saved step of the chain.
-
data
, a named list that includes the matrix of the dataY
andC
the vector of the group label of each observation. -
prior
andpmc
, the lists, as above, that were provided as inputs to the function.
Examples
library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <-
matrix(
c(
150, 300,
250, 200
),
nrow = 2,
byrow = TRUE
)
# Dimension of data:
p <- 3
# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)
# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)
# Sample data:
set.seed(1)
Y <-
rbind(
sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
)
C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))
prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)
# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)
# Generate calibrated data:
cal <- calibrateNoDist(res_relab)
# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )
# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as
colSums(njk)
# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)
# (see vignette for a more detailed example)