R: This function generates a sample from the posterior of COMIX.

comix {COMIX}

R Documentation

This function generates a sample from the posterior of COMIX.

Description

This function generates a sample from the posterior of COMIX.

Usage

comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)

Arguments

`Y`	Matrix of the data. Each row represents an observation.
`C`	Vector of the group label of each observation. Labels must be integers starting from 1.
`prior`	A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters: `zeta`: Coarsening parameter. A number between 0 and 1. `zeta` = 1: sample from standard posterior; `zeta` < 1: sample from power posterior. The lower `zeta` is, the more flexible the kernels become. `K`: Maximal number of mixture components. `eta_prior` Parameters for gamma prior for concentration parameter of the stick breaking process prior for the weights. `m0`: Number of degrees of freedom for the inverse Wishart prior for Sigma, the covariance matrix of the kernels. `Lambda`: Mean parameter for the inverse Wishart prior for Sigma, the covariance matrix of the kernels. `b0`: Mean parameter for the multivariate normal distribution that is the prior for the group mean parameter xi0. `B0`: Covariance parameter for the multivariate normal distribution that is the prior for the group mean parameter xi0. `e0`: Number of degrees of freedom for the inverse Wishart prior for `E_k`, the covariance matrix of the multivariate normal from which `\xi_{j,k}` are drawn. `E0`: Mean parameter for the inverse Wishart prior for `E_k`, the covariance matrix of the multivariate normal from which `\xi_{j,k}` are drawn. `merge_step`: Introduce step to merge mixture components with small KL divergence. Default is `merge_step = TRUE`. `merge_par`: Parameter controlling merging radius. Default is `merge_par = 0.1`.
`pmc`	A list giving the Population Monte Carlo (PMC) parameters: `npart`: Number of PMC particles. `nburn`: Number of burn-in steps `nsave`: Number of steps in the chain after burn-in. `nskip`: Thinning parameter, number of steps to skip between saving steps after burn-in. `ndisplay`: Display status of chain after every `ndisplay` steps.
`state`	A list giving the initial cluster labels: `t`: An integer vector, same length as the number of rows of `Y`, with cluster labels between `1` and `K`.
`ncores`	The number of CPU cores to utilize in parallel. Defaults to 2.

Value

An object of class COMIX, a list of 4:

chain, a named list:

t: an nsave \times nrow(Y) matrix with estimated cluster labels for each saved step of the chain and each observation in the data Y.
z: a nsave \times nrow(Y) matrix with estimated values of the latent z_{i,j} variable for the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain and each observation in the data Y.
W: an length(unique(C)) \times K \times
nsave: array storing the estimated sample- and cluster-specific weights for each saved step of the chain.
xi: an length(unique(C)) \times (ncol(Y) x K) \times nsave array storing the estimated sample- and cluster-specific multivariate skew normal location parameters of the kernel for each saved step of the chain.
xi0: an ncol(Y) \times K \times
nsave: array storing the estimated cluster-specific group location parameters for each saved step of the chain.
psi: an ncol(Y) \times K \times nsave array storing the estimated cluster-specific skew parameters of the kernels in the parameterization of the multivariate skew normal distribution used in the sampler for each saved step of the chain.
G: an ncol(Y) \times (ncol(Y) x K) \times nsave array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel used in the sampler for each saved step of the chain.
E: an ncol(Y) \times (ncol(Y) x K) \times nsave array storing the estimated covariance matrix (in row format) of the multivariate normal distribution from which the sample- and cluster-specific location parameters are drawn for each saved step of the chain.
eta: a nsave \times 1 matrix storing the estimated Dirichlet Process Mixture concentration parameter for each saved step of the chain.
Sigma: an ncol(Y) \times (ncol(Y) x K) \times nsave array storing the estimated cluster-specific multivariate skew normal scale matrix (in row format) of the kernel for each saved step of the chain.
alpha: an ncol(Y) \times K \times nsave array storing the estimated cluster-specific skew parameters of the kernel's multivariate skew normal distribution for each saved step of the chain.

data, a named list that includes the matrix of the data Y and C the vector of the group label of each observation.
prior and pmc, the lists, as above, that were provided as inputs to the function.

Examples

library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <- 
  matrix(
    c(
      150, 300,
      250, 200
    ),
    nrow = 2,
    byrow = TRUE
  )


# Dimension of data:
p <- 3

# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)

# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)

# Sample data:
set.seed(1)
Y <- 
  rbind(
    sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
    sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
    sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
    sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
  )

C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))

prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)

# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)

# Generate calibrated data:
cal <- calibrateNoDist(res_relab)

# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )

# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as 
colSums(njk)

# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)

# (see vignette for a more detailed example)

[Package COMIX version 1.0.0 Index]