comix {COMIX}R Documentation

This function generates a sample from the posterior of COMIX.

Description

This function generates a sample from the posterior of COMIX.

Usage

comix(Y, C, prior = NULL, pmc = NULL, state = NULL, ncores = 2)

Arguments

Y

Matrix of the data. Each row represents an observation.

C

Vector of the group label of each observation. Labels must be integers starting from 1.

prior

A list giving the prior information. If unspecified, a default prior is used. The list includes the following parameters:

  • zeta: Coarsening parameter. A number between 0 and 1. zeta = 1: sample from standard posterior; zeta < 1: sample from power posterior. The lower zeta is, the more flexible the kernels become.

  • K: Maximal number of mixture components.

  • eta_prior Parameters for gamma prior for concentration parameter of the stick breaking process prior for the weights.

  • m0: Number of degrees of freedom for the inverse Wishart prior for Sigma, the covariance matrix of the kernels.

  • Lambda: Mean parameter for the inverse Wishart prior for Sigma, the covariance matrix of the kernels.

  • b0: Mean parameter for the multivariate normal distribution that is the prior for the group mean parameter xi0.

  • B0: Covariance parameter for the multivariate normal distribution that is the prior for the group mean parameter xi0.

  • e0: Number of degrees of freedom for the inverse Wishart prior for E_k, the covariance matrix of the multivariate normal from which \xi_{j,k} are drawn.

  • E0: Mean parameter for the inverse Wishart prior for E_k, the covariance matrix of the multivariate normal from which \xi_{j,k} are drawn.

  • merge_step: Introduce step to merge mixture components with small KL divergence. Default is merge_step = TRUE.

  • merge_par: Parameter controlling merging radius. Default is merge_par = 0.1.

pmc

A list giving the Population Monte Carlo (PMC) parameters:

  • npart: Number of PMC particles.

  • nburn: Number of burn-in steps

  • nsave: Number of steps in the chain after burn-in.

  • nskip: Thinning parameter, number of steps to skip between saving steps after burn-in.

  • ndisplay: Display status of chain after every ndisplay steps.

state

A list giving the initial cluster labels:

  • t: An integer vector, same length as the number of rows of Y, with cluster labels between 1 and K.

ncores

The number of CPU cores to utilize in parallel. Defaults to 2.

Value

An object of class COMIX, a list of 4:

chain, a named list:

Examples

library(COMIX)
# Number of observations for each sample (row) and cluster (column):
njk <- 
  matrix(
    c(
      150, 300,
      250, 200
    ),
    nrow = 2,
    byrow = TRUE
  )


# Dimension of data:
p <- 3

# Scale and skew parameters for first cluster:
Sigma1 <- matrix(0.5, nrow = p, ncol = p) + diag(0.5, nrow = p)
alpha1 <- rep(0, p)
alpha1[1] <- -5
# location parameter for first cluster in first sample:
xi11 <- rep(0, p)
# location parameter for first cluster in second sample (aligned with first):
xi21 <- rep(0, p)

# Scale and skew parameters for second cluster:
Sigma2 <- matrix(-1/3, nrow = p, ncol = p) + diag(1 + 1/3, nrow = p)
alpha2 <- rep(0, p)
alpha2[2] <- 5
# location parameter for second cluster in first sample:
xi12 <- rep(3, p)
# location parameter for second cluster in second sample (misaligned with first):
xi22 <- rep(4, p)

# Sample data:
set.seed(1)
Y <- 
  rbind(
    sn::rmsn(njk[1, 1], xi = xi11, Omega = Sigma1, alpha = alpha1),
    sn::rmsn(njk[1, 2], xi = xi12, Omega = Sigma2, alpha = alpha2),
    sn::rmsn(njk[2, 1], xi = xi21, Omega = Sigma1, alpha = alpha1),
    sn::rmsn(njk[2, 2], xi = xi22, Omega = Sigma2, alpha = alpha2)
  )

C <- c(rep(1, rowSums(njk)[1]), rep(2, rowSums(njk)[2]))

prior <- list(zeta = 1, K = 10)
pmc <- list(naprt = 5, nburn = 200, nsave = 200) # Reasonable usage
pmc <- list(naprt = 5, nburn = 2, nsave = 5) # Minimal usage for documentation
# Fit the model:
res <- comix(Y, C, pmc = pmc, prior = prior)

# Relabel to resolve potential label switching issues:
res_relab <- relabelChain(res)

# Generate calibrated data:
cal <- calibrateNoDist(res_relab)

# Compare raw and calibrated data: (see plot in vignette)
# par(mfrow=c(1, 2))
# plot(Y, col = C, xlim = range(Y[,1]), ylim = range(Y[,2]) )

# Get posterior estimates for the model parameters:
res_summary <- summarizeChain(res_relab)
# Check for instance, the cluster assignment labels:
table(res_summary$t)
# Indeed the same as 
colSums(njk)

# Or examine the skewness parameter for the non-trivial clusters:
res_summary$alpha[ , unique(res_summary$t)]
# And compare those to
cbind(alpha1, alpha2)

# (see vignette for a more detailed example)

[Package COMIX version 1.0.0 Index]