R: Cluster Observations Based on Indicator MCMC Draws

clusterMix {bayesm}

R Documentation

Cluster Observations Based on Indicator MCMC Draws

Description

clusterMix uses MCMC draws of indicator variables from a normal component mixture model to cluster observations based on a similarity matrix.

Usage

clusterMix(zdraw, cutoff=0.9, SILENT=FALSE, nprint=BayesmConstant.nprint)

Arguments

`zdraw`	`R x nobs` array of draws of indicators
`cutoff`	cutoff probability for similarity (def: `0.9`)
`SILENT`	logical flag for silent operation (def: `FALSE`)
`nprint`	print every nprint'th draw (def: `100`)

Details

Define a similarity matrix, Sim with Sim[i,j]=1 if observations i and j are in same component. Compute the posterior mean of Sim over indicator draws.

Clustering is achieved by two means:

Method A: Find the indicator draw whose similarity matrix minimizes loss(E[Sim]-Sim(z)), where loss is absolute deviation.

Method B: Define a Similarity matrix by setting any element of E[Sim] = 1 if E[Sim] > cutoff. Compute the clustering scheme associated with this "windsorized" Similarity matrix.

Value

A list containing:

`clustera:`	indicator function for clustering based on method A above
`clusterb:`	indicator function for clustering based on method B above

Warning

This routine is a utility routine that does not check the input arguments for proper dimensions and type.

Author(s)

Peter Rossi, Anderson School, UCLA, perossichi@gmail.com.

References

For further discussion, see Bayesian Statistics and Marketing by Rossi, Allenby, and McCulloch Chapter 3.

Examples

if(nchar(Sys.getenv("LONG_TEST")) != 0) {

  ## simulate data from mixture of normals
  n = 500
  pvec = c(.5,.5)
  mu1 = c(2,2)
  mu2 = c(-2,-2)
  Sigma1 = matrix(c(1,0.5,0.5,1), ncol=2)
  Sigma2 = matrix(c(1,0.5,0.5,1), ncol=2)
  comps = NULL
  comps[[1]] = list(mu1, backsolve(chol(Sigma1),diag(2)))
  comps[[2]] = list(mu2, backsolve(chol(Sigma2),diag(2)))
  dm = rmixture(n, pvec, comps)
  
  ## run MCMC on normal mixture
  Data = list(y=dm$x)
  ncomp = 2
  Prior = list(ncomp=ncomp, a=c(rep(100,ncomp)))
  R = 2000
  Mcmc = list(R=R, keep=1)
  out = rnmixGibbs(Data=Data, Prior=Prior, Mcmc=Mcmc)
  
  ## find clusters
  begin = 500
  end = R
  outclusterMix = clusterMix(out$nmix$zdraw[begin:end,])
  
  
  ## check on clustering versus "truth"
  ## note: there could be switched labels
  table(outclusterMix$clustera, dm$z)
  table(outclusterMix$clusterb, dm$z)
}

[Package bayesm version 3.1-6 Index]