restrictedGibbsMergeSplit {sams}R Documentation

Merge-Split Sampling for a Partition Based on Restricted Gibbs Scans

Description

Merge-split proposals for conjugate "Chinese Restaurant Process" (CRP) mixture models using restricted Gibbs scans from a uniformly random launch state, as presented in Jain & Neal (2004), with additional functionality for the two parameter CRP prior.

Usage

restrictedGibbsMergeSplit(
  partition,
  logPosteriorPredictiveDensity = function(i, subset) 0,
  t = 1,
  mass = 1,
  discount = 0,
  nUpdates = 1L,
  selectionWeights = NULL
)

Arguments

partition

A numeric vector of cluster labels representing the current partition.

logPosteriorPredictiveDensity

A function taking an index i (as a numeric vector of length one) and a subset of integers subset, and returning the natural logarithm of p( y_i | y_subset ), i.e., that item's contribution to the log integrated likelihood given a subset of the other items. The default value "turns off" the likelihood, resulting in prior simulation (rather than posterior simulation).

t

A non-negative integer indicating the number of restricted Gibbs scans to perform for each merge/split proposal.

mass

A specification of the mass (concentration) parameter in the CRP prior. Must be greater than the -discount argument.

discount

A numeric value on the interval [0,1) corresponding to the discount parameter in the two parameter CRP prior.

nUpdates

An integer giving the number of merge-split proposals before returning. This has the effect of thinning the Markov chain.

selectionWeights

A matrix or data frame whose first two columns are the unique pairs of data indices, along with a column of weights representing how likely each pair is to be selected at the beginning of each merge-split update.

Value

partition

An integer vector giving the updated partition encoded using cluster labels.

accept

The acceptance rate of the Metropolis-Hastings proposals, i.e. the number accepted proposals divided by nUpdates.

References

Jain, S., & Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of computational and Graphical Statistics, 13(1), 158-182.

Examples


# Neal (2000) model and data
nealData <- c(-1.48, -1.40, -1.16, -1.08, -1.02, 0.14, 0.51, 0.53, 0.78)
mkLogPosteriorPredictiveDensity <- function(data = nealData,
                                            sigma2 = 0.1^2,
                                            mu0 = 0,
                                            sigma02 = 1) {
  function(i, subset) {
    posteriorVariance <- 1 / ( 1/sigma02 + length(subset)/sigma2 )
    posteriorMean <- posteriorVariance * ( mu0/sigma02 + sum(data[subset])/sigma2 )
    posteriorPredictiveSD <- sqrt(posteriorVariance + sigma2)
    dnorm(data[i], posteriorMean, posteriorPredictiveSD, log = TRUE)
  }
}

logPostPredict <- mkLogPosteriorPredictiveDensity()

nSamples <- 1000L
partitions <- matrix(0, nrow = nSamples, ncol = length(nealData))
accept <- 0
for ( i in 2:nSamples ) {
  ms <- restrictedGibbsMergeSplit(partitions[i-1,],
                                  logPostPredict,
                                  t = 1,
                                  mass = 1.0,
                                  nUpdates = 2)
  partitions[i,] <- ms$partition
  accept <- accept + ms$accept
}

accept / nSamples # M-H acceptance rate

# convergence and mixing diagnostics
nSubsets <- apply(partitions, 1, function(x) length(unique(x)))
mean(nSubsets)
sum(acf(nSubsets)$acf)-1   # Autocorrelation time

entropy <- apply(partitions, 1, partitionEntropy)
plot.ts(entropy)


[Package sams version 0.4.3 Index]