R: Merge-split Sampling for a Partition Based on Sequential...

seqAllocatedMergeSplit {sams}

R Documentation

Merge-split Sampling for a Partition Based on Sequential Allocation of Items

Description

Merge-split proposals for conjugate "Chinese Restaurant Process" (CRP) mixture models using sequential allocation of items, as originally described in Dahl (2003), with additional functionality for the two parameter CRP prior, as well as complementing these allocations with restricted Gibbs scans such as those discussed in Jain & Neal (2004).

Usage

seqAllocatedMergeSplit(
  partition,
  logPosteriorPredictiveDensity = function(i, subset) 0,
  t = 1,
  mass = 1,
  discount = 0,
  nUpdates = 1L,
  selectionWeights = NULL
)

Arguments

`partition`	A numeric vector of cluster labels representing the current partition.
`logPosteriorPredictiveDensity`	A function taking an index `i` (as a numeric vector of length one) and a subset of integers `subset`, and returning the natural logarithm of `p( y_i \| y_subset )`, i.e., that item's contribution to the log integrated likelihood given a subset of the other items. The default value "turns off" the likelihood, resulting in prior simulation (rather than posterior simulation).
`t`	A non-negative integer indicating the number of restricted Gibbs scans to perform for each merge/split proposal.
`mass`	A specification of the mass (concentration) parameter in the CRP prior. Must be greater than the `-discount` argument.
`discount`	A numeric value on the interval [0,1) corresponding to the discount parameter in the two parameter CRP prior.
`nUpdates`	An integer giving the number of merge-split proposals before returning. This has the effect of thinning the Markov chain.
`selectionWeights`	A matrix or data frame whose first two columns are the unique pairs of data indices, along with a column of weights representing how likely each pair is to be selected at the beginning of each merge-split update.

Value

partition: An integer vector giving the updated partition encoded using cluster labels.
accept: The acceptance rate of the Metropolis-Hastings proposals, i.e. the number accepted proposals divided by nUpdates.

References

Dahl, D. B. (2003). An improved merge-split sampler for conjugate Dirichlet process mixture models. Technical Report, 1, 086. Jain, S., & Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of computational and Graphical Statistics, 13(1), 158-182.

Examples


# Neal (2000) model and data
nealData <- c(-1.48, -1.40, -1.16, -1.08, -1.02, 0.14, 0.51, 0.53, 0.78)
mkLogPosteriorPredictiveDensity <- function(data = nealData,
                                            sigma2 = 0.1^2,
                                            mu0 = 0,
                                            sigma02 = 1) {
  function(i, subset) {
    posteriorVariance <- 1 / ( 1/sigma02 + length(subset)/sigma2 )
    posteriorMean <- posteriorVariance * ( mu0/sigma02 + sum(data[subset])/sigma2 )
    posteriorPredictiveSD <- sqrt(posteriorVariance + sigma2)
    dnorm(data[i], posteriorMean, posteriorPredictiveSD, log=TRUE)
  }
}

logPostPredict <- mkLogPosteriorPredictiveDensity()

nSamples <- 1000L
partitions <- matrix(0, nrow = nSamples, ncol = length(nealData))
accept <- 0
for ( i in 2:nSamples ) {
  ms <- seqAllocatedMergeSplit(partitions[i-1,],
                               logPostPredict,
                               t = 1,
                               mass = 1.0,
                               nUpdates = 2)
  partitions[i,] <- ms$partition
  accept <- accept + ms$accept
}

accept / nSamples # M-H acceptance rate

# convergence and mixing diagnostics
nSubsets <- apply(partitions, 1, function(x) length(unique(x)))
mean(nSubsets)
sum(acf(nSubsets)$acf)-1   # Autocorrelation time

entropy <- apply(partitions, 1, partitionEntropy)
plot.ts(entropy)

[Package sams version 0.4.3 Index]