DPMCMC {DIRECT} | R Documentation |
Dirichlet Process-Based Markov Chain Monte Carlo (MCMC) Sampler for Mixture Model-Based Clustering
Description
The MCMC sampler for DIRECT
. In each MCMC iteration, the function updates cluster memberships for all items, allowing for changes in the number of clusters (mixture components). This update implements a Metropolis-Hastings (MH) sampler developed in Fu et al. (2013), and an MH sampler developed in Neal (2000). It also updates parameters specific to each mixture components via MH sampling. Parameters of interest include the mean vector and standard deviations of the three types of variability. Additionally, it updates \alpha
, the concentration parameter in the Dirichlet-process prior, allowing for Gibbs (Escobar and West, 1995) and MH sampling.
Usage
DPMCMC(file.mcmc.cs, file.mcmc.pars, file.mcmc.probs, file.size,
data, SKIP, nTime, times, c.curr, par.prior,
PAR.INIT = FALSE,
sdWICluster.curr = 0.5, sdTSampling.curr = 0.5,
sdResidual.curr = 0.5, alpha.curr,
alpha.prior.shape, alpha.prior.rate, sd.prop,
nIter, burn.in, step.size, nRepeat = 1, nResample, seed.value,
RNORM.METHOD = c("chol", "eigen", "svd"),
SAMPLE.C = c("FRBT", "Neal"),
PRIOR.MODEL = c("none", "OU", "BM", "BMdrift"),
ALPHA.METHOD = c("Gibbs", "MH"),
OUTPUT.CLUST.SIZE = FALSE, PRINT = FALSE)
Arguments
file.mcmc.cs |
A character string in quotation marks indicating the output filename for cluster memberships and |
file.mcmc.pars |
A character string in quotation marks indicating the output filename for MCMC samples of parameters specific to mixture components. |
file.mcmc.probs |
A character string in quotation marks indicating the output filename for posterior allocation probability matrices from the resampling step. |
file.size |
A character string in quotation marks indicating the output filename for cluster sizes. |
data |
An |
SKIP |
Number of columns in |
nTime |
Number of time points (or experimental conditions). |
times |
An integer vector of length |
c.curr |
An integer vector of length |
par.prior |
A list that contains parameters of the prior distributions. It has the following format: |
PAR.INIT |
Logical value. Generate initial values for the standard deviations of the three types of variability if TRUE. Use the input values otherwise. |
sdWICluster.curr |
A positive scalar. Initial value of the standard deviation of the within-cluster variability. Ignored if |
sdTSampling.curr |
A positive scalar. Initial value of the standard deviation of the variability across time points. Ignored if |
sdResidual.curr |
A positive scalar. Initial value of the standard deviation of the residual variability (i.e., variability across replicates). Ignored if |
alpha.curr |
A positive scalar. Initial value of |
alpha.prior.shape |
A positive scalar. The shape parameter in the beta prior for |
alpha.prior.rate |
A positive scalar. The rate parameter in the beta prior for |
sd.prop |
A list that contains standard deviations in the proposal distributions for some key parameters. It has the following format: |
nIter |
The number of MCMC iterations. |
burn.in |
A value in (0,1) indicating the percentage of the MCMC iterations to be used as burn-in and be discarded in posterior inference. |
step.size |
An integer indicating the number of MCMC iterations to be skipped between two recorded MCMC samples. |
nRepeat |
An integer |
nResample |
An integer |
seed.value |
A positive value used in random number generation. |
RNORM.METHOD |
Method to compute the determinant of the covariance matrix in the calculation of the multivariate normal density. Required. Method choices are: "chol" for Choleski decomposition, "eigen" for eigenvalue decomposition, and "svd" for singular value decomposition. |
SAMPLE.C |
Method to update cluster memberships. Required. Method choices are: "FRBT" for the Metropolis-Hastings sampler based on a discrete uniform proposal distribution developed in Fu, Russell, Bray and Tavare, and "Neal" for the Metropolis-Hastings sampler developed in Neal (2000). |
PRIOR.MODEL |
Model to generate realizations of the mean vector of a mixture component. Required. Choices are: "none" for not assuming a stochastic process and using a zero vector, "OU" for an Ornstein-Uhlenbeck process (a.k.a. the mean-reverting process), "BM" for a Brown motion (without drift), and "BMdrift" for a Brownian motion with drift. |
ALPHA.METHOD |
Method to update |
OUTPUT.CLUST.SIZE |
If TRUE, output cluster sizes in MCMC iterations into an external file *_mcmc_size.out. |
PRINT |
If TRUE, print intermediate values during an MCMC run onto the screen. Used for debugging for small data sets. |
Details
The MCMC sampling step in DIRECT
is accomplished with DPMCMC
. DPMCMC
generates MCMC samples of assignments to mixture components (number of components implicitly generated; written into external file *_mcmc_cs.out) and component-specific parameters (written into external file *_mcmc_pars.out), which include mean vectors and standard deviations of three types of variability.
Value
At least two files are generated by DPMCMC
and placed under the current working directory:
*_mcmc_cs.out: Generated from MCMC sampling. Each row contains an MCMC sample of assignments of items to mixture components, or cluster memberships if a component is defined as a cluster, as well as
\alpha
, the concentration parameter in the Dirichlet-process prior.*_mcmc_pars.out: Generated from MCMC sampling. Each row contains an MCMC sample of parameters specific to a mixture component. Multiple rows may come from the same MCMC iteration.
If argument OUTPUT.CLUST.SIZE=TRUE
, an additional file *_mcmc_size.out is also generated, which contains the cluster sizes of each recorded MCMC sample.
Author(s)
Audrey Q. Fu
References
Escobar, M. D. and West, M. (1995) Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90: 577-588.
Fu, A. Q., Russell, S., Bray, S. and Tavare, S. (2013) Bayesian clustering of replicated time-course gene expression data with weak signals. The Annals of Applied Statistics. 7(3) 1334-1361.
Neal, R. M. (2000) Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9: 249-265.
See Also
DIRECT
, which calls DPMCMC
.
Examples
## See example in DIRECT.