deep_mou_gibbs {deepMOU}    R Documentation
Deep Mixture of Unigrams
Description
Performs parameter estimation by means of Gibbs sampling and cluster allocation for the Deep Mixture of Unigrams.
Usage
deep_mou_gibbs(x, k, g, n_it = 500, seed_choice = 1, burn_in = 200)
Arguments
x: Document-term matrix describing the frequency of terms that occur in a collection of documents. Rows correspond to documents in the collection and columns correspond to terms.
k: Number of clusters/groups at the top layer.
g: Number of clusters at the bottom layer.
n_it: Number of Gibbs steps.
seed_choice: Seed used for reproducible results.
burn_in: Number of initial Gibbs samples to be discarded and not included in the computation of the final estimates.
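As an illustration of the expected format of x, the sketch below builds a small synthetic document-term matrix (rows are documents, columns are terms) and passes it to deep_mou_gibbs; the matrix, its counts and the dimension names are made up purely to show the shape of the input, not to produce a meaningful fit.

# Minimal sketch of the expected input: a synthetic document-term matrix
# with 6 documents (rows) and 4 terms (columns). Counts are arbitrary and
# only illustrate the format of the argument x.
set.seed(1)
toy_dtm <- matrix(rpois(6 * 4, lambda = 3), nrow = 6, ncol = 4,
                  dimnames = list(paste0("doc", 1:6), paste0("term", 1:4)))
fit_toy <- deep_mou_gibbs(x = toy_dtm, k = 2, g = 2, n_it = 10, burn_in = 5)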
Details
Starting from the data matrix x, the Deep Mixture of Unigrams is fitted and k clusters are obtained.
The parameters are estimated by Gibbs sampling. The function first assigns initial values to all the parameters to be estimated; then, for n_it iterations, each parameter is sampled from its conditional distribution given all the other parameters. The final estimates are obtained by averaging the sampled values after the first burn_in samples are discarded. Clustering is eventually performed by maximizing the posterior distribution of the latent allocation variables.
For further details see the references.
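As a schematic illustration of the burn-in and averaging step described above, the following sketch averages a hypothetical matrix of Gibbs draws after discarding the burn-in samples; it is not the internal implementation of deep_mou_gibbs.

# Hypothetical matrix of Gibbs draws: one row per iteration, one column per
# parameter (placeholder values, NOT produced by deep_mou_gibbs).
n_it    <- 500
burn_in <- 200
draws <- matrix(rnorm(n_it * 3), nrow = n_it, ncol = 3)
# Final estimates: average of the samples kept after the burn-in period
estimates <- colMeans(draws[(burn_in + 1):n_it, ])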
Value
A list containing the following elements:
x: the data matrix.
clusters: the clustering labels.
k: the number of clusters at the top layer.
g: the number of clusters at the bottom layer.
numobs: the sample size.
p: the vocabulary size.
z1: the allocation variables at the top layer.
z2: the allocation variables at the bottom layer.
Alpha: the estimates of the Alpha parameters.
Beta: the estimates of the Beta parameters.
pi_hat: the estimated probabilities of belonging to the clusters at the top layer.
pi_hat_2: the estimated probabilities of belonging to the clusters at the bottom layer.
References
Viroli C, Anderlucci L (2020). "Deep mixtures of Unigrams for uncovering topics in textual data." Statistics and Computing, pp. 1-18. doi: 10.1007/s11222-020-09989-9.
Examples
# Load the CNAE2 dataset
data("CNAE2")
# Perform parameter estimation and clustering (very few iterations are used in this example)
deep_CNAE2 <- deep_mou_gibbs(x = CNAE2, k = 2, g = 2, n_it = 5, burn_in = 2)
# Show the cluster labels assigned to the documents
deep_CNAE2$clusters
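The other components listed under Value can be inspected directly from the returned list, for instance:

# Cluster sizes at the top layer
table(deep_CNAE2$clusters)
# Estimated probabilities of belonging to the top-layer and bottom-layer clusters
deep_CNAE2$pi_hat
deep_CNAE2$pi_hat_2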