lda-package {lda}                                        R Documentation

Collapsed Gibbs Sampling Methods for Topic Models

Description

Implements latent Dirichlet allocation (LDA) and related models, including (but not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading and writing data typically used in topic models, as well as tools for examining posterior distributions, are also included.
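
To make "collapsed Gibbs sampler" concrete, the following base-R sketch runs the standard collapsed update for plain LDA on a made-up corpus. It is not the package's C implementation. The toy data, the variable names (d, w, z, n_dk, n_kw) and the hyperparameter values are all assumptions chosen for illustration.

```r
## Toy corpus as parallel vectors: token i is word w[i] in document d[i].
## All data and hyperparameters here are made up for illustration.
d <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
w <- c(1, 2, 1, 3, 4, 3, 1, 2, 4)
D <- max(d); V <- max(w); K <- 2
alpha <- 0.1; eta <- 0.1

set.seed(42)
z <- sample.int(K, length(w), replace = TRUE)  # random initial topics

## Count tables implied by the current assignments.
n_dk <- matrix(0, D, K)  # tokens in document d assigned to topic k
n_kw <- matrix(0, K, V)  # times word w is assigned to topic k
for (i in seq_along(w)) {
  n_dk[d[i], z[i]] <- n_dk[d[i], z[i]] + 1
  n_kw[z[i], w[i]] <- n_kw[z[i], w[i]] + 1
}

for (iter in 1:100) {
  for (i in seq_along(w)) {
    ## Remove token i from the counts.
    n_dk[d[i], z[i]] <- n_dk[d[i], z[i]] - 1
    n_kw[z[i], w[i]] <- n_kw[z[i], w[i]] - 1
    ## Conditional over topics, with the multinomial parameters
    ## integrated out (this is what "collapsed" refers to).
    p <- (n_dk[d[i], ] + alpha) * (n_kw[, w[i]] + eta) /
         (rowSums(n_kw) + V * eta)
    z[i] <- sample.int(K, 1, prob = p)
    ## Add token i back under its newly sampled topic.
    n_dk[d[i], z[i]] <- n_dk[d[i], z[i]] + 1
    n_kw[z[i], w[i]] <- n_kw[z[i], w[i]] + 1
  }
}
```

Only the count tables and the topic assignments are stored. The per-document topic proportions and per-topic word distributions are never sampled explicitly, which is why each sweep is cheap.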

Details

The DESCRIPTION file:

Package: lda
Type: Package
Title: Collapsed Gibbs Sampling Methods for Topic Models
Version: 1.5.2
Date: 2024-04-25
Author: Jonathan Chang
Maintainer: Santiago Olivella <olivella@unc.edu>
Description: Implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.
License: LGPL (>= 2.1)
LazyLoad: yes
Imports: methods (>= 4.3.0)
Suggests: Matrix, reshape2, ggplot2 (>= 3.4.4), penalized, nnet
Depends: R (>= 4.3.0)

Index of help topics:

cora                    A subset of the Cora dataset of scientific
                        documents.
filter.words            Functions to manipulate text corpora in LDA
                        format.
lda-package             Collapsed Gibbs Sampling Methods for Topic
                        Models
lda.collapsed.gibbs.sampler
                        Functions to Fit LDA-type models
lexicalize              Generate LDA Documents from Raw Text
links.as.edgelist       Convert a set of links keyed on source to a
                        single list of edges.
newsgroup               A collection of newsgroup messages with
                        classes.
nubbi.collapsed.gibbs.sampler
                        Collapsed Gibbs Sampling for the Networks
                        Uncovered By Bayesian Inference (NUBBI) Model.
poliblog                A collection of political blogs with ratings.
predictive.distribution
                        Compute predictive distributions for fitted
                        LDA-type models.
predictive.link.probability
                        Use the RTM to predict whether a link exists
                        between two documents.
read.documents          Read LDA-formatted Document and Vocabulary
                        Files
rtm.collapsed.gibbs.sampler
                        Collapsed Gibbs Sampling for the Relational
                        Topic Model (RTM).
sampson                 Sampson monk data
slda.predict            Predict the response variable of documents
                        using an sLDA model.
top.topic.words         Get the Top Words and Documents in Each Topic
word.counts             Compute Summary Statistics of a Corpus

Author(s)

Jonathan Chang

Maintainer: Santiago Olivella <olivella@unc.edu>

Special thanks to the following for their reports and comments: Edo Airoldi, Jordan Boyd-Graber, Christopher E. Cramer, Andrew Dai, James Danowski, Khalid El-Arini, Roger Levy, Solomon Messing, Joerg Reichardt, Dmitriy Selivanov

References

Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.

See Also

Functions to fit models: lda.collapsed.gibbs.sampler, slda.em, mmsb.collapsed.gibbs.sampler, nubbi.collapsed.gibbs.sampler, rtm.collapsed.gibbs.sampler

Functions to read/create corpora: lexicalize, read.documents, read.vocab

Functions to manipulate corpora: concatenate.documents, filter.words, shift.word.indices, links.as.edgelist

Functions to compute summary statistics on corpora: word.counts, document.lengths

Functions which use the output of fitted models: predictive.distribution, top.topic.words, top.topic.documents, predictive.link.probability

Included data sets: cora, poliblog, sampson

Examples

## See demos for the following four common use cases:

## Not run: demo(lda)

## Not run: demo(slda)

## Not run: demo(mmsb)

## Not run: demo(rtm)
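
For readers who want something shorter than the demos, here is a minimal sketch of the usual workflow: build a corpus with lexicalize, summarize it, fit plain LDA, and inspect the topics. The four-line toy corpus and the values of K, num.iterations, alpha and eta are assumptions picked for illustration, not recommended settings.

```r
library(lda)

## Toy corpus; the text and all parameter values are illustrative.
doclines <- c("the cat sat on the mat",
              "the dog chased the cat",
              "stocks fell as markets closed",
              "markets rallied as stocks rose")

## Convert raw text into the package's document format plus a vocabulary.
corpus <- lexicalize(doclines, lower = TRUE)

## Summary statistics on the corpus.
wc <- word.counts(corpus$documents, corpus$vocab)
dl <- document.lengths(corpus$documents)

## Fit LDA with K topics using the collapsed Gibbs sampler.
set.seed(1)
K <- 2
fit <- lda.collapsed.gibbs.sampler(corpus$documents, K, corpus$vocab,
                                   num.iterations = 50,
                                   alpha = 0.1, eta = 0.1)

## Highest-scoring words in each topic (one column per topic).
top.topic.words(fit$topics, num.words = 3, by.score = TRUE)
```

The fitted object holds the sampler's count tables, for example fit$topics (topics by vocabulary words) and fit$document_sums (topics by documents), which are the inputs expected by functions such as top.topic.words and predictive.distribution.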
