| ldaPrototype-package {ldaPrototype} | R Documentation |
ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs
Description
Determine a prototype from a number of runs of Latent Dirichlet
Allocation (LDA), measuring their similarities with S-CLOP (Similarity of
multiple sets by Clustering with Local Pruning): a procedure to select
the LDA run with the highest mean pairwise similarity, measured by S-CLOP,
to all other runs. LDA runs are specified by their assignments, which lead to
estimators for the distribution parameters. Repeated runs lead to different
results, which we counter by choosing the most representative LDA run as the
prototype.
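The selection rule described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the similarity matrix `sims` is hypothetical and its values are made up, standing in for the pairwise S-CLOP values between runs.

```r
# Hypothetical pairwise similarity matrix for four LDA runs
# (symmetric; the values are invented for illustration only).
sims <- matrix(c(
  1.00, 0.80, 0.70, 0.60,
  0.80, 1.00, 0.90, 0.75,
  0.70, 0.90, 1.00, 0.65,
  0.60, 0.75, 0.65, 1.00
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("run", 1:4), paste0("run", 1:4)))

# Mean similarity of each run to all *other* runs (diagonal excluded).
diag(sims) <- NA
mean_sim <- rowMeans(sims, na.rm = TRUE)

# The prototype is the run with the highest mean pairwise similarity.
prototype <- names(which.max(mean_sim))
print(prototype)  # "run2"
```

In this toy matrix, run2 is closest on average to the other three runs, so it would be chosen as the prototype.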
For bug reports and feature requests please use the issue tracker:
https://github.com/JonasRieger/ldaPrototype/issues. Also have a look at
the (detailed) example at https://github.com/JonasRieger/ldaPrototype.
Data
reuters Example Dataset (91 articles from Reuters) for testing.
Constructor
LDA LDA objects used in this package.
as.LDARep LDARep objects.
as.LDABatch LDABatch objects.
Getter
getTopics Getter for LDA objects.
getJob Getter for LDARep and LDABatch objects.
getSimilarity Getter for TopicSimilarity objects.
getSCLOP Getter for PrototypeLDA objects.
getPrototype Determine the Prototype LDA.
Performing multiple LDAs
LDARep Performing multiple LDAs locally (using parallelization).
LDABatch Performing multiple LDAs on Batch Systems.
Calculation Steps (Workflow) to determine the Prototype LDA
mergeTopics Merge topic matrices from multiple LDAs.
jaccardTopics Calculate topic similarities using the Jaccard coefficient (see Similarity Measures for other possible measures).
dendTopics Create a dendrogram from topic similarities.
SCLOP Determine various S-CLOP values.
pruneSCLOP Prune TopicDendrogram objects.
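The calculation steps listed above could be chained roughly as follows. This is a sketch under assumptions, not a definitive recipe: it assumes the package is installed, that the example data are available as `reuters_docs` and `reuters_vocab`, and that the arguments `n`, `K`, `seeds`, `atLeast` and `sclop` as well as the helper `SCLOP.pairwise` behave as shown. The values 4, 10 and 3 are arbitrary choices to keep the run short. Check the individual help pages before relying on it.

```r
library(ldaPrototype)

# Example data shipped with the package (object names assumed).
data(reuters_docs)
data(reuters_vocab)

# Fit a small number of LDA runs locally (4 runs with 10 topics each).
reps <- LDARep(reuters_docs, reuters_vocab, n = 4, K = 10, seeds = 1:4)

# Step 1: merge the topic matrices of all runs.
topics <- mergeTopics(reps, reuters_vocab)

# Step 2: pairwise topic similarities via the Jaccard coefficient.
jacc <- jaccardTopics(topics, atLeast = 3)

# Step 3: dendrogram of the topics, built from the similarities.
dend <- dendTopics(jacc)

# Step 4: S-CLOP values; the pairwise variant compares each pair of runs.
sclop <- SCLOP.pairwise(jacc)

# Step 5: prune the dendrogram.
pruned <- pruneSCLOP(dend)

# Select the prototype from the precomputed pairwise S-CLOP values.
proto <- getPrototype(reps, sclop = sclop)
```

Running the steps separately is mainly useful when intermediate objects such as the dendrogram are of interest in their own right.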
Similarity Measures
cosineTopics Cosine Similarity.
jaccardTopics Jaccard Coefficient.
jsTopics Jensen-Shannon Divergence.
rboTopics Rank-Biased Overlap.
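To give a feel for what two of these measures compare, here is a small base R sketch of the Jaccard coefficient on top-word sets and the cosine similarity on word-count vectors. The topics, the counts and the helper names `top_words`, `jaccard` and `cosine` are hypothetical. The package functions apply their own rules for selecting relevant words, so this is only an illustration of the underlying formulas.

```r
# Hypothetical word counts of two topics over a shared vocabulary.
topic_a <- c(oil = 40, price = 25, bank = 0,  rate = 5,  trade = 20, wheat = 10)
topic_b <- c(oil = 30, price = 20, bank = 15, rate = 25, trade = 10, wheat = 0)

# The n most frequent words of a topic.
top_words <- function(counts, n = 3) {
  names(sort(counts, decreasing = TRUE))[seq_len(n)]
}

# Jaccard coefficient: size of the intersection over size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Cosine similarity of two count vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

jac <- jaccard(top_words(topic_a), top_words(topic_b))
cos_sim <- cosine(topic_a, topic_b)
print(jac)  # 0.5
```

The top-three sets share "oil" and "price" out of four distinct words, giving a Jaccard coefficient of 0.5, whereas the cosine similarity uses the full count vectors.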
Shortcuts
getPrototype Shortcut which includes all calculation steps.
LDAPrototype Shortcut which performs multiple LDAs and determines their Prototype.
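A minimal sketch of the one-call shortcut, assuming the package is installed and the example data are available as `reuters_docs` and `reuters_vocab`. The arguments `n`, `K` and `seeds` are assumed to behave as in the linked example, and the small values are chosen only to keep the run short.

```r
library(ldaPrototype)

# Example data shipped with the package (object names assumed).
data(reuters_docs)
data(reuters_vocab)

# Fit 4 LDA runs with 10 topics each and determine their prototype in one call.
res <- LDAPrototype(reuters_docs, reuters_vocab, n = 4, K = 10, seeds = 1:4)

# Extract the topics of the selected prototype run.
proto_topics <- getTopics(getLDA(res))
```

In practice a considerably larger number of runs than four would be needed for the prototype to be representative.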
Author(s)
Maintainer: Jonas Rieger jonas.rieger@tu-dortmund.de (ORCID)
References
Rieger, Jonas (2020). "ldaPrototype: A method in R to get a Prototype of multiple Latent Dirichlet Allocations". Journal of Open Source Software, 5(51), 2181, doi: 10.21105/joss.02181.
Rieger, Jonas, Jörg Rahnenführer and Carsten Jentsch (2020). "Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype". In: Natural Language Processing and Information Systems, NLDB 2020. LNCS 12089, pp. 118–125, doi: 10.1007/978-3-030-51310-8_11.
Rieger, Jonas, Lars Koppers, Carsten Jentsch and Jörg Rahnenführer (2020). "Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability using Clustering Techniques on Replicated Runs". arXiv 2003.04980, URL https://arxiv.org/abs/2003.04980.
See Also
Useful links:
Report bugs at https://github.com/JonasRieger/ldaPrototype/issues