ldaPrototype-package {ldaPrototype} | R Documentation
ldaPrototype: Prototype of Multiple Latent Dirichlet Allocation Runs
Description
Determine a prototype from a number of runs of Latent Dirichlet
Allocation (LDA) by measuring their similarities with S-CLOP. The package
provides a procedure to select the LDA run with the highest mean pairwise
similarity, measured by S-CLOP (Similarity of multiple sets by Clustering with
Local Pruning), to all other runs. LDA runs are specified by their assignments,
which lead to estimators for the distribution parameters. Repeated runs lead to
different results, which we counter by choosing the most representative LDA run
as the prototype.
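The selection rule described above can be illustrated with a minimal base R sketch. The similarity values and run names are made up for illustration and are not S-CLOP output; the sketch only shows how the run with the highest mean pairwise similarity is picked once such a matrix is available.

```r
# Hypothetical pairwise similarities between four LDA runs (symmetric).
# The values are invented for this sketch; the diagonal is left as NA
# because a run is not compared with itself.
runs <- paste0("run", 1:4)
sims <- matrix(c(
  NA,   0.80, 0.70, 0.60,
  0.80, NA,   0.90, 0.75,
  0.70, 0.90, NA,   0.65,
  0.60, 0.75, 0.65, NA
), nrow = 4, byrow = TRUE, dimnames = list(runs, runs))

# Mean similarity of each run to all other runs.
mean_sim <- rowMeans(sims, na.rm = TRUE)

# The prototype is the run with the highest mean pairwise similarity.
prototype <- names(which.max(mean_sim))
print(mean_sim)
print(prototype)
```

In this toy matrix the second run is, on average, closest to all others and would therefore be chosen.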
For bug reports and feature requests please use the issue tracker:
https://github.com/JonasRieger/ldaPrototype/issues. Also have a look at
the (detailed) example at https://github.com/JonasRieger/ldaPrototype.
Data
reuters
Example Dataset (91 articles from Reuters) for testing.
Constructor
LDA
LDA objects used in this package.
as.LDARep
LDARep objects.
as.LDABatch
LDABatch objects.
Getter
getTopics
Getter for LDA objects.
getJob
Getter for LDARep and LDABatch objects.
getSimilarity
Getter for TopicSimilarity objects.
getSCLOP
Getter for PrototypeLDA objects.
getPrototype
Determine the Prototype LDA.
Performing multiple LDAs
LDARep
Performing multiple LDAs locally (using parallelization).
LDABatch
Performing multiple LDAs on Batch Systems.
Calculation Steps (Workflow) to determine the Prototype LDA
mergeTopics
Merge topic matrices from multiple LDAs.
jaccardTopics
Calculate topic similarities using the Jaccard coefficient (see Similarity Measures for other possible measures).
dendTopics
Create a dendrogram from topic similarities.
SCLOP
Determine various S-CLOP values.
pruneSCLOP
Prune TopicDendrogram objects.
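The step from topic similarities to a dendrogram can be sketched in base R. This is not the package's implementation: the similarity values and topic labels are hypothetical, complete linkage is an assumption made for the sketch, and the S-CLOP pruning itself is not reproduced.

```r
# Hypothetical similarities between four topics taken from two LDA runs.
labels <- c("run1.t1", "run1.t2", "run2.t1", "run2.t2")
sims <- matrix(c(
  1.0, 0.1, 0.8, 0.2,
  0.1, 1.0, 0.3, 0.7,
  0.8, 0.3, 1.0, 0.1,
  0.2, 0.7, 0.1, 1.0
), nrow = 4, byrow = TRUE, dimnames = list(labels, labels))

# Convert similarities to distances and cluster hierarchically.
# Complete linkage is an assumption of this sketch.
hc <- hclust(as.dist(1 - sims), method = "complete")
dend <- as.dendrogram(hc)

# Cut the tree into two clusters and inspect which topics end up together.
clusters <- cutree(hc, k = 2)
print(clusters)
```

With these toy values, each cluster pairs one topic from the first run with its closest counterpart from the second run.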
Similarity Measures
cosineTopics
Cosine Similarity.
jaccardTopics
Jaccard Coefficient.
jsTopics
Jensen-Shannon Divergence.
rboTopics
Rank-Biased Overlap.
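For orientation, three of the measures listed above can be written down in a few lines of base R using their textbook definitions. The two topics and their word counts are hypothetical, and the sketch does not reproduce the thresholds or other options of the package functions; in particular, how a divergence is turned into a similarity is left out.

```r
# Two hypothetical topics as word-count vectors over a shared vocabulary.
topic_a <- c(oil = 8, price = 5, bank = 0, rate = 1, trade = 0)
topic_b <- c(oil = 6, price = 0, bank = 4, rate = 2, trade = 0)

# Jaccard coefficient on the sets of words that occur in each topic.
set_a <- names(topic_a)[topic_a > 0]
set_b <- names(topic_b)[topic_b > 0]
jaccard <- length(intersect(set_a, set_b)) / length(union(set_a, set_b))

# Cosine similarity of the two count vectors.
cosine <- sum(topic_a * topic_b) / sqrt(sum(topic_a^2) * sum(topic_b^2))

# Jensen-Shannon divergence (log base 2) of the normalized word
# distributions; terms with zero probability contribute nothing.
kl <- function(x, y) {
  idx <- x > 0
  sum(x[idx] * log2(x[idx] / y[idx]))
}
p <- topic_a / sum(topic_a)
q <- topic_b / sum(topic_b)
m <- (p + q) / 2
js <- 0.5 * kl(p, m) + 0.5 * kl(q, m)

print(c(jaccard = jaccard, cosine = cosine, js = js))
```

Jaccard and cosine are similarities (larger means more alike), whereas the Jensen-Shannon value is a divergence (smaller means more alike).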
Shortcuts
getPrototype
Shortcut which includes all calculation steps.
LDAPrototype
Shortcut which performs multiple LDAs and determines their Prototype.
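A possible usage sketch of the shortcut and of the equivalent step-by-step workflow follows. It assumes that the package is installed, that the example data can be loaded as reuters_docs and reuters_vocab, and that the argument names below match the installed version; the choice of 4 runs, 10 topics and the seeds is arbitrary.

```r
library("ldaPrototype")

# Example data shipped with the package (91 Reuters articles).
data(reuters_docs)
data(reuters_vocab)

# Shortcut: run 4 LDAs with 10 topics each and determine their prototype.
res <- LDAPrototype(docs = reuters_docs, vocabLDA = reuters_vocab,
                    n = 4, K = 10, seeds = 1:4)
proto <- getPrototype(res)

# The same workflow in single steps, useful for interim evaluations.
reps <- LDARep(docs = reuters_docs, vocab = reuters_vocab,
               n = 4, K = 10, seeds = 1:4)
topics <- mergeTopics(reps, vocab = reuters_vocab)  # merge topic matrices
jacc <- jaccardTopics(topics)                       # topic similarities
sclop <- SCLOP.pairwise(jacc)                       # pairwise S-CLOP values
res2 <- getPrototype(reps, sclop = sclop)           # determine the prototype
proto2 <- getPrototype(res2)
```

The step-by-step variant exposes the intermediate objects, so a different similarity measure could be substituted for jaccardTopics at the third step.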
Author(s)
Maintainer: Jonas Rieger jonas.rieger@tu-dortmund.de (ORCID)
References
Rieger, Jonas (2020). "ldaPrototype: A method in R to get a Prototype of multiple Latent Dirichlet Allocations". Journal of Open Source Software, 5(51), 2181, doi: 10.21105/joss.02181.
Rieger, Jonas, Jörg Rahnenführer and Carsten Jentsch (2020). "Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype". In: Natural Language Processing and Information Systems, NLDB 2020. LNCS 12089, pp. 118–125, doi: 10.1007/978-3-030-51310-8_11.
Rieger, Jonas, Lars Koppers, Carsten Jentsch and Jörg Rahnenführer (2020). "Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability using Clustering Techniques on Replicated Runs". arXiv 2003.04980, URL https://arxiv.org/abs/2003.04980.
See Also
Useful links:
Report bugs at https://github.com/JonasRieger/ldaPrototype/issues