topic_score {TopicScore} | R Documentation |
The Topic SCORE algorithm
Description
This function obtains the word-topic matrix A from the word-document matrix X through the Topic SCORE algorithm.
Usage
topic_score(K, X, K0, m, Mquantile = 0, scatterplot = FALSE,
num_restart = 1, seed = NULL)
Arguments
K |
The number of topics. |
X |
The p-by-n word-document matrix, where each column is a distribution over a fixed vocabulary.
This matrix can be of class |
K0 |
The number of greedy search steps in vertex hunting. If missing, it is set to ceiling(1.5*K). |
m |
The number of centers in the kmeans step in vertex hunting. If missing, it is set to 10*K. |
Mquantile |
The quantile level (between 0 and 1) of the diagonal entries of matrix M, used to truncate those entries from above. When it is 0, the procedure degenerates to the case with no normalization. When it is 1, no truncation is applied. Default is 0. |
scatterplot |
Whether to generate a scatterplot of the rows of R. Default is FALSE. |
num_restart |
The number of random restarts in the kmeans step in vertex hunting. Default is 1. |
seed |
The random seed. Default value is NULL. |
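The two boundary cases of Mquantile described above can be illustrated with a minimal base R sketch. It assumes that the diagonal of M holds the average frequency of each word across documents, which this page does not state; the helper truncate_diag and the toy matrix are hypothetical and are not part of the package.

```r
# Toy word-document matrix: 5 words (rows), 4 documents (columns).
# Each column sums to 1, as required for X.
X <- matrix(c(0.50, 0.20, 0.15, 0.10, 0.05,
              0.40, 0.30, 0.10, 0.10, 0.10,
              0.60, 0.10, 0.10, 0.15, 0.05,
              0.30, 0.30, 0.20, 0.10, 0.10), nrow = 5)

# Assumption: the diagonal of M is the mean frequency of each word.
M_diag <- rowMeans(X)

# Hypothetical helper: cap the entries of d at their Mquantile-level quantile.
truncate_diag <- function(d, Mquantile) {
  pmin(d, quantile(d, Mquantile, names = FALSE))
}

M0     <- truncate_diag(M_diag, 0)    # capped at the minimum: all entries equal
M_half <- truncate_diag(M_diag, 0.5)  # capped at the median
M1     <- truncate_diag(M_diag, 1)    # capped at the maximum: nothing changes

print(rbind(original = M_diag, q0 = M0, q0.5 = M_half, q1 = M1))
```

With Mquantile = 0 every truncated entry is the same constant, so rescaling the rows of X by these entries changes nothing beyond a global factor; this is the sense in which the setting corresponds to no normalization.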
Value
A list containing
- A_hat
The estimated p-by-K word-topic matrix.
- R
The p-by-(K-1) left singular vector ratios matrix.
- V
The K-by-(K-1) vertices matrix, with each row being a vertex found through the vertex hunting algorithm in the simplex formed by the rows of R.
- Pi
The p-by-K convex combinations matrix, with each row being the convex combination coefficients of a row of R using V as vertices.
- theta
The K0-by-(K-1) matrix of K0 potential vertices found in the greedy step of the vertex hunting algorithm.
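The relationship between R, V and Pi can be sketched in base R: each row of R is a convex combination of the rows of V, with weights given by the matching row of Pi. The toy values below are made up for illustration, and the linear solve used to recover the weights is only one way to compute them, not necessarily what the package does internally.

```r
K <- 3

# K-by-(K-1) vertices matrix: one vertex per row.
V <- matrix(c(0, 0,
              1, 0,
              0, 1), nrow = K, byrow = TRUE)

# p-by-K convex combination weights: nonnegative rows that sum to 1.
Pi <- matrix(c(1.00, 0.00, 0.00,
               0.20, 0.50, 0.30,
               0.00, 0.40, 0.60,
               0.25, 0.25, 0.50), ncol = K, byrow = TRUE)

# p-by-(K-1) matrix whose rows lie inside the simplex spanned by V.
R <- Pi %*% V

# Recover the weights from R and V. Appending a column of ones encodes
# the constraint that each row of weights sums to 1.
Pi_rec <- cbind(R, 1) %*% solve(cbind(V, 1))

print(round(Pi_rec, 6))
```

On estimated quantities the rows of R need not lie exactly inside the simplex, so weights recovered this way can be slightly negative and may need clipping and renormalizing.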
Author(s)
Minzhe Wang
References
Ke, Z. T., & Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv preprint arXiv:1704.07016.
Examples
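library(TopicScore)
# Load the AP word-document matrix
data("AP")
K <- 3
tscore_obj <- topic_score(K, AP)
# Visualize the result: first two columns of the singular vector ratios matrix R
plot(tscore_obj$R[, 1], tscore_obj$R[, 2])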
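When starting from raw word counts rather than a ready-made matrix such as AP, the input has to be brought into the form X expects, with each column a distribution over the vocabulary. The following is a minimal base R sketch; the counts matrix and its dimension names are hypothetical.

```r
# Hypothetical raw counts: 4 words (rows) by 3 documents (columns).
counts <- matrix(c(3, 1, 0, 6,
                   2, 2, 4, 2,
                   0, 5, 5, 0),
                 nrow = 4,
                 dimnames = list(c("w1", "w2", "w3", "w4"),
                                 c("d1", "d2", "d3")))

# Drop empty documents, which would otherwise produce NaN columns.
counts <- counts[, colSums(counts) > 0, drop = FALSE]

# Divide each column by its total so that every document sums to 1.
X <- sweep(counts, 2, colSums(counts), "/")

print(colSums(X))
```

A matrix prepared this way can then be passed as the X argument of topic_score.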