R: The Topic SCORE algorithm

topic_score {TopicScore}

R Documentation

The Topic SCORE algorithm

This function obtains the word-topic matrix A from the word-document matrix X through the Topic SCORE algorithm.
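As a minimal sketch of the expected input, assuming raw word counts are available as a base R matrix (the `counts` object and its values are made up for illustration and are not part of the package), each column can be normalized into a distribution over the vocabulary:

```r
# Hypothetical p-by-n count matrix: 4 words (rows), 3 documents (columns).
counts <- matrix(
  c(2, 0, 1, 1,
    0, 3, 3, 0,
    5, 1, 0, 4),
  nrow = 4,
  dimnames = list(c("w1", "w2", "w3", "w4"), c("d1", "d2", "d3"))
)

# Divide every column by its total so that each document sums to 1.
X <- sweep(counts, 2, colSums(counts), "/")

# Check: every column is now a distribution over the vocabulary.
colSums(X)
```

For a large corpus, a sparse class such as the ones named under `X` is likely the more practical choice.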

topic_score(K, X, K0, m, Mquantile = 0, scatterplot = FALSE,
  num_restart = 1, seed = NULL)

`K`	The number of topics.
`X`	The p-by-n word-document matrix, with each column being a distribution over a fixed vocabulary. This matrix can be of class `simple_triplet_matrix`, defined in the slam package, or of any other class that can be converted to class `dgRMatrix`, defined in the Matrix package, through the `as` function in the methods package.
`K0`	The number of greedy search steps in vertex hunting. If the value is missing, it is set to ceiling(1.5*K).
`m`	The number of centers in the kmeans step in vertex hunting. If the value is missing, it is set to 10*K.
`Mquantile`	The quantile level of the diagonal entries of matrix M, used to truncate the diagonal entries of matrix M from above. When it is 0, the procedure degenerates to the case with no normalization. When it is 1, no truncation is applied. Default is 0.
`scatterplot`	Whether to generate a scatterplot of the rows of R. Default is FALSE.
`num_restart`	The number of random restarts in the kmeans step in vertex hunting. Default is 1.
`seed`	The random seed. Default value is NULL.
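The documented fallbacks for `K0` and `m`, and the effect of `Mquantile`, can be sketched in base R. This is only an illustration: the vector `m_diag` and the helper `truncate_diag` are hypothetical, and the truncation rule (capping entries at the empirical quantile) is an assumption based on the argument description, not the package's actual code.

```r
K <- 3

# Documented fallbacks when K0 and m are not supplied.
K0 <- ceiling(1.5 * K)  # 5 greedy search steps
m  <- 10 * K            # 30 kmeans centers

# Hypothetical diagonal entries of M, for illustration only.
m_diag <- c(0.5, 1, 2, 4, 8)

# Assumed form of the upper truncation: cap entries at the Mquantile quantile.
truncate_diag <- function(d, Mquantile) {
  pmin(d, quantile(d, probs = Mquantile, names = FALSE))
}

truncate_diag(m_diag, 0)    # every entry equals the minimum, so M is constant
truncate_diag(m_diag, 0.5)  # entries above the median are capped at the median
truncate_diag(m_diag, 1)    # capped at the maximum, so nothing changes
```

Under this reading, `Mquantile = 0` makes all diagonal entries equal, which is why the description calls it the case with no normalization.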

A list containing

A_hat: The estimated p-by-K word-topic matrix.
R: The p-by-(K-1) matrix of left singular vector ratios.
V: The K-by-(K-1) vertices matrix, with each row being a vertex found through the vertex hunting algorithm in the simplex formed by the rows of R.
Pi: The p-by-K convex combinations matrix, with each row being the convex combination coefficients of a row of R using V as vertices.
theta: The K0-by-(K-1) matrix of K0 potential vertices found in the greedy step of the vertex hunting algorithm.
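The relationship between `R`, `V`, and `Pi` described above can be sketched with synthetic values for K = 3. All numbers below are made up, and solving for barycentric coordinates is a standard technique shown only for illustration; the package may compute `Pi` differently, for example with extra truncation or renormalization.

```r
# Hypothetical vertices for K = 3: a K-by-(K-1) matrix, one vertex per row.
V <- rbind(c(0, 0),
           c(1, 0),
           c(0, 1))

# Hypothetical convex-combination weights for p = 4 rows; each row sums to 1.
Pi_true <- rbind(c(1.0, 0.0, 0.0),
                 c(0.2, 0.5, 0.3),
                 c(0.0, 0.5, 0.5),
                 c(0.6, 0.1, 0.3))

# Each row of R is the corresponding convex combination of the rows of V.
R <- Pi_true %*% V

# Recover the weights: solve pi %*% cbind(V, 1) = c(r, 1) for every row r of R.
Pi <- cbind(R, 1) %*% solve(cbind(V, 1))

round(Pi, 10)
rowSums(Pi)
```

Appending a column of ones to both `R` and `V` enforces the constraint that each row of weights sums to 1.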

Minzhe Wang

Ke, Z. T., & Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv preprint arXiv:1704.07016.

library(TopicScore)

# Load the AP example data
data("AP")
K <- 3
tscore_obj <- topic_score(K, AP)

# Visualize the result: with K = 3, R has two columns, one per axis
plot(tscore_obj$R[,1], tscore_obj$R[,2])
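
A common follow-up is to read off the highest-weight words of each topic from the word-topic matrix. The sketch below uses a made-up 5-by-2 matrix in place of `A_hat`; the helper `top_words` is hypothetical, and it assumes the matrix carries the vocabulary as row names, which the returned `A_hat` may or may not do.

```r
# Hypothetical word-topic matrix; each column is treated as a distribution over words.
A_hat <- matrix(
  c(0.40, 0.30, 0.15, 0.10, 0.05,
    0.05, 0.10, 0.15, 0.30, 0.40),
  ncol = 2,
  dimnames = list(c("tax", "vote", "court", "game", "team"), NULL)
)

# Return the n highest-weight words of each topic, one topic per result column.
top_words <- function(A, n = 2) {
  apply(A, 2, function(col) rownames(A)[order(col, decreasing = TRUE)[seq_len(n)]])
}

top_words(A_hat)
```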

[Package TopicScore version 0.0.1 Index]