sim_LDA_data {LDATS} | R Documentation |
Simulate LDA data from an LDA structure given parameters
Description
For a given set of parameters alpha and Beta and document-specific
total word counts, simulate a document-by-term matrix.
Additional structuring variables (the number of topics k, documents M,
and terms V) are inferred from the input objects.
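The generative process described above can be sketched in base R. This is a minimal illustration, not the LDATS source: the helper name sim_lda_sketch is hypothetical, and both the symmetric Dirichlet draw (built from normalized gamma variates) and the two-stage multinomial sampling are assumptions about how such a simulation is typically written.

```r
# Hypothetical helper, not part of LDATS: a base-R sketch of simulating a
# document-by-term count matrix from LDA parameters.
sim_lda_sketch <- function(N, Beta, alpha = NULL, Theta = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  M <- length(N)   # number of documents, inferred from N
  k <- nrow(Beta)  # number of topics, inferred from Beta
  V <- ncol(Beta)  # number of terms, inferred from Beta
  if (is.null(Theta)) {
    stopifnot(!is.null(alpha), alpha > 0)
    # Assumption: symmetric Dirichlet(alpha) draws via normalized gammas
    g <- matrix(rgamma(M * k, shape = alpha), M, k)
    Theta <- g / rowSums(g)
  }
  w <- matrix(0L, M, V)
  for (m in seq_len(M)) {
    # Split the document's words among topics ...
    z <- as.vector(rmultinom(1, size = N[m], prob = Theta[m, ]))
    for (j in seq_len(k)) {
      # ... then split each topic's words among terms
      w[m, ] <- w[m, ] + as.vector(rmultinom(1, size = z[j], prob = Beta[j, ]))
    }
  }
  w
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
w <- sim_lda_sketch(N, Beta, alpha = 1.2, seed = 1)
dim(w)      # M x V
rowSums(w)  # each row totals the corresponding entry of N
```

In this sketch, every word in a document is first assigned to a topic and then to a term, so each row of the result sums to that document's size.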
Usage
sim_LDA_data(N, Beta, alpha = NULL, Theta = NULL, seed = NULL)
Arguments
N |
A vector of document sizes (total word counts). Must be integer-conformable. Used to infer the total number of documents (M). |
Beta |
|
alpha |
Single positive numeric value for the Dirichlet distribution
parameter defining topics within documents. To specifically define
document topic probabilities, use Theta. |
Theta |
|
seed |
Input to |
Value
A document-by-term matrix of counts (dim: M x V).
Examples
N <- c(10, 22, 15, 31)
alpha <- 1.2
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
sim_LDA_data(N, Beta, alpha = alpha)
Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2,
                byrow = TRUE)
sim_LDA_data(N, Beta, Theta = Theta)
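
As a quick sanity check on the examples above, the returned matrix can be compared against the documented shape. This sketch assumes the LDATS package is installed and is skipped otherwise; the expectation that row totals match N follows from N being the per-document word counts and is an assumption rather than something stated on this page.

```r
# Runs only if LDATS is available; nothing here is part of the package itself.
if (requireNamespace("LDATS", quietly = TRUE)) {
  N <- c(10, 22, 15, 31)
  Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
  Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2,
                  byrow = TRUE)
  w <- LDATS::sim_LDA_data(N, Beta, Theta = Theta)
  print(dim(w))                  # documented as M x V: length(N) by ncol(Beta)
  print(all(rowSums(w) == N))    # assumed: each document's counts total N
}
```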