R: Top Words per Topic

topWords {tosca}

R Documentation

Top Words per Topic

Description

Determines the top words per topic, as top.topic.words does. In addition, the values used to determine the top words per topic can be requested. These values are computed by the function importance, which can also be called independently.

Usage

topWords(topics, numWords = 1, byScore = TRUE, epsilon = 1e-05, values = FALSE)

importance(topics, epsilon = 1e-05)

Arguments

`topics`	`named matrix`: The counts of vocabulary words (column-wise) in topics (row-wise).
`numWords`	`integer(1)`: The number of requested top words per topic.
`byScore`	`logical(1)`: Should the values that are taken for determining the top words per topic be calculated by the function `importance` (`TRUE`) or should the absolute counts be considered (`FALSE`)?
`epsilon`	`numeric(1)`: Small number added in logarithmic calculations to avoid evaluating `log(0)`.
`values`	`logical(1)`: Should the values that are taken for determining the top words per topic be returned?
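
The difference between ranking by absolute counts and ranking by a score can be sketched in base R on a toy matrix. This is a minimal illustration, not the package's implementation: the helpers `top_by_row` and `score_sketch` are hypothetical, the counts are made up, and the score formula is an assumption modeled on the one used by `top.topic.words` in the lda package (share of a word in a topic, weighted by how far its log share lies above the average over topics).

```r
# Toy topic-word count matrix: topics in rows, vocabulary in columns.
# The counts are invented for illustration only.
topics <- matrix(
  c(10, 1, 0,
    10, 0, 5,
    10, 2, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("the", "fish", "integral"))
)

# Hypothetical helper: names of the n highest-valued columns in each row,
# returned as an n x (number of topics) character matrix.
top_by_row <- function(m, n = 1) {
  res <- apply(m, 1, function(x) {
    colnames(m)[order(x, decreasing = TRUE)[seq_len(n)]]
  })
  matrix(res, nrow = n)
}

# Assumed score (not taken from the tosca sources):
# p * (log(p) - mean over topics of log(p)), where epsilon keeps
# the logarithm finite for words that never occur in a topic.
score_sketch <- function(m, epsilon = 1e-05) {
  p <- m / (rowSums(m) + epsilon)
  logp <- log(p + epsilon)
  p * (logp - rep(colMeans(logp), each = nrow(m)))
}

by_count <- top_by_row(topics)                # ranking on raw counts
by_score <- top_by_row(score_sketch(topics))  # ranking on the assumed score
```

With these toy counts, the count-based ranking selects "the" for every topic, whereas the score-based ranking selects "fish", "integral" and "fish", because a word that is frequent in all topics receives a score close to zero.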

Value

Matrix of top words or, if `values` is `TRUE`, a list of matrices with entries `word` and `val`.
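
As a rough picture of such a list, the following base R sketch builds one by hand from a made-up value matrix. The layout (one column per topic, one row per rank) is an assumption for illustration, and the object `res` is constructed here rather than returned by the package.

```r
# Made-up values for two topics (rows) and three words (columns).
vals <- matrix(c(3, 1, 2,
                 0, 4, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("fish", "man", "day")))
n <- 2  # number of top words to keep per topic

# Column indices of the n largest values in each topic, one column per topic.
ord <- apply(vals, 1, order, decreasing = TRUE)[seq_len(n), , drop = FALSE]

# Assumed layout: 'word' holds the top words, 'val' the matching values.
res <- list(
  word = apply(ord, 2, function(i) colnames(vals)[i]),
  val  = sapply(seq_len(nrow(vals)), function(k) unname(vals[k, ord[, k]]))
)
```

Here `res$word[, 1]` contains the top words of the first topic and `res$val[, 1]` the values they were ranked by.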

Examples

texts <- list(
 A = "Give a Man a Fish, and You Feed Him for a Day.
      Teach a Man To Fish, and You Feed Him for a Lifetime",
 B = "So Long, and Thanks for All the Fish",
 C = "A very able manipulative mathematician, Fisher enjoys a real mastery
      in evaluating complicated multiple integrals.")

corpus <- textmeta(meta = data.frame(id = c("A", "B", "C", "D"),
  title = c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
  date = c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
  additionalVariable = 1:4, stringsAsFactors = FALSE), text = texts)

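# Preprocess the texts, build the vocabulary, and prepare the LDA input.
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text = corpus$text, vocab = wordlist$words)

# Fit an LDA model with K = 3 topics.
LDA <- LDAgen(documents = ldaPrep, K = 3L, vocab = wordlist$words, num.words = 3)

# Top word per topic (numWords = 1), ranked by importance (byScore = TRUE).
topWords(LDA$topics)

# The values that topWords ranks by when byScore = TRUE.
importance(LDA$topics)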

[Package tosca version 0.3-2 Index]