topWords {tosca}    R Documentation

Top Words per Topic

Description

Determines the top words per topic, as top.topic.words does. In addition, the values used to determine the top words per topic can be requested. By default these values are computed by the function importance, which can also be called independently.

Usage

topWords(topics, numWords = 1, byScore = TRUE, epsilon = 1e-05, values = FALSE)

importance(topics, epsilon = 1e-05)

Arguments

topics

named matrix: The counts of the vocabulary words (columns) in the topics (rows).

numWords

integer(1): The number of requested top words per topic.

byScore

logical(1): Should the values used to determine the top words per topic be calculated by the function importance (TRUE), or should the absolute counts be used (FALSE)?

epsilon

numeric(1): Small number added in logarithmic calculations to avoid evaluating log(0).

values

logical(1): Should the values used to determine the top words per topic be returned?
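
The roles of byScore and epsilon can be illustrated with a small base R sketch. The exact formula used by importance is not stated on this page, so the score below is an assumption for illustration only: a word's relative frequency within a topic, weighted by how far its log-frequency lies above the mean log-frequency of that word across topics. The names toy_topics and sketch_score are hypothetical and not part of tosca.

```r
# Toy topic-word count matrix: topics in rows, vocabulary in columns.
toy_topics <- matrix(
  c(5, 0, 2, 3,
    0, 4, 1, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("fish", "man", "day", "long"))
)

# Hypothetical score (an assumption, not tosca's confirmed formula).
sketch_score <- function(topics, epsilon = 1e-05) {
  # Relative frequency of each word within its topic (row).
  rel <- topics / (rowSums(topics) + epsilon)
  # Adding epsilon keeps the logarithm finite for zero counts.
  log_rel <- log(rel + epsilon)
  # Subtract, per word (column), the mean log-frequency across topics.
  centered <- sweep(log_rel, 2, colMeans(log_rel))
  rel * centered
}

scores <- sketch_score(toy_topics)
print(round(scores, 3))

# Without epsilon a zero count would produce an infinite value.
print(log(0))
print(log(0 + 1e-05))
```

In this sketch a word that occurs in only one topic scores highly there, while a word spread evenly over the topics scores close to zero, which is the kind of behaviour a score-based ranking is meant to provide in contrast to raw counts.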

Value

Matrix of top words or, if values is TRUE, a list of matrices with entries word and val.
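
To make the word and val entries concrete, here is a minimal base R sketch of the count-based case (byScore = FALSE) that returns both the selected words and the values they were ranked by. The helper top_by_count, its argument n, the toy matrix, and the layout of the result (one column per topic, one row per rank) are assumptions for illustration and may differ from what topWords actually returns.

```r
# Toy topic-word count matrix: topics in rows, vocabulary in columns.
toy_topics <- matrix(
  c(5, 0, 2, 1,
    0, 4, 1, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("fish", "man", "day", "long"))
)

# Hypothetical helper: top n words per topic by absolute count.
top_by_count <- function(topics, n = 1) {
  # Column indices of the n largest counts in each topic (row).
  idx <- apply(topics, 1, function(counts) {
    order(counts, decreasing = TRUE)[seq_len(n)]
  })
  idx <- matrix(idx, nrow = n)  # keep a matrix even when n = 1
  # Look up the words and the counts belonging to those indices.
  word <- matrix(colnames(topics)[idx], nrow = n)
  val <- sapply(seq_len(nrow(topics)), function(k) topics[k, idx[, k]])
  val <- matrix(val, nrow = n)
  list(word = word, val = val)
}

result <- top_by_count(toy_topics, n = 2)
print(result$word)
print(result$val)
```

Returning the values alongside the words makes it possible to see how clearly the top word is separated from the runner-up in each topic.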

Examples

texts <- list(
 A = "Give a Man a Fish, and You Feed Him for a Day.
      Teach a Man To Fish, and You Feed Him for a Lifetime",
 B = "So Long, and Thanks for All the Fish",
 C = "A very able manipulative mathematician, Fisher enjoys a real mastery
      in evaluating complicated multiple integrals.")

corpus <- textmeta(meta = data.frame(id = c("A", "B", "C", "D"),
  title = c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
  date = c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
  additionalVariable = 1:4, stringsAsFactors = FALSE), text = texts)

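# Clean the texts, build the word list, and prepare the documents for LDA.
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text = corpus$text, vocab = wordlist$words)

# Fit an LDA with K = 3 topics.
LDA <- LDAgen(documents = ldaPrep, K = 3L, vocab = wordlist$words, num.words = 3)

# Top word per topic with the defaults (numWords = 1, byScore = TRUE).
topWords(LDA$topics)

# The values used for ranking when byScore = TRUE.
importance(LDA$topics)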

[Package tosca version 0.3-2 Index]