R: Function to validate the fit of the LDA model

intruderWords {tosca}

R Documentation

Function to validate the fit of the LDA model

Description

This function validates an LDA result by presenting a mix of words from a topic and intruder words to a human user, who has to identify the intruders.
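
The idea of an intruder task can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the implementation used by `intruderWords`: the toy `beta` matrix, the vocabulary and the way the intruder is drawn (from the top words of another topic) are all hypothetical.

```r
set.seed(1)

# Hypothetical toy topic-word matrix: 2 topics (rows) x 6 words (columns).
vocab <- c("tax", "budget", "deficit", "goal", "match", "league")
beta <- rbind(c(0.40, 0.30, 0.20, 0.04, 0.03, 0.03),
              c(0.03, 0.03, 0.04, 0.40, 0.30, 0.20))
colnames(beta) <- vocab

# Top 3 words of the topic under inspection (topic 1).
top <- names(sort(beta[1, ], decreasing = TRUE))[1:3]

# Assumed intruder rule: draw one top word of the other topic
# that is not among the top words of topic 1.
otherTop <- names(sort(beta[2, ], decreasing = TRUE))[1:3]
intruder <- sample(setdiff(otherTop, top), 1)

# Shuffle so the position does not reveal the intruder.
shown <- sample(c(top, intruder))
shown
```

A coder who sees `shown` should be able to spot the intruder if the topic is coherent.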

Usage

intruderWords(
  beta = NULL,
  byScore = TRUE,
  numTopwords = 30L,
  numIntruder = 1L,
  numOutwords = 5L,
  noTopic = TRUE,
  printSolution = FALSE,
  oldResult = NULL,
  test = FALSE,
  testinput = NULL
)

Arguments

`beta`	A matrix of word probabilities or a frequency table for the topics (e.g. the `topics` matrix from the `LDAgen` result). Each row is a topic, each column a word. Rows that do not sum to 1 are divided by their row sums.
`byScore`	Logical: Should the score of `top.topic.words` from the `lda` package be used?
`numTopwords`	The number of topwords to be used for the intruder words
`numIntruder`	Intended number of intruder words. If `numIntruder` is an integer vector, the number is sampled from it for each topic.
`numOutwords`	Integer: Number of words per topic, including the intruder words.
`noTopic`	Logical: Is the input `x` allowed to mark nonsense topics?
`printSolution`	tba
`oldResult`	Result object from an unfinished run of `intruderWords`. If `oldResult` is used, all other parameters will be ignored.
`test`	Logical: Enables test mode
`testinput`	Input for function tests
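
The row normalisation described for `beta` can be illustrated as follows. This is a sketch with a hypothetical toy frequency table, not code taken from the package.

```r
# Hypothetical toy frequency table: 2 topics (rows) x 3 words (columns).
beta <- matrix(c(2, 1, 1,
                 0, 3, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("tax", "vote", "war")))

# Divide each row by its row sum so that every topic sums to 1.
# The vector of row sums is recycled down the columns, so element
# [i, j] is divided by the sum of row i.
betaNorm <- beta / rowSums(beta)

rowSums(betaNorm)
```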

Value

Object of class IntruderWords. List of 7

`result`	Matrix of 3 columns. Each row represents one topic. All values are 0 if the topic has not been run yet. `numIntruder` (first column) gives the number of intruder words inserted into this topic, `missIntruder` (second column) the number of intruder words that were not found by the coder, and `falseIntruder` (third column) the number of words chosen by the coder that were not intruders.
`beta`	Parameter of the function call
`byScore`	Parameter of the function call
`numTopwords`	Parameter of the function call
`numIntruder`	Parameter of the function call
`numOutwords`	Parameter of the function call
`noTopic`	Parameter of the function call
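
To make the three columns of `result` concrete, the following sketch computes them for a single topic. The word lists and the coder's selection are hypothetical, and the set operations are an assumed way of counting, not the package's internal code.

```r
# Hypothetical words shown to the coder for one topic.
shown <- c("tax", "budget", "banana", "deficit", "spending")
# Assumed ground truth: which of the shown words are intruders.
intruders <- c("banana")
# Hypothetical selection made by the coder.
chosen <- c("budget")

numIntruder   <- length(intruders)                   # intruders inserted
missIntruder  <- length(setdiff(intruders, chosen))  # intruders not found
falseIntruder <- length(setdiff(chosen, intruders))  # wrongly chosen words

c(numIntruder = numIntruder,
  missIntruder = missIntruder,
  falseIntruder = falseIntruder)
```

Here the coder picked a genuine topic word and overlooked the intruder, so all three counts equal 1.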

References

Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems, 2009.

Examples

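## Not run: 
data(politics)
# Clean the raw corpus
poliClean <- cleanTexts(politics)
# Build the word list and keep words that occur more than 10 times
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
# Prepare the documents and fit an LDA model with 10 topics
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)
# Start the interactive intruder-word validation on the topic matrix
intruder <- intruderWords(beta=LDAresult$topics)
## End(Not run)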

[Package tosca version 0.3-2 Index]