intruderTopics {tosca} | R Documentation |
Function to validate the fit of the LDA model
Description
This function validates an LDA result by presenting a mix of topics and intruder topics to a human user, who has to identify them.
Usage
intruderTopics(
  text = NULL,
  beta = NULL,
  theta = NULL,
  id = NULL,
  numIntruder = 1,
  numOuttopics = 4,
  byScore = TRUE,
  minWords = 0L,
  minOuttopics = 0L,
  stopTopics = NULL,
  printSolution = FALSE,
  oldResult = NULL,
  test = FALSE,
  testinput = NULL
)
Arguments
text |
A list of texts (e.g. the text element of a |
beta |
A matrix of word-probabilities or frequency table for the topics (e.g. the |
theta |
A matrix of wordcounts per text and topic (e.g. the |
id |
Optional: character vector of text IDs that should be used for the function. Useful for continuing an unfinished coding task. |
numIntruder |
Intended number of intruder words. If |
numOuttopics |
Integer: Number of words per topic, including the intruder words |
byScore |
Logical: Should the score of |
minWords |
Integer: Minimum number of words for a chosen text. |
minOuttopics |
Integer: Minimum number of words a topic needs in order to qualify as a possible correct topic. |
stopTopics |
Optional: Integer vector to deselect stopword topics for the coding task. |
printSolution |
Logical: If |
oldResult |
Result object from an unfinished run of |
test |
Logical: Enables test mode |
testinput |
Input for function tests |
Value
Object of class IntruderTopics. List of 11
result |
Matrix of 3 columns. Each row represents one labeled text. |
beta |
Parameter of the function call |
theta |
Parameter of the function call |
id |
Character vector of IDs at the beginning |
byScore |
Parameter of the function call |
numIntruder |
Parameter of the function call |
numOuttopics |
Parameter of the function call |
minWords |
Parameter of the function call |
minOuttopics |
Parameter of the function call |
unusedID |
Character vector of unused text IDs for the next run |
stopTopics |
Parameter of the function call |
References
Chang, Jonathan, Sean Gerrish, Chong Wang, Jordan L. Boyd-Graber, and David M. Blei. Reading Tea Leaves: How Humans Interpret Topic Models. Advances in Neural Information Processing Systems, 2009.
Examples
## Not run:
data(politics)
poliClean <- cleanTexts(politics)
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)
intruder <- intruderTopics(text=politics$text, beta=LDAresult$topics,
  theta=LDAresult$document_sums, id=names(poliLDA))
## End(Not run)
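The idea behind the topic intrusion task (Chang et al., 2009) can be illustrated without running the interactive function. The following base R sketch is not the tosca implementation. It assumes that theta has topics in rows and texts in columns, that numOuttopics is the total number of topics shown for one text, and that intruders are drawn from the lower half of the topic ranking for that text. The helper sampleTopicSet and the simulated theta are hypothetical and exist only for this illustration.

```r
# Illustrative sketch only: NOT the code used by intruderTopics().
# Assumed layout: topics in rows, texts in columns (simulated counts).
set.seed(1)
K <- 10  # number of topics
D <- 5   # number of texts
theta <- matrix(rpois(K * D, lambda = 20), nrow = K, ncol = D,
                dimnames = list(NULL, paste0("doc", seq_len(D))))

# Hypothetical helper: build one coding task for a single text.
sampleTopicSet <- function(theta, doc, numOuttopics = 4, numIntruder = 1) {
  counts <- theta[, doc]
  ranked <- order(counts, decreasing = TRUE)  # topic indices, most frequent first
  numTrue <- numOuttopics - numIntruder
  trueTopics <- ranked[seq_len(numTrue)]      # topics that really fit the text
  # Assumption: intruders come from the lower half of the ranking.
  candidates <- ranked[seq(from = ceiling(length(ranked) / 2) + 1,
                           to = length(ranked))]
  intruders <- candidates[sample.int(length(candidates), numIntruder)]
  shown <- c(trueTopics, intruders)
  shown <- shown[sample.int(length(shown))]   # shuffle so position gives no hint
  list(shown = shown, intruders = intruders)
}

task <- sampleTopicSet(theta, "doc1")
task$shown      # topics presented to the coder
task$intruders  # topics the coder is expected to pick out
```

A coder who understands the text and the topics should be able to name the entries of task$intruders among task$shown; a high hit rate across many texts suggests that the topic assignments are interpretable.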