topTexts {tosca}    R Documentation
Get The IDs Of The Most Representative Texts
Description
For each selected topic, the function returns the IDs of the texts with the highest relative (share of the text's words) or absolute number of words assigned to that topic.
Usage
topTexts(
  ldaresult,
  ldaID,
  limit = 20L,
  rel = TRUE,
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  minlength = 30L
)
Arguments
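ldaresult: LDA result, for example as returned by LDAgen; it must contain the element document_sums.
ldaID: Vector of text IDs, one per document and in the same order as the documents passed to the LDA.
limit: Integer: number of text IDs per topic.
rel: Logical: should the relative frequency be used instead of the absolute number of words?
select: Integer vector: indices of the topics to be returned (default: all topics).
tnames: Names of the selected topics.
minlength: Integer: minimal total number of words a text must have to be included.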
Value
Matrix of text IDs.
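The ranking described above can be illustrated with a small base-R sketch. It assumes, as in the lda package that LDAgen uses, that ldaresult$document_sums is a topics-by-texts matrix of word counts, and that rel = TRUE divides each column by the text's total number of words. The function top_texts_sketch and the toy matrix ds are hypothetical; this is not the code used by tosca, and the column names T1, T2, ... are a simplification of how topTexts may label topics.

```r
# Hypothetical sketch of the selection rule; NOT tosca's implementation.
# Assumes document_sums is a K x D matrix of word counts
# (rows = topics, columns = texts), as produced by the lda package.
top_texts_sketch <- function(document_sums, ids, limit = 20L, rel = TRUE,
                             select = seq_len(nrow(document_sums)),
                             minlength = 30L) {
  len <- colSums(document_sums)                 # total number of words per text
  scores <- document_sums
  if (rel) scores <- sweep(scores, 2, len, "/") # share of each text's words per topic
  keep <- len >= minlength                      # drop texts that are too short
  scores <- scores[select, keep, drop = FALSE]
  kept_ids <- ids[keep]
  n <- min(limit, length(kept_ids))
  # For every selected topic, take the n remaining texts with the highest score
  res <- vapply(seq_len(nrow(scores)),
                function(k) kept_ids[order(scores[k, ], decreasing = TRUE)[seq_len(n)]],
                character(n))
  res <- matrix(res, nrow = n)                  # keep matrix shape when n == 1
  colnames(res) <- paste0("T", select)          # simplified topic labels
  res
}

# Toy counts for 2 topics x 3 texts (filled column by column)
ds <- matrix(c(8, 2,     # text A: 10 words
               1, 1,     # text B:  2 words
               30, 60),  # text C: 90 words
             nrow = 2, dimnames = list(NULL, c("A", "B", "C")))

top_texts_sketch(ds, c("A", "B", "C"), limit = 1L, minlength = 2L)
# topic 1 -> "A", topic 2 -> "C"
```

With rel = FALSE the long text C ranks first for both topics, because absolute counts favour long texts; minlength guards against the opposite effect, where very short texts reach high relative shares.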
Examples
library(tosca)

# Three short texts; the meta data also lists a fourth ID ("D") without text
texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")
corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)
# Preprocess, build the vocabulary and fit an LDA with three topics
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)
LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
# One text per topic; minlength is lowered because the example texts are very short
topTexts(ldaresult=LDA, ldaID=c("A","B","C"), limit = 1L, minlength=2)
[Package tosca version 0.3-2 Index]