topTexts {tosca}    R Documentation

Get The IDs Of The Most Representative Texts

Description

For each topic, the function extracts the IDs of the texts with the highest relative or absolute number of words assigned to that topic.

Usage

topTexts(
  ldaresult,
  ldaID,
  limit = 20L,
  rel = TRUE,
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  minlength = 30L
)

Arguments

ldaresult

LDA result object containing the element document_sums, e.g. as returned by LDAgen.

ldaID

Vector of text IDs

limit

Integer: Number of text IDs per topic.

rel

Logical: Should the relative frequency be used?

select

Which topics should be returned? Defaults to all topics.

tnames

Names of the selected topics

minlength

Minimal total number of words a text must have to be included
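
Details

The difference between relative and absolute ranking, and the effect of minlength, can be illustrated with a small base R sketch. This is not the package's implementation: the helper top_ids and the toy matrix are hypothetical, and the sketch assumes that document_sums has one row per topic and one column per text and that a text is kept when its total word count is at least minlength.

```r
# Toy topic-by-text count matrix (hypothetical data):
# rows are topics, columns are texts.
document_sums <- matrix(c(8, 1, 3,
                          2, 1, 9),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("T1", "T2"), c("A", "B", "C")))

# Hypothetical helper, not part of tosca: returns the IDs of the `limit`
# highest-ranked texts per topic, one column per topic.
top_ids <- function(sums, limit = 2L, rel = TRUE, minlength = 0L) {
  # Assumption: drop texts whose total word count is below minlength
  keep <- colSums(sums) >= minlength
  sums <- sums[, keep, drop = FALSE]
  # Relative ranking: share of each text's words assigned to each topic
  if (rel) sums <- sweep(sums, 2, colSums(sums), "/")
  res <- apply(sums, 1, function(x) {
    colnames(sums)[order(x, decreasing = TRUE)[seq_len(limit)]]
  })
  matrix(res, nrow = limit, dimnames = list(NULL, rownames(sums)))
}

top_ids(document_sums, rel = FALSE)                # T1: A, C   T2: C, A
top_ids(document_sums, rel = TRUE)                 # T1: A, B   T2: C, B
top_ids(document_sums, rel = TRUE, minlength = 3)  # T1: A, C   T2: C, A
```

In this toy data, text B contains only two words, so a single word already gives it a 50% share of a topic. The relative ranking therefore places it second for both topics, and excluding short texts via minlength removes it again.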

Value

Matrix of text IDs.

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

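# Preprocess the texts and build the vocabulary
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
# Convert the texts into the input format expected by LDAgen
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

# Fit an LDA model with three topics
LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
# One text ID per topic, restricted to texts with a minimal length of 2 words
topTexts(ldaresult=LDA, ldaID=c("A","B","C"), limit = 1L, minlength=2)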

[Package tosca version 0.3-2 Index]