R: Get The IDs Of The Most Representative Texts

topTexts {tosca}

R Documentation

Get The IDs Of The Most Representative Texts

Description

The function extracts the IDs of the texts with the highest relative or absolute number of words per topic.

Usage

topTexts(
  ldaresult,
  ldaID,
  limit = 20L,
  rel = TRUE,
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  minlength = 30L
)

Arguments

`ldaresult`	LDA result
`ldaID`	Vector of text IDs
`limit`	Integer: Number of text IDs per topic.
`rel`	Logical: Should the relative frequency be used?
`select`	Which topics should be returned?
`tnames`	Names of the selected topics
`minlength`	Minimum total number of words a text must have to be included

Value

Matrix of text IDs.
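
The ranking described above can be illustrated with a minimal base R sketch. It is not the tosca implementation: `top_ids` is a hypothetical helper, the toy `document_sums` matrix stands in for `ldaresult$document_sums` (assumed here to have one row per topic and one column per text), and the one-column-per-topic result layout is likewise an assumption.

```r
# Toy stand-in for ldaresult$document_sums (assumed layout):
# rows are topics, columns are texts, cells are word counts.
document_sums <- matrix(
  c(40,  5, 2, 60,
     5, 50, 1, 30,
     5,  5, 1, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("T", 1:3), NULL)
)
ldaID <- c("A", "B", "C", "D")

# Hypothetical helper, not part of tosca.
top_ids <- function(document_sums, ldaID, limit = 1L, rel = TRUE, minlength = 30L) {
  # Drop texts with fewer than `minlength` words in total.
  keep <- colSums(document_sums) >= minlength
  sums <- document_sums[, keep, drop = FALSE]
  ids <- ldaID[keep]
  # With rel = TRUE, rank by the share of each text's words assigned to a topic.
  if (rel) sums <- sweep(sums, 2, colSums(sums), "/")
  limit <- min(limit, length(ids))
  # For each topic (row), order the texts by score and keep the first `limit` IDs.
  res <- apply(sums, 1, function(x) ids[order(x, decreasing = TRUE)[seq_len(limit)]])
  matrix(res, nrow = limit, dimnames = list(NULL, rownames(sums)))
}

rel_top <- top_ids(document_sums, ldaID, rel = TRUE)
abs_top <- top_ids(document_sums, ldaID, rel = FALSE)
print(rel_top)
print(abs_top)
```

In this toy data, text C has only 4 words and is excluded by `minlength`. For topic T1, the relative ranking picks A (40 of its 50 words), while the absolute ranking picks the much longer text D (60 words), which shows why `rel = TRUE` favors texts that are focused on a topic rather than merely long.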

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

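# Clean the texts, build the vocabulary and prepare the LDA input
corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

# Fit an LDA model with K = 3 topics
LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
# Get one text ID per topic, considering only texts with at least 2 words
topTexts(ldaresult=LDA, ldaID=c("A","B","C"), limit = 1L, minlength=2)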

[Package tosca version 0.3-2 Index]