clusterTopics {tosca}    R Documentation

Cluster Analysis

Description

This function performs a cluster analysis of topics using the Hellinger distance.
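
For reference, the Hellinger distance between two discrete probability distributions is usually written as shown below. The symbols p and q (the word distributions of two topics) and V (the vocabulary size) are introduced here for illustration and do not appear in the original page; whether the package applies exactly this normalization constant is an assumption.

```latex
H(p, q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{V} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2}
```

With this scaling the distance lies between 0 (identical distributions) and 1 (distributions with disjoint support).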

Usage

clusterTopics(
  ldaresult,
  file,
  tnames = NULL,
  method = "average",
  width = 30,
  height = 15,
  ...
)

Arguments

ldaresult

The result of a call to LDAgen, or alternatively the corresponding matrix result$topics.

file

File name for the dendrogram PDF.

tnames

Character vector of labels for the topics.

method

Agglomeration method, passed to the method argument of hclust.

width

Graphical parameter for the PDF output. See pdf.

height

Graphical parameter for the PDF output. See pdf.

...

Additional parameters passed to plot.

Details

This function is useful for analyzing topic similarities and for evaluating the appropriate number of topics for an LDA.
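
The kind of analysis described above can be sketched in base R as follows. This is a minimal illustration, not the package's actual implementation: the toy matrix topics and all of its numbers are made up, and the 1/sqrt(2) scaling follows the usual textbook definition of the Hellinger distance rather than anything stated on this page.

```r
# Toy topic-word count matrix: 3 topics (rows) x 4 words (columns).
# The values are invented purely for illustration.
topics <- matrix(c(8, 1, 1, 0,
                   7, 2, 1, 0,
                   0, 1, 2, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("T", 1:3), paste0("w", 1:4)))

# Turn each row into a probability distribution over words.
probs <- topics / rowSums(topics)

# Hellinger distance: Euclidean distance between the square-rooted
# distributions, scaled by 1/sqrt(2) (assumed textbook normalization).
hellinger <- dist(sqrt(probs)) / sqrt(2)

# Hierarchical clustering with average linkage, the default method
# listed in the Usage section.
clust <- hclust(hellinger, method = "average")

# Cut the tree into two groups of topics.
groups <- cutree(clust, k = 2)
print(groups)
```

Topics that end up in the same group have similar word distributions. Many topics merging at very small distances may hint that the chosen number of topics is larger than needed.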

Value

A dendrogram as a PDF and a list containing

dist

A distance matrix

clust

The result from hclust

Examples


texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
clusterTopics(ldaresult=LDA)


[Package tosca version 0.3-2 Index]