
clusterTopics {tosca}

R Documentation

Cluster Analysis

Description

This function performs a hierarchical cluster analysis of the topics of an LDA result, using the Hellinger distance between topics.
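For two topics with word distributions p and q over the same vocabulary, the Hellinger distance is commonly defined as H(p, q) = sqrt(sum_w (sqrt(p_w) - sqrt(q_w))^2) / sqrt(2). It ranges from 0 (identical topics) to 1 (topics with no words in common). The sketch below computes this in base R. It assumes each topic is a row of nonnegative word weights that is first normalized to sum to one. The helper `hellinger_dist` and the toy matrix are hypothetical, and whether tosca applies the 1/sqrt(2) scaling is an assumption not confirmed here.

```r
# Hypothetical helper (not part of tosca): pairwise Hellinger distances
# between the rows of a nonnegative matrix (one row per topic, one column
# per word).
hellinger_dist <- function(topics) {
  probs <- topics / rowSums(topics)   # normalize each topic to a distribution
  # Euclidean distance between the square-root vectors, scaled by 1/sqrt(2)
  # so that the result lies in [0, 1]; the result is still a "dist" object
  stats::dist(sqrt(probs)) / sqrt(2)
}

# Toy topic-word counts: 3 topics over a 4-word vocabulary
topics <- rbind(c(10, 5, 0, 0),
                c( 8, 6, 1, 0),
                c( 0, 0, 7, 9))
rownames(topics) <- c("T1", "T2", "T3")
round(hellinger_dist(topics), 3)
```

Topics T1 and T3 share no words, so their distance is exactly 1. T1 and T2 overlap heavily and end up much closer.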

Usage

clusterTopics(
  ldaresult,
  file,
  tnames = NULL,
  method = "average",
  width = 30,
  height = 15,
  ...
)

Arguments

`ldaresult`	The result of a call to `LDAgen`, or alternatively the corresponding matrix `result$topics`.
`file`	File name for the PDF containing the dendrogram.
`tnames`	Character vector of labels for the topics.
`method`	Agglomeration method; see the `method` argument of `hclust`.
`width`	Graphical parameter for the PDF output (in inches); see `pdf`.
`height`	Graphical parameter for the PDF output (in inches); see `pdf`.
`...`	Additional arguments passed to `plot`.
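The following sketch shows one plausible way the arguments above fit together: `method` is handed to `hclust`, `width` and `height` to `pdf`, and `...` to `plot`. It is a hypothetical base-R re-implementation (`cluster_topics_sketch` is not part of tosca). It assumes `topics` is a topic-by-word matrix like `result$topics`. When `tnames` is `NULL` it uses placeholder labels, which need not match tosca's own default.

```r
# Hypothetical re-implementation, for illustration only; not tosca's code.
cluster_topics_sketch <- function(topics, file, tnames = NULL,
                                  method = "average", width = 30,
                                  height = 15, ...) {
  # Placeholder labels (assumption): tosca's default labels may differ
  if (is.null(tnames)) tnames <- paste0("Topic", seq_len(nrow(topics)))
  probs <- topics / rowSums(topics)                # rows -> distributions
  d <- stats::dist(sqrt(probs)) / sqrt(2)          # Hellinger distances
  clust <- stats::hclust(d, method = method)       # `method` goes to hclust
  grDevices::pdf(file, width = width, height = height)  # sizes in inches
  on.exit(grDevices::dev.off())                    # close the PDF on return
  plot(clust, labels = tnames, ...)                # `...` goes to plot
  invisible(list(dist = d, clust = clust))
}

# Usage with a toy topic-word matrix and a temporary output file
topics <- rbind(c(10, 5, 0, 0), c(8, 6, 1, 0), c(0, 0, 7, 9))
f <- tempfile(fileext = ".pdf")
out <- cluster_topics_sketch(topics, file = f,
                             tnames = c("fish", "fishing", "maths"),
                             method = "complete", width = 8, height = 5,
                             main = "Topic dendrogram")
```

Using `on.exit` guarantees that the PDF device is closed even if plotting fails, so no device is left open.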

Details

This function is useful for analyzing similarities between topics and for evaluating the appropriate number of topics for an LDA.
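One way to use the dendrogram when choosing the number of topics is to cut the tree at a small Hellinger distance. Topics that merge below the cut are near-duplicates, which may hint that K is larger than needed. The following is a minimal sketch using `cutree` on a toy topic-word matrix. The threshold of 0.3 is an arbitrary, hypothetical choice rather than a recommendation from tosca.

```r
# Toy topic-word counts for K = 4 topics; T1 and T2 are near-duplicates
topics <- rbind(c(10, 5, 0, 0, 0),
                c( 9, 6, 0, 0, 1),
                c( 0, 0, 7, 9, 0),
                c( 0, 1, 0, 2, 8))
rownames(topics) <- paste0("T", 1:4)

# Hellinger distances between the normalized topics, then average linkage
probs <- topics / rowSums(topics)
clust <- hclust(dist(sqrt(probs)) / sqrt(2), method = "average")

# Hypothetical threshold: topics joined below this distance are treated
# as redundant and end up in the same group
threshold <- 0.3
groups <- cutree(clust, h = threshold)
groups

# Number of distinct groups: only a rough hint that fewer topics might suffice
length(unique(groups))
```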

Value

A dendrogram written to a PDF file, and a list containing

`dist`	The distance matrix of the topics (Hellinger distances)
`clust`	The result from `hclust`
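The returned components can be inspected with standard base-R tools. The sketch below does not call tosca. Instead it builds a list with the same two documented components (`dist` and `clust`) from a toy topic-word matrix, so the distance scaling is an assumption. It then finds the closest pair of topics and shows the merge order and merge heights stored in the `hclust` object.

```r
# Toy stand-in for the returned list (assumption: same structure as documented)
topics <- rbind(c(10, 5, 0, 0), c(8, 6, 1, 0), c(0, 0, 7, 9))
rownames(topics) <- c("T1", "T2", "T3")
probs <- topics / rowSums(topics)
d <- dist(sqrt(probs)) / sqrt(2)
res <- list(dist = d, clust = hclust(d, method = "average"))

# Full symmetric matrix of pairwise topic distances
D <- as.matrix(res$dist)

# Closest pair of distinct topics (ignore the zero diagonal)
D0 <- D
diag(D0) <- Inf
closest <- which(D0 == min(D0), arr.ind = TRUE)[1, ]
rownames(D)[closest]

# Merge order (negative entries are single topics) and merge heights
res$clust$merge
res$clust$height
```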

Examples


library(tosca)

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

corpus <- textmeta(meta=data.frame(id=c("A", "B", "C", "D"),
title=c("Fishing", "Don't panic!", "Sir Ronald", "Berlin"),
date=c("1885-01-02", "1979-03-04", "1951-05-06", "1967-06-02"),
additionalVariable=1:4, stringsAsFactors=FALSE), text=texts)

corpus <- cleanTexts(corpus)
wordlist <- makeWordlist(corpus$text)
ldaPrep <- LDAprep(text=corpus$text, vocab=wordlist$words)

LDA <- LDAgen(documents=ldaPrep, K = 3L, vocab=wordlist$words, num.words=3)
clusterTopics(ldaresult=LDA)

[Package tosca version 0.3-2]