textCluster {phm}    R Documentation

Cluster a Term-Document Matrix

Description

Combine documents (columns) into k clusters whose texts are most similar, based on their text distance. Documents with no terms are assigned to the last cluster.
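To see ahead of time which documents would fall under the "no terms" rule, the column sums of the matrix can be checked before clustering. The following is a minimal base-R sketch, not part of the package; the matrix `tdm`, the document names `d1` to `d4`, and the variable `empty_docs` are made up for illustration, and it assumes that "no terms" means a column whose counts are all zero.

```r
# Toy term-document matrix: terms on rows, documents on columns.
# Document "d3" is deliberately empty (all counts are zero).
tdm <- matrix(c(0,  1, 0,  2,
                0, 10, 0, 14,
                0,  0, 0,  0,
                1,  0, 1,  0),
              nrow = 4,
              dimnames = list(c("A", "B", "C", "D"),
                              c("d1", "d2", "d3", "d4")))

# Assumption: a document has "no terms" when its column sums to zero.
empty_docs <- colnames(tdm)[colSums(tdm) == 0]
print(empty_docs)
```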

Usage

textCluster(tdm, k, mx = 100, md = 5 * k)

Arguments

tdm

A term-document matrix with terms as rows and documents as columns.

k

A positive integer giving the number of clusters required.

mx

Maximum number of iterations (default 100).

md

Maximum number of documents to use for the initial setup (default 5*k).

Value

A textcluster object with three items: cluster, centroids, and size. cluster is a vector giving, for each column of tdm, the cluster to which that document has been assigned; centroids is a matrix in which each column is the centroid of a cluster; size is a named vector with the size of each cluster.
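The three items can be inspected individually once the function has run. The sketch below assumes the phm package is installed (it is skipped otherwise) and that the items are accessible with `$`, as is usual for list-like R objects; the variable name `res` is chosen here for illustration.

```r
# Runs only if the phm package is available.
if (requireNamespace("phm", quietly = TRUE)) {
  # Same toy matrix as in the Examples section: 4 terms by 4 documents.
  M <- matrix(c(0, 1, 0, 2, 0, 10, 0, 14, 12, 0, 8, 0, 1, 0, 1, 0), nrow = 4)
  colnames(M) <- 1:4
  rownames(M) <- c("A", "B", "C", "D")

  res <- phm::textCluster(M, 2)

  print(res$cluster)    # cluster assignment, one entry per document (column)
  print(res$centroids)  # matrix with one centroid per column
  print(res$size)       # named vector with the size of each cluster
}
```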

Examples

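# Term-document matrix: 4 terms (rows) by 4 documents (columns)
M <- matrix(c(0, 1, 0, 2, 0, 10, 0, 14, 12, 0, 8, 0, 1, 0, 1, 0), nrow = 4)
colnames(M) <- 1:4
rownames(M) <- c("A", "B", "C", "D")
# Group the four documents into 2 clusters
textCluster(M, 2)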

[Package phm version 1.1.2 Index]