textCluster {phm}    R Documentation
Cluster a Term-Document Matrix
Description
Groups the documents (columns) of a term-document matrix into k clusters so that the documents within each cluster are as similar as possible according to their text distance. Documents that contain no terms are assigned to the last cluster.
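This page does not state which text distance, seeding method or update rule textCluster uses. As a rough illustration only, the base-R sketch below clusters the document columns with k-means under cosine similarity. The function name cluster_columns_sketch, the cosine measure, farthest-point seeding among the first md documents, and the normalized-mean centroid update are all assumptions, not phm's method. The sketch mirrors the documented interface: mx caps the number of iterations, md limits the documents used for the initial setup, documents with no terms go to cluster k, and the result holds cluster, centroids and size.

```r
# Hypothetical sketch, NOT phm's implementation: k-means on the document
# columns using cosine similarity. Seeding, distance and update rule are assumptions.
cluster_columns_sketch <- function(tdm, k, mx = 100, md = 5 * k) {
  tdm <- as.matrix(tdm)
  norms <- sqrt(colSums(tdm^2))
  empty <- norms == 0                            # documents with no terms
  # Scale every non-empty document to unit length so dot products are cosines
  X <- sweep(tdm[, !empty, drop = FALSE], 2, norms[!empty], "/")
  n <- ncol(X)
  stopifnot(k >= 1, n >= k, mx >= 1)

  # Initial setup: farthest-point seeding among the first md documents
  # (the pool is widened to k documents if md < k)
  pool <- seq_len(min(n, max(md, k)))
  seeds <- 1L
  while (length(seeds) < k) {
    sim <- crossprod(X[, seeds, drop = FALSE], X[, pool, drop = FALSE])
    best <- apply(sim, 2, max)                   # similarity to the nearest seed
    best[seeds] <- Inf                           # never pick a seed twice
    seeds <- c(seeds, pool[which.min(best)])
  }
  centroids <- X[, seeds, drop = FALSE]

  cl <- integer(n)
  for (iter in seq_len(mx)) {
    # Assign each document to the centroid with the highest cosine similarity
    new_cl <- max.col(crossprod(X, centroids), ties.method = "first")
    if (all(new_cl == cl)) break                 # assignments stable: converged
    cl <- new_cl
    for (j in seq_len(k)) {                      # recompute unit-length centroids
      members <- X[, cl == j, drop = FALSE]
      if (ncol(members) > 0) {
        m <- rowMeans(members)
        centroids[, j] <- m / sqrt(sum(m^2))
      }
    }
  }

  cluster <- integer(ncol(tdm))
  cluster[!empty] <- cl
  cluster[empty] <- as.integer(k)                # empty documents go to the last cluster
  names(cluster) <- colnames(tdm)
  colnames(centroids) <- seq_len(k)
  size <- tabulate(cluster, nbins = k)
  names(size) <- seq_len(k)
  list(cluster = cluster, centroids = centroids, size = size)
}
```

Called as cluster_columns_sketch(M, 2) on the matrix M from the Examples section, this sketch places documents 1 and 2 in cluster 1 and documents 3 and 4 in cluster 2; textCluster itself may label or assign clusters differently.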
Usage
textCluster(tdm, k, mx = 100, md = 5 * k)
Arguments
tdm
A term-document matrix with terms on the rows and documents on the columns.
k
A positive integer giving the number of clusters.
mx
Maximum number of iterations (default 100).
md
Maximum number of documents to use for the initial setup (default 5 * k).
Value
A textcluster object with three items: cluster, centroids and size. cluster is a vector giving, for each column of tdm, the cluster to which that document has been assigned; centroids is a matrix whose columns are the cluster centroids; and size is a named vector with the number of documents in each cluster.
Examples
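# Four documents over the terms A-D: documents 1 and 2 use only B and D,
# documents 3 and 4 use only A and C
M <- matrix(c(0,1,0,2,0,10,0,14,12,0,8,0,1,0,1,0), 4)
colnames(M) <- 1:4; rownames(M) <- c("A","B","C","D")
textCluster(M, 2)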
[Package phm version 1.1.2 Index]