clusters_by_doc_table {rainette} | R Documentation |
Returns the number of segments of each cluster for each source document
Description
Returns the number of segments of each cluster for each source document
Usage
clusters_by_doc_table(obj, clust_var = NULL, doc_id = NULL, prop = FALSE)
Arguments
obj |
a corpus, tokens or dtm object |
clust_var |
name of the docvar with the clusters |
doc_id |
docvar identifying the source document |
prop |
if TRUE, returns the percentage of each cluster by document |
Details
This function is only useful for a previously segmented corpus. If doc_id
is NULL and there is a segment_source
docvar, it will be used instead.
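As a rough illustration of the kind of table described above, the following base R sketch cross-tabulates segments by source document and cluster. It is not rainette's implementation, and the exact shape of the object returned by clusters_by_doc_table may differ. The data frame `segments` and its values are hypothetical stand-ins for the docvars of a segmented corpus.

```r
# Hypothetical docvars of a segmented corpus: one row per segment.
# `segment_source` names the source document, `cluster` the assigned cluster.
segments <- data.frame(
  segment_source = c("doc1", "doc1", "doc1", "doc2", "doc2", "doc3"),
  cluster = c(1, 1, 2, 2, 3, 1)
)

# Count the segments of each cluster within each source document.
counts <- table(segments$segment_source, segments$cluster)

# Row-wise percentages: share of each cluster within a document,
# similar in spirit to what prop = TRUE is documented to return.
percentages <- round(prop.table(counts, margin = 1) * 100, 1)

print(counts)
print(percentages)
```

Here the rows are source documents and the columns are clusters, so each row of `percentages` sums to 100 (up to rounding).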
Examples
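# Load quanteda for corpus handling and rainette for the clustering functions
require(quanteda)
require(rainette)
# Keep the first 10 inaugural speeches and split them into segments
corpus <- data_corpus_inaugural
corpus <- head(corpus, n = 10)
corpus <- split_segments(corpus)
# Tokenize, drop English stopwords, and build a trimmed document-feature matrix
tok <- tokens(corpus, remove_punct = TRUE)
tok <- tokens_remove(tok, stopwords("en"))
dtm <- dfm(tok, tolower = TRUE)
dtm <- dfm_trim(dtm, min_docfreq = 2)
# Cluster the segments, then store each segment's cluster as a docvar
res <- rainette(dtm, k = 3, min_segment_size = 15)
corpus$cluster <- cutree(res, k = 3)
# Percentage of segments in each cluster, per source document
clusters_by_doc_table(corpus, clust_var = "cluster", prop = TRUE)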
[Package rainette version 0.3.1.1 Index]