GetProbableTerms {textmineR} | R Documentation |
Get cluster labels using a "more probable" method of terms
Description
Extracts probable terms from a set of documents. "Probable" here means more probable in that set of documents than in the corpus overall.
Usage
GetProbableTerms(docnames, dtm, p_terms = NULL)
Arguments
docnames |
A character vector of rownames of dtm identifying the set of documents |
dtm |
A document term matrix of class |
p_terms |
Defaults to NULL. If supplied, a numeric vector giving the probability of each term in the corpus, with names corresponding to colnames(dtm). |
Value
Returns a numeric vector in the same format as p_terms. Each entry is the difference between the probability of drawing a term from the set of documents given by docnames and the probability of drawing that term from the corpus overall (p_terms). Positive entries therefore mark terms that are more probable in the selected documents than in the corpus.
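The difference described above can be sketched by hand on a toy matrix. This is a minimal illustration in base R, assuming the calculation is exactly the subtraction stated in Value; it is not the package's source code, and the matrix, document names, and variable names (toy_dtm, toy_docs, p_corpus, p_subset, diff_probs) are hypothetical.

```r
# Hypothetical toy document term matrix: 3 documents, 3 terms
toy_dtm <- matrix(c(2, 0, 1,
                    0, 3, 1,
                    1, 1, 2),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("doc1", "doc2", "doc3"),
                                  c("apple", "bank", "cell")))

# The set of documents of interest (plays the role of docnames)
toy_docs <- c("doc1", "doc3")

# Probability of drawing each term from the corpus overall (plays the role of p_terms)
p_corpus <- colSums(toy_dtm) / sum(toy_dtm)

# Probability of drawing each term from the selected documents only
subset_counts <- colSums(toy_dtm[toy_docs, , drop = FALSE])
p_subset <- subset_counts / sum(subset_counts)

# Difference of the two distributions; positive means "more probable" in the subset
diff_probs <- p_subset - p_corpus

# Terms ordered from most to least characteristic of the selected documents
sort(diff_probs, decreasing = TRUE)
```

Because both vectors are probability distributions over the same terms, the differences sum to zero: a term can only gain probability in the subset at the expense of others.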
Examples
# Load a pre-formatted dtm and topic model
data(nih_sample_topic_model)
data(nih_sample_dtm)
# documents with a topic proportion of .25 or higher for topic 2
mydocs <- rownames(nih_sample_topic_model$theta)[ nih_sample_topic_model$theta[ , 2 ] >= 0.25 ]
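# probability of each term in the corpus overall
term_probs <- Matrix::colSums(nih_sample_dtm) / sum(Matrix::colSums(nih_sample_dtm))
# difference between each term's probability in mydocs and in the corpus
GetProbableTerms(docnames = mydocs, dtm = nih_sample_dtm, p_terms = term_probs)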
[Package textmineR version 3.0.5 Index]