R: Get cluster labels using a "more probable" method of terms

GetProbableTerms {textmineR}

R Documentation

Get cluster labels using a "more probable" method of terms

Description

Extracts probable terms from a set of documents. "Probable" here means more probable within that set of documents than in the corpus overall.

Usage

GetProbableTerms(docnames, dtm, p_terms = NULL)

Arguments

`docnames`	A character vector of rownames of `dtm` identifying the set of documents.
`dtm`	A document term matrix of class `matrix` or `dgCMatrix`.
`p_terms`	Optional; defaults to `NULL`. If supplied, a numeric vector giving the probability of each term in the corpus, with names corresponding to `colnames(dtm)`.

Value

Returns a numeric vector in the same format as `p_terms`. Each entry is the difference between the probability of drawing a term from the set of documents given by `docnames` and the probability of drawing that term from the corpus overall (`p_terms`).
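
The difference described above can be reproduced by hand on a small matrix. The following is a minimal base R sketch of that arithmetic, not the package's own code. The toy counts, document names and term names are hypothetical, and it assumes the in-set probability is the pooled term count over the selected documents divided by their total count.

```r
# Toy document-term matrix: 3 documents x 4 terms (hypothetical counts)
dtm <- matrix(
  c(2, 0, 1, 1,
    0, 3, 1, 0,
    1, 1, 0, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("gene", "cell", "study", "model"))
)

# Corpus-wide term probabilities (the role played by p_terms)
p_terms <- colSums(dtm) / sum(dtm)

# Term probabilities within the chosen set of documents
# (assumption: counts are pooled across the selected rows)
docnames <- c("doc1", "doc3")
set_counts <- colSums(dtm[docnames, , drop = FALSE])
p_set <- set_counts / sum(set_counts)

# Positive entries mark terms more probable in the set than in the corpus
prob_diff <- p_set - p_terms
```

In this toy case "gene" and "model" come out positive and "cell" and "study" negative. Because both probability vectors sum to one, the differences sum to zero.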

Examples

# Load the package, then a pre-formatted dtm and topic model
library(textmineR)
data(nih_sample_topic_model)
data(nih_sample_dtm)

# Documents with a topic proportion of 0.25 or higher for topic 2
mydocs <- rownames(nih_sample_topic_model$theta)[nih_sample_topic_model$theta[, 2] >= 0.25]

# Probability of each term in the corpus overall
term_counts <- Matrix::colSums(nih_sample_dtm)
term_probs <- term_counts / sum(term_counts)

# Difference between in-set and corpus-wide term probabilities
GetProbableTerms(docnames = mydocs, dtm = nih_sample_dtm, p_terms = term_probs)
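
Since the title describes these values as a basis for cluster labels, one plausible next step is to rank the returned vector and keep the terms with the largest positive differences. The sketch below is only an illustration of that idea. The vector `diffs` is a hypothetical stand-in for the function's output, and `top_n` and `candidate_labels` are names introduced here, not part of textmineR.

```r
# Hypothetical stand-in for the named numeric vector returned by
# GetProbableTerms (values are made up for illustration)
diffs <- c(gene = 0.125, cell = -0.208, study = -0.042, model = 0.030)

# Rank terms from most to least over-represented in the document set
ranked <- sort(diffs, decreasing = TRUE)

# Keep the top few terms as candidate labels (top_n is an arbitrary choice)
top_n <- 2
candidate_labels <- names(ranked)[seq_len(min(top_n, length(ranked)))]
```

With real output, replacing `diffs` with the result of the `GetProbableTerms` call is all that changes.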

[Package textmineR version 3.0.5 Index]