plotRemoved {stm} | R Documentation |
Plot documents, words and tokens removed at various word thresholds
Description
A plot function which shows how different lower thresholds in
prepDocuments
affect the size of the corpus.
Usage
plotRemoved(documents, lower.thresh)
Arguments
documents |
The documents to be used for the stm model |
lower.thresh |
A vector of integers, each of which will be tested as a lower threshold for the prepDocuments function. |
Details
For a given lower threshold, prepDocuments
will drop words which do not appear in
more than that number of documents, and then remove any documents which are
left with no words. This function allows the user to pass a vector of lower
thresholds and observe how prepDocuments
will handle each threshold.
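The counting described above can be sketched in base R on a toy corpus. This is a minimal illustration, not the package's implementation: `toy_docs` and `count_removed` are hypothetical names, and the sketch assumes a word is dropped when the number of documents containing it is at or below the threshold (adjust the comparison if your version of prepDocuments behaves differently).

```r
# Toy corpus in the stm documents format: one 2-row integer matrix per
# document (row 1 = vocabulary index, row 2 = count of that word).
toy_docs <- list(
  matrix(c(1L, 2L,  2L, 1L,  3L, 1L), nrow = 2),
  matrix(c(1L, 1L,  2L, 3L), nrow = 2),
  matrix(c(1L, 1L,  4L, 2L), nrow = 2),
  matrix(c(5L, 4L), nrow = 2)
)

# Hypothetical helper (not part of stm): for each threshold, count the
# words, documents and tokens that would be removed.
count_removed <- function(documents, lower.thresh) {
  lower.thresh <- sort(lower.thresh)
  word_ids <- unlist(lapply(documents, function(d) d[1, ]))
  doc_freq <- tabulate(word_ids)  # number of documents containing each word
  per_thresh <- sapply(lower.thresh, function(th) {
    # ASSUMPTION: words at or below the threshold are dropped
    dropped <- which(doc_freq <= th)
    c(
      # a document is removed when every one of its words is dropped
      ndocs   = sum(sapply(documents, function(d) all(d[1, ] %in% dropped))),
      nwords  = length(dropped),
      ntokens = sum(sapply(documents, function(d) sum(d[2, d[1, ] %in% dropped])))
    )
  })
  list(
    lower.thresh = lower.thresh,
    ndocs   = per_thresh["ndocs", ],
    nwords  = per_thresh["nwords", ],
    ntokens = per_thresh["ntokens", ]
  )
}

res <- count_removed(toy_docs, lower.thresh = c(3, 1, 2))
str(res)
```

The thresholds are sorted before counting, so the result is always ordered from the smallest threshold to the largest regardless of input order.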
This function produces three plots, showing the number of words, the number
of documents, and the total number of tokens removed as a function of
threshold values. In each plot, a dashed red line marks the corresponding
total number of words, documents or tokens in the corpus.
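A layout of this kind can be approximated with base graphics. The sketch below is only an illustration of the three-panel arrangement with a dashed red reference line; the counts and totals are made-up values, and the axis labels and titles are assumptions rather than the ones plotRemoved itself uses.

```r
# Illustrative (made-up) counts of items removed at three thresholds
thresh  <- c(1, 2, 3)
removed <- list(Words = c(3, 4, 5), Documents = c(1, 1, 4), Tokens = c(7, 11, 15))
totals  <- c(Words = 5, Documents = 4, Tokens = 15)

old_par <- par(mfrow = c(1, 3))  # three panels side by side
for (nm in names(removed)) {
  plot(thresh, removed[[nm]], type = "l",
       ylim = c(0, totals[[nm]]),
       xlab = "Threshold",
       ylab = paste("Number of", nm, "Removed"),
       main = paste(nm, "Removed by Threshold"))
  # dashed red line at the total, i.e. the point where everything is removed
  abline(h = totals[[nm]], lty = 2, col = "red")
}
par(old_par)  # restore the previous graphics settings
```

A curve that approaches the dashed line indicates a threshold that removes nearly the whole corpus.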
Value
Invisibly returns a list with the following elements:
lower.thresh |
The sorted threshold values |
ndocs |
The number of documents dropped for each value of the lower threshold |
nwords |
The number of entries of the vocab dropped for each value of the lower threshold. |
ntokens |
The number of tokens dropped for each value of the lower threshold. |
See Also
prepDocuments
Examples
plotRemoved(poliblog5k.docs, lower.thresh=seq(from = 10, to = 1000, by = 10))
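Because the result is returned invisibly, it has to be assigned to be inspected. The following sketch gathers the returned elements into a data frame for side-by-side comparison. It assumes the stm package is installed and that each element has one entry per threshold, as the Value section describes; `out` and `removed_tbl` are illustrative names.

```r
# Runs only when the stm package (plotRemoved, poliblog5k.docs) is available
if (requireNamespace("stm", quietly = TRUE)) {
  library(stm)

  # Assign the invisible return value while the plots are drawn
  out <- plotRemoved(poliblog5k.docs,
                     lower.thresh = seq(from = 10, to = 100, by = 10))

  # One row per threshold, using the element names listed under Value
  removed_tbl <- data.frame(
    threshold = out$lower.thresh,
    docs      = out$ndocs,
    words     = out$nwords,
    tokens    = out$ntokens
  )
  print(removed_tbl)
}
```

A table like this makes it easier to pick a value of lower.thresh that trims the vocabulary without discarding more documents than intended.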