sort_tf {chinese.misc} | R Documentation |
Find High Frequency Terms
Description
Given a matrix, a document term matrix, or a term document matrix, this function sums the frequency of each term and outputs the top n terms. The result can be printed to the screen as a message, so that you can manually copy it elsewhere (e.g., Excel).
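As a rough illustration of the counting described above, the base R sketch below sums term frequencies for both matrix layouts. The toy matrix and all variable names are hypothetical. It assumes the usual convention that terms are columns in a document term matrix and rows in a term document matrix. It is not the internal code of sort_tf.

```r
# Hypothetical toy matrix: 3 documents (rows) by 4 terms (columns)
m <- matrix(
  c(1, 0, 2, 1,
    0, 1, 1, 1,
    3, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("drink", "milk", "coffee", "water"))
)

# Document term layout: terms are columns, so sum down each column
tf_dtm <- colSums(m)

# Term document layout: terms are rows, so sum across each row
tf_tdm <- rowSums(t(m))

# Sort in decreasing order of frequency and keep the first two terms
top2 <- head(sort(tf_dtm, decreasing = TRUE), 2)
print(top2)
```

Both layouts give the same per-term totals; only the direction of the sum differs.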
Usage
sort_tf(x, top = 10, type = "dtm", todf = FALSE, must_exact = FALSE)
Arguments
x |
a matrix, or an object created by |
top |
an integer of length 1. Terms are sorted in decreasing
order of frequency, and this argument decides how many of the top terms are returned.
The default is 10. If the number of terms is smaller than |
type |
should start with "D" or "d" for a document term matrix,
or with "T" or "t" for a term document matrix.
It is only used when |
todf |
should be |
must_exact |
should be |
Details
Sometimes more terms are returned than specified by top. For example, suppose you ask for the top 5 terms and the frequency of the 5th term is 20, but two more terms also have a frequency of 20. In that case sort_tf may return 7 terms. If you want exactly 5 terms, set must_exact to TRUE.
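The tie behaviour described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the frequency vector is made up to match the example in the text, the variable names are hypothetical, and the actual logic inside sort_tf may differ.

```r
# Hypothetical term frequencies, already summed; three terms tie at 20
tf <- c(drink = 35, coffee = 28, cup = 24, hello = 21,
        milk = 20, water = 20, bottle = 20, some = 5)
top <- 5

sorted <- sort(tf, decreasing = TRUE)
cutoff <- sorted[[top]]  # frequency of the 5th term, here 20

# Keep every term whose frequency reaches the cutoff (ties included)
with_ties <- sorted[sorted >= cutoff]

# Keep exactly the first `top` terms, cutting through the tie
exact <- head(sorted, top)

length(with_ties)  # 7
length(exact)      # 5
```

Keeping ties avoids dropping terms arbitrarily when several share the cutoff frequency, at the cost of a result whose length can exceed top.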
Value
Returns nothing and prints the result as a message, or returns a data frame.
Examples
library(tm)
library(chinese.misc)
x <- c(
"Hello, what do you want to drink?",
"drink a bottle of milk",
"drink a cup of coffee",
"drink some water",
"hello, drink a cup of coffee")
dtm <- corp_or_dtm(x, from = "v", type = "dtm")
# Argument top is 5, but more than 5 terms are returned
sort_tf(dtm, top = 5)
# Set must_exact to TRUE, return exactly 5 terms
sort_tf(dtm, top = 5, must_exact = TRUE)
# Input is a plain matrix without term names
m <- as.matrix(dtm)
colnames(m) <- NULL
mt <- t(m)
sort_tf(mt, top = 5, type = "tdm")