tfidf {textir} | R Documentation |
tf-idf
Description
Computes term frequency-inverse document frequency (tf-idf) weights for a matrix of token counts.
Usage
tfidf(x,normalize=TRUE)
Arguments
x |
A |
normalize |
Logical; if TRUE, term frequencies are divided by document totals before weighting. |
Value
A matrix of the same type as x, with values replaced by the tf-idf weights
f_{ij} * \log[n/(d_j+1)],
where f_{ij} is x_{ij}/m_i if normalize is TRUE and x_{ij} otherwise, m_i is the total count for document i, n is the number of documents, and d_j is the number of documents containing token j.
Author(s)
Matt Taddy taddy@chicagobooth.edu
See Also
pls, we8there
Examples
library(textir)
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
  order(-sdev(tfidf(we8thereCounts)))[1:20]]
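The formula in the Value section can also be checked by hand on a small dense matrix. The following is a minimal base-R sketch, not the package's implementation: the toy counts and the helper name manual_tfidf are hypothetical, and it assumes rows are documents, columns are tokens, and no document is empty.

```r
## Hypothetical toy counts: 3 documents (rows) by 4 tokens (columns).
x <- matrix(c(2, 0, 1, 0,
              0, 3, 1, 0,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3), c("a", "b", "c", "d")))

## manual_tfidf is an illustrative helper, not part of textir.
## It assumes every document has at least one token (m_i > 0).
manual_tfidf <- function(x, normalize = TRUE) {
  n <- nrow(x)                      # n: number of documents
  m <- rowSums(x)                   # m_i: total count in document i
  d <- colSums(x > 0)               # d_j: documents containing token j
  f <- if (normalize) x / m else x  # f_ij: x_ij / m_i, or raw x_ij
  idf <- log(n / (d + 1))           # inverse document frequency per token
  sweep(f, 2, idf, `*`)             # scale column j by its idf weight
}

w <- manual_tfidf(x)
round(w, 3)
```

With the d_j + 1 in the denominator, a token that appears in all but one document gets weight zero, and a token that appears in every document gets a negative weight; only rarer tokens are weighted positively.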
[Package textir version 2.0-5 Index]