tfidf {DramaAnalysis}	R Documentation
TF-IDF
Description
This function calculates a variant of TF-IDF.
The input is assumed to contain relative frequencies.
IDF is calculated as follows:

idf_t = \log\frac{N+1}{n_t}

where N is the total number of documents (i.e., rows) and n_t is the number of documents containing term t. We add one to the numerator so that terms appearing in every document do not receive an IDF (and thus a TF-IDF value) of 0.
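As a worked example of the formula, the IDF values for the small 3 x 4 matrix used in the Examples section can be computed directly in base R. This is a minimal sketch assuming that \log denotes the natural logarithm (R's default log()); it applies the formula as stated here and does not call the package.

```r
# Example matrix: 3 documents (rows) x 4 terms (columns),
# each row holding relative frequencies (rows sum to 1)
mat <- matrix(c(0.1, 0.2, 0,
                0,   0.2, 0,
                0.1, 0.2, 0.1,
                0.8, 0.4, 0.9),
              nrow = 3, ncol = 4)

N   <- nrow(mat)           # total number of documents
n_t <- colSums(mat > 0)    # number of documents containing each term
idf <- log((N + 1) / n_t)  # idf_t = log((N + 1) / n_t), natural log assumed
print(round(idf, 4))       # [1] 0.6931 1.3863 0.2877 0.2877
```

Terms 3 and 4 occur in all three documents, yet their IDF is log(4/3) > 0 rather than 0, which is exactly what the N + 1 numerator achieves.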
Usage
tfidf(ftable)
Arguments
ftable	A matrix with "documents" as rows and "terms" as columns. Values are assumed to be normalized by document, i.e., to contain relative frequencies.
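If you start from a matrix of raw counts rather than from frequencytable(..., normalize=TRUE), each row can be normalized to relative frequencies with base R's prop.table() before calling tfidf(). The count matrix below is hypothetical and only illustrates the normalization step.

```r
# Hypothetical raw term counts: 2 documents (rows) x 3 terms (columns)
counts <- matrix(c(2, 0,
                   1, 3,
                   1, 1),
                 nrow = 2, ncol = 3,
                 dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# Divide each row by its total so that every row sums to 1
rel <- prop.table(counts, margin = 1)
print(rel)
print(rowSums(rel))
```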
Value
A matrix containing TF*IDF values instead of relative frequencies.
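The computation described on this page can be sketched in base R by multiplying every column t of the input by idf_t. tfidf_sketch() is a hypothetical name; this is a minimal sketch assuming the natural logarithm and assuming that terms occurring in no document (n_t = 0) should get a weight of 0. It is not the package's implementation, whose handling of such edge cases may differ.

```r
# tfidf_sketch() is a hypothetical re-implementation for illustration only;
# it is not the DramaAnalysis source code.
tfidf_sketch <- function(ftable) {
  ftable <- as.matrix(ftable)
  N   <- nrow(ftable)          # number of documents (rows)
  n_t <- colSums(ftable > 0)   # number of documents containing each term
  idf <- log((N + 1) / n_t)    # idf_t = log((N + 1) / n_t)
  # Assumption: a term found in no document gets idf 0 (avoids 0 * Inf = NaN)
  idf[n_t == 0] <- 0
  # Multiply column t (term t) of every row by idf_t
  sweep(ftable, 2, idf, `*`)
}

mat <- matrix(c(0.1, 0.2, 0,
                0,   0.2, 0,
                0.1, 0.2, 0.1,
                0.8, 0.4, 0.9),
              nrow = 3, ncol = 4)
print(round(tfidf_sketch(mat), 4))
```

Using sweep() keeps the document-by-term shape of the input, so the result can be read the same way as the relative-frequency matrix it replaces.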
Examples
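library(DramaAnalysis)

# Frequency table of rksp.0: one row per character, normalized to relative frequencies
data(rksp.0)
ftable <- frequencytable(rksp.0, byCharacter=TRUE, normalize=TRUE)
rksp.0.tfidf <- tfidf(ftable)

# A small hand-made matrix: 3 documents (rows) x 4 terms (columns);
# each row contains relative frequencies and sums to 1
mat <- matrix(c(0.1, 0.2, 0,
                0,   0.2, 0,
                0.1, 0.2, 0.1,
                0.8, 0.4, 0.9),
              nrow=3, ncol=4)
mat2 <- tfidf(mat)
print(mat2)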
[Package DramaAnalysis version 3.0.2 Index]