weightTfIdf {tm} | R Documentation |
Weight by Term Frequency - Inverse Document Frequency
Description
Weight a term-document matrix by term frequency - inverse document frequency.
Usage
weightTfIdf(m, normalize = TRUE)
Arguments
m | A TermDocumentMatrix in term frequency format. |
normalize | A Boolean value indicating whether the term frequencies should be normalized. |
Details
Formally this function is of class WeightingFunction with the additional attributes name and acronym.
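Term frequency $\mathit{tf}_{i,j}$ counts the number of occurrences $n_{i,j}$ of a term $t_i$ in a document $d_j$. In the case of normalization, the term frequency $\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$.
Inverse document frequency for a term $t_i$ is defined as
$$\mathit{idf}_i = \log_2 \frac{|D|}{|\{d \mid t_i \in d\}|}$$
where $|D|$ denotes the total number of documents and where $|\{d \mid t_i \in d\}|$ is the number of documents where the term $t_i$ appears.
Term frequency - inverse document frequency is now defined as $\mathit{tf}_{i,j} \cdot \mathit{idf}_i$.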
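The weighting described above can be traced by hand on a small matrix. The following is a minimal base R sketch, not the package's internal code. It assumes a base-2 logarithm for the inverse document frequency and normalization by each document's total term count. The toy matrix n and its term and document names are made up for illustration.

```r
# Hypothetical toy term-document matrix: rows are terms, columns are documents.
n <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "price", "market"),
                            c("d1", "d2", "d3")))

# Normalized term frequency: divide each column by that document's total count.
tf <- sweep(n, 2, colSums(n), "/")

# Inverse document frequency: log2(|D| / number of documents containing the term).
idf <- log2(ncol(n) / rowSums(n > 0))

# tf-idf: the vector idf has one entry per row, so it is recycled down the
# columns and each term's row is scaled by that term's idf.
tfidf <- tf * idf

print(round(tfidf, 3))
```

A term that occurs in every document would get an idf of log2(1) = 0, so its weight vanishes in all documents regardless of how often it occurs.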
Value
The weighted matrix.
References
Gerard Salton and Christopher Buckley (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24/5, 513–523.
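Examples
A minimal usage sketch, assuming the tm package is installed and using its bundled crude corpus of 20 Reuters articles. The variable names tdm and tdm_tfidf are illustrative only.

```r
library(tm)

# Load the example corpus shipped with tm.
data("crude")

# Build a term-document matrix of raw term frequencies, dropping stopwords.
tdm <- TermDocumentMatrix(crude, control = list(stopwords = TRUE))

# Reweight the raw counts by normalized tf-idf.
tdm_tfidf <- weightTfIdf(tdm, normalize = TRUE)

# Show a small corner of the weighted matrix.
inspect(tdm_tfidf[1:5, 1:5])
```

The weighting can also be applied while the matrix is built, by passing a function that calls weightTfIdf as the weighting entry of the control list.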