tfidf {textir}    R Documentation

tf-idf

Description

Computes term frequency-inverse document frequency (tf-idf) weights for a matrix of token counts.

Usage

tfidf(x,normalize=TRUE)

Arguments

x

A dgCMatrix or matrix of token counts, with one row per document and one column per token.

normalize

Logical; whether to divide term counts by their document totals (row sums) before applying the idf weight.

Value

A matrix of the same type as x, with values replaced by the tf-idf

$$ f_{ij} \log\left[\frac{n}{d_j + 1}\right], $$

where $f_{ij}$ is $x_{ij}/m_i$ (normalize=TRUE) or $x_{ij}$ (normalize=FALSE), $m_i = \sum_j x_{ij}$ is the total count for document $i$, $n$ is the number of documents (rows of x), and $d_j$ is the number of documents containing token $j$.
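To make the formula concrete, here is a minimal base-R sketch for a dense count matrix. The function name tfidf_dense is hypothetical and not part of textir, and this is not necessarily how tfidf computes the weights internally. Note that a token appearing in every document has $d_j + 1 > n$ and therefore receives a negative weight.

```r
# Hypothetical helper (NOT part of textir): tf-idf for a dense count matrix
# with documents in rows and tokens in columns, following
# f_ij * log[n / (d_j + 1)].
tfidf_dense <- function(x, normalize = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)                                # n: number of documents
  d <- colSums(x > 0)                         # d_j: documents containing token j
  f <- if (normalize) x / rowSums(x) else x   # f_ij: x_ij / m_i or x_ij
  idf <- log(n / (d + 1))                     # log[n / (d_j + 1)]
  sweep(f, 2, idf, `*`)                       # scale column j by its idf
}

# Toy corpus: 4 documents, 3 tokens.
x <- matrix(c(2, 0, 1,
              1, 0, 0,
              0, 3, 1,
              1, 0, 0), nrow = 4, byrow = TRUE)
tfidf_dense(x)
tfidf_dense(x, normalize = FALSE)
```

In this sketch, a document whose row total is zero yields NaN entries when normalize=TRUE, because its counts are divided by $m_i = 0$.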

Author(s)

Matt Taddy taddy@chicagobooth.edu

See Also

pls, we8there

Examples

library(textir)
data(we8there)
## 20 terms with the highest tf-idf variance across documents
colnames(we8thereCounts)[
	order(-sdev(tfidf(we8thereCounts)))[1:20]]
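For sparse input, the same weights can be computed by touching only the stored nonzero entries of a dgCMatrix. The sketch below assumes the Matrix package is installed and that the matrix holds no explicitly stored zeros; tfidf_sparse is a hypothetical name, not a textir function, and textir may do this differently.

```r
library(Matrix)

# Hypothetical helper (NOT part of textir): tf-idf on a dgCMatrix that
# rewrites only the stored nonzeros. Assumes no explicitly stored zeros,
# so the stored-entry count per column equals d_j.
tfidf_sparse <- function(x, normalize = TRUE) {
  stopifnot(is(x, "dgCMatrix"))
  n <- nrow(x)
  d <- diff(x@p)                     # stored entries per column = d_j
  idf <- log(n / (d + 1))
  if (normalize) {
    m <- rowSums(x)                  # m_i: document totals
    x@x <- x@x / m[x@i + 1]          # x@i holds 0-based row indices
  }
  # Column index of each stored entry, expanded from the column pointers.
  j <- rep(seq_len(ncol(x)), times = d)
  x@x <- x@x * idf[j]
  x
}

# Same toy corpus as a sparse matrix: 4 documents, 3 tokens.
x <- sparseMatrix(i = c(1, 1, 2, 3, 3, 4),
                  j = c(1, 3, 1, 2, 3, 1),
                  x = c(2, 1, 1, 3, 1, 1),
                  dims = c(4, 3))
tfidf_sparse(x)
```

Because empty documents have no stored entries, this version never divides by a zero row total.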

[Package textir version 2.0-5 Index]