bind_tf_idf2 {audubon}    R Documentation

Bind term frequency and inverse document frequency

Description

Calculates the term frequency, inverse document frequency, and TF-IDF of the dataset and binds them to it. This function experimentally supports 3 types of term frequency and 4 types of inverse document frequency, as implemented in the 'RMeCab' package.

Usage

bind_tf_idf2(
  tbl,
  term = "token",
  document = "doc_id",
  n = "n",
  tf = c("tf", "tf2", "tf3"),
  idf = c("idf", "idf2", "idf3", "idf4"),
  norm = FALSE,
  rmecab_compat = TRUE
)

Arguments

tbl

A tidy text dataset.

term

Column containing terms, given as a string or symbol.

document

Column containing document IDs, given as a string or symbol.

n

Column containing document-term counts, given as a string or symbol.

tf

Method for computing term frequency.

idf

Method for computing inverse document frequency.

norm

Logical; if TRUE, the raw term counts are normalized by dividing them by their L2 norms before IDF values are computed.
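As a rough illustration of L2 normalization, the base R sketch below divides each raw count by the L2 norm of the counts in the same document. The toy data and the choice to take the norm per document are assumptions made for this example; it is not the package's implementation.

```r
# Toy document-term counts (hypothetical data, not from the package).
counts <- data.frame(
  doc_id = c("a", "a", "b", "b", "b"),
  token  = c("x", "y", "x", "y", "z"),
  n      = c(3, 4, 1, 2, 2)
)

# L2 norm of each document's count vector: sqrt(sum(n^2)) within a document.
# Assumption: the norm is taken per document.
l2 <- ave(counts$n, counts$doc_id, FUN = function(v) sqrt(sum(v^2)))

# Divide each raw count by the norm of its document.
counts$n_norm <- counts$n / l2
counts
```

After this step, the squared normalized counts within each document sum to 1.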

rmecab_compat

Logical; if TRUE, values are computed so as to stay compatible with 'RMeCab'. Note that 'RMeCab' always computes IDF values from term frequencies rather than raw term counts, so TF-IDF values may be affected by term frequency twice.

Details

The type of term frequency can be switched with the tf argument.

The type of inverse document frequency can be switched with the idf argument.
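For orientation, here is a minimal base R sketch of one common textbook definition of TF-IDF: relative term frequency multiplied by an unsmoothed base-2 inverse document frequency. The toy data and these particular formulas are assumptions for illustration only; they are not necessarily what any of the tf or idf options compute.

```r
# Toy document-term counts (hypothetical data); each (doc_id, token) pair
# appears exactly once.
counts <- data.frame(
  doc_id = c("a", "a", "b", "b", "c"),
  token  = c("x", "y", "x", "z", "x"),
  n      = c(2, 2, 1, 3, 5)
)

# Relative term frequency: count divided by the document's total count.
counts$tf <- counts$n / ave(counts$n, counts$doc_id, FUN = sum)

# Number of documents each term occurs in, and the total number of documents.
doc_freq <- ave(rep(1, nrow(counts)), counts$token, FUN = sum)
n_docs <- length(unique(counts$doc_id))

# Unsmoothed inverse document frequency with a base-2 logarithm.
counts$idf <- log2(n_docs / doc_freq)

# TF-IDF is the product of the two.
counts$tf_idf <- counts$tf * counts$idf
counts
```

A term that occurs in every document (here "x") gets an IDF of 0 under this definition, so its TF-IDF is 0 regardless of how often it appears.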

Value

A data.frame.

Examples

## Not run: 
df <- dplyr::add_count(hiroba, doc_id, token)
bind_tf_idf2(df)

## End(Not run)
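A further sketch that selects non-default methods, using only the arguments listed under Usage. It assumes the 'audubon' and 'dplyr' packages are installed and that the hiroba dataset is available from 'audubon'; the guard skips the call otherwise. The particular combination of tf and idf is chosen arbitrarily for illustration.

```r
res <- NULL
if (requireNamespace("audubon", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  # Count occurrences of each token within each document.
  df <- dplyr::add_count(audubon::hiroba, doc_id, token)

  # Pick one method explicitly instead of relying on the defaults.
  res <- audubon::bind_tf_idf2(
    df,
    tf = "tf2",
    idf = "idf2",
    rmecab_compat = FALSE
  )
}
```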

[Package audubon version 0.5.1 Index]