bind_tf_idf {tidytext} | R Documentation
Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset
Description
Calculate and bind the term frequency and inverse document frequency of a tidy text dataset, along with their product, tf-idf, to the dataset. Each of these values is added as a column. This function supports non-standard evaluation through the tidyeval framework.
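The three added columns can be reproduced by hand, which may help when checking results. The sketch below assumes the conventional definitions: tf is a term's count divided by the total count in its document, and idf is the natural log of the number of documents divided by the number of documents containing the term. The `toy` table and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy counts: one row per document-term pair
toy <- tibble(
  doc  = c("a", "a", "b", "b"),
  word = c("cat", "dog", "cat", "fish"),
  n    = c(2, 2, 1, 3)
)

n_docs <- n_distinct(toy$doc)

manual <- toy %>%
  group_by(doc) %>%
  mutate(tf = n / sum(n)) %>%           # share of the document's total count
  group_by(word) %>%
  mutate(idf = log(n_docs / n())) %>%   # n() = rows per term = documents containing it
  ungroup() %>%
  mutate(tf_idf = tf * idf)

manual
```

Under these assumptions, a term that appears in every document (here "cat") gets an idf of zero. The result can be compared against `bind_tf_idf(toy, word, doc, n)`.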
Usage
bind_tf_idf(tbl, term, document, n)
Arguments
tbl | A tidy text dataset with one-row-per-term-per-document
term | Column containing terms as string or symbol
document | Column containing document IDs as string or symbol
n | Column containing document-term counts as string or symbol
Details
The arguments term, document, and n are passed by expression and support quasiquotation; you can unquote strings and symbols.
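A short sketch of the calling styles this allows: bare column names, strings, and unquoted variables that hold a string or a symbol. The `toy` table and the variables `term_col` and `doc_col` are hypothetical names used only for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy counts: one row per document-term pair
toy <- tibble(
  doc  = c("a", "a", "b", "b"),
  word = c("cat", "dog", "cat", "fish"),
  n    = c(2, 2, 1, 3)
)

# 1. Bare column names
by_symbol <- bind_tf_idf(toy, word, doc, n)

# 2. Column names given as strings
by_string <- bind_tf_idf(toy, "word", "doc", "n")

# 3. Column names stored in variables and unquoted with !!
term_col <- "word"             # a string
doc_col  <- rlang::sym("doc")  # a symbol
by_unquote <- bind_tf_idf(toy, !!term_col, !!doc_col, n)
```

All three calls are expected to return the same table, so the unquoting form is mainly useful when column names are only known at run time, for example inside a wrapper function.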
If the dataset is grouped, the groups are ignored in the calculation but retained in the result.
The dataset must have exactly one row per document-term combination for this to work.
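One way to check this precondition before calling the function is to look for repeated document-term pairs and, if any exist, sum their counts. This is a minimal sketch on a hypothetical `dup` table; summing duplicates is an assumption about what the repeated rows mean and may not suit every dataset.

```r
library(dplyr)

# Hypothetical table in which the ("a", "cat") pair appears on two rows
dup <- tibble(
  doc  = c("a", "a", "a", "b"),
  word = c("cat", "cat", "dog", "cat"),
  n    = c(1, 2, 2, 1)
)

# Document-term pairs that occur on more than one row
violations <- dup %>%
  count(doc, word, name = "rows") %>%
  filter(rows > 1)

# Collapse duplicates so each pair has exactly one row
clean <- dup %>%
  group_by(doc, word) %>%
  summarise(n = sum(n), .groups = "drop")

violations
clean
```

If `violations` has zero rows the table already satisfies the requirement; otherwise `clean` is a candidate input for `bind_tf_idf()`.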
Examples
library(dplyr)
library(janeaustenr)
library(tidytext)  # provides unnest_tokens() and bind_tf_idf()

# count how often each word appears in each book
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE)

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))
[Package tidytext version 0.4.2 Index]