R: TF-IDF

term_tfidf {creditmodel}

R Documentation

TF-IDF

Description

term_filter removes stop words and low-frequency terms. term_idf computes the IDF (inverse document frequency) of each term. term_tfidf computes the TF-IDF of the terms in each document.

Usage

term_tfidf(term_df, idf = NULL)

term_idf(term_df, n_total = NULL)

term_filter(term_df, low_freq = 0.01, stop_words = NULL)

Arguments

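`term_df`	A data.frame with a document id column and a term column, one row per term occurrence.
`idf`	A data.frame of IDF values, such as the one returned by term_idf.
`n_total`	Total number of documents.
`low_freq`	Threshold for low-frequency terms, given either as a rate (e.g. 0.01) or as a count (e.g. 1).
`stop_words`	Stop words to remove.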

Value

A data.frame

Examples

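# Toy input: one row per (document id, term) occurrence
term_df = data.frame(
  id = c(1,1,1,2,2,3,3,3,4,4,4,4,4,5,5,6,7,7,
         8,8,8,9,9,9,10,10,11,11,11,11,11,11),
  terms = c('a','b','c','a','c','d','d','a','b','c','a','c','d','a','c',
            'd','a','e','f','b','c','f','b','c','h','h','i','c','d','g','k','k'))
# Filter out low-frequency terms (no stop words supplied)
term_df = term_filter(term_df = term_df, low_freq = 1)
# Compute the IDF of the remaining terms
idf = term_idf(term_df)
# Compute TF-IDF, reusing the IDF table from the previous step
tf_idf = term_tfidf(term_df, idf = idf)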
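
For intuition about what the IDF and TF-IDF steps compute, the following base R sketch works through one common TF-IDF definition on a small toy input. It is an illustration under stated assumptions, not the creditmodel implementation: the names toy_df, counts, doc_len and doc_freq are hypothetical, and the formulas tf = count / document length and idf = log(n_docs / doc_freq) are one conventional choice. term_idf and term_tfidf may weight or smooth differently.

```r
# Hypothetical toy input: one row per (document id, term) occurrence.
toy_df <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  terms = c("a", "b", "a", "a", "c", "b"),
  stringsAsFactors = FALSE
)
toy_df$n <- 1

# Count how often each term occurs in each document.
counts <- aggregate(n ~ id + terms, data = toy_df, FUN = sum)

# Term frequency: term count divided by the document's total term count.
doc_len <- tapply(toy_df$n, toy_df$id, sum)
counts$tf <- counts$n / as.numeric(doc_len[as.character(counts$id)])

# Inverse document frequency: log(number of documents / documents containing the term).
n_docs   <- length(unique(toy_df$id))
doc_freq <- tapply(counts$id, counts$terms, length)
idf      <- log(n_docs / doc_freq)

# TF-IDF: product of the two quantities, one value per (document, term) pair.
counts$tfidf <- counts$tf * as.numeric(idf[as.character(counts$terms)])
print(counts[order(counts$id, counts$terms), ])
```

Under this definition, a term that appears in every document gets an IDF of 0, so its TF-IDF is 0 regardless of how often it occurs.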

[Package creditmodel version 1.3.1 Index]