R: Clean subject line text prior to analysis

tm_clean {wpa}

R Documentation

Clean subject line text prior to analysis

Description

This function processes the Subject column in a Meeting Query by tokenising it with tidytext::unnest_tokens() and removing any stopwords supplied through the stopwords argument. It is a sub-function that feeds into tm_freq(), tm_cooc(), and tm_wordcloud(). By default, it returns a data frame of tokenised words or ngrams.
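The tokenise-then-filter idea described above can be sketched in base R. This is an illustration only, not the wpa implementation: the `meetings` data frame, its contents, and the `custom_stopwords` values are hypothetical, and the real function relies on tidytext::unnest_tokens(), which may clean text differently.

```r
# Hypothetical stand-in for a Meeting Query: only a Subject column is assumed.
meetings <- data.frame(
  Subject = c("Weekly team sync", "Review of the budget"),
  stringsAsFactors = FALSE
)

# Hypothetical custom stopwords, in the single-column 'word' form.
custom_stopwords <- data.frame(word = c("of", "the"), stringsAsFactors = FALSE)

# Lower-case each subject and split on runs of non-alphanumeric characters.
tokens <- strsplit(tolower(meetings$Subject), "[^[:alnum:]]+")

# One row per token, keeping the index of the originating subject line.
tidy_tokens <- data.frame(
  line = rep(seq_along(tokens), lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Drop empty strings and any token listed as a stopword.
keep <- nzchar(tidy_tokens$word) & !(tidy_tokens$word %in% custom_stopwords$word)
tidy_tokens <- tidy_tokens[keep, ]
rownames(tidy_tokens) <- NULL

print(tidy_tokens)
```

The result has the same two-column shape (line, word) that the Value section describes, with one row per remaining token.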

Usage

tm_clean(data, token = "words", stopwords = NULL, ...)

Arguments

`data`	A Meeting Query dataset in the form of a data frame.
`token`	A character string, either `"words"` or `"ngrams"`, determining the type of tokenisation to return.
`stopwords`	A character vector OR a single-column data frame labelled `'word'` containing custom stopwords to remove.
`...`	Additional parameters to pass to `tidytext::unnest_tokens()`.

Value

A data frame with two columns:

line
word

Examples

# words
tm_clean(mt_data)

# ngrams
tm_clean(mt_data, token = "ngrams")
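
The stopwords argument is documented as accepting either a character vector or a single-column data frame labelled 'word'. The sketch below shows both forms. The stopword values are hypothetical, and the call is guarded so that it only runs when the wpa package (assumed here to provide both tm_clean() and mt_data) is installed.

```r
# Hypothetical stopword choices; replace with terms relevant to your data.
stopwords_vec <- c("meeting", "call")

# The same stopwords as a single-column data frame labelled 'word'.
stopwords_df <- data.frame(word = stopwords_vec, stringsAsFactors = FALSE)

# Run only when the wpa package is available.
if (requireNamespace("wpa", quietly = TRUE)) {
  cleaned_vec <- wpa::tm_clean(wpa::mt_data, stopwords = stopwords_vec)
  cleaned_df  <- wpa::tm_clean(wpa::mt_data, stopwords = stopwords_df)
  print(head(cleaned_vec))
}
```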