| tm_clean {wpa} | R Documentation |
Clean subject line text prior to analysis
Description
This function processes the Subject column in a Meeting Query by applying
tokenisation using tidytext::unnest_tokens(), and removing any stopwords
supplied in a data frame (using the argument stopwords). This is a
sub-function that feeds into tm_freq(), tm_cooc(), and tm_wordcloud().
The default is to return a data frame with tokenised counts of words or
ngrams.
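The two steps described above (tokenise, then drop stopwords) can be sketched directly with tidytext and dplyr. This is an illustration under stated assumptions, not the package's actual implementation: the `meetings` data frame and the `custom_stopwords` data frame (including its column name `word`) are hypothetical stand-ins, and tm_clean() may perform extra steps not shown here.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for a Meeting Query: only the Subject column matters here.
meetings <- data.frame(
  Subject = c("Weekly sync on the budget", "Budget review", "Project kickoff"),
  stringsAsFactors = FALSE
)

# Hypothetical custom stopwords held in a one-column data frame.
# The column name `word` is an assumption chosen to match the token column below.
custom_stopwords <- data.frame(
  word = c("on", "the", "weekly"),
  stringsAsFactors = FALSE
)

tokens <- meetings %>%
  mutate(line = row_number()) %>%          # one id per subject line
  select(line, Subject) %>%
  unnest_tokens(output = word, input = Subject, token = "words") %>%  # one row per word, lower-cased
  anti_join(custom_stopwords, by = "word") # drop rows whose word is a stopword

tokens
```

Each remaining row pairs a subject line id with one token, which is the long format that frequency and co-occurrence summaries can be built on.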
Usage
tm_clean(data, token = "words", stopwords = NULL, ...)
Arguments
data |
A Meeting Query dataset in the form of a data frame. |
token |
A character vector accepting either "words" or "ngrams". |
stopwords |
A character vector OR a single-column data frame labelled
|
... |
Additional parameters to pass to tidytext::unnest_tokens(). |
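As a usage sketch for the stopwords argument, the call below passes custom stopwords as a character vector, one of the two input forms listed above. The specific stopwords are hypothetical and chosen only for illustration; mt_data is the sample dataset used in the Examples section, and the check on the `word` column assumes the output described under Value.

```r
library(wpa)

# Hypothetical custom stopwords, supplied as a character vector.
my_stopwords <- c("meeting", "sync")

# Tokenise subject lines into words and remove the custom stopwords.
cleaned <- tm_clean(mt_data, token = "words", stopwords = my_stopwords)

# None of the custom stopwords should remain among the tokens.
any(cleaned$word %in% my_stopwords)
```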
Value
data frame with two columns:
- line
- word
See Also
Other Text-mining:
meeting_tm_report(),
pairwise_count(),
subject_validate(),
subject_validate_report(),
tm_cooc(),
tm_freq(),
tm_wordcloud()
Examples
# words
tm_clean(mt_data)
# ngrams
tm_clean(mt_data, token = "ngrams")