tm_clean {wpa} | R Documentation |
Clean subject line text prior to analysis
Description
This function processes the Subject column in a Meeting Query by applying
tokenisation using tidytext::unnest_tokens(), and removing any stopwords
supplied in a data frame (using the argument stopwords). This is a
sub-function that feeds into tm_freq(), tm_cooc(), and tm_wordcloud().
The default is to return a data frame with tokenised counts of words or
ngrams.
Usage
tm_clean(data, token = "words", stopwords = NULL, ...)
Arguments
data |
A Meeting Query dataset in the form of a data frame. |
token |
A character vector accepting either "words" or "ngrams". |
stopwords |
A character vector OR a single-column data frame labelled
|
... |
Additional parameters to pass to |
Value
data frame with two columns:
- line
- word
See Also
Other Text-mining: meeting_tm_report(), pairwise_count(), subject_validate(), subject_validate_report(), tm_cooc(), tm_freq(), tm_wordcloud()
Examples
# words
tm_clean(mt_data)
# ngrams
tm_clean(mt_data, token = "ngrams")
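
The tokenise-then-filter step described above can be sketched directly with tidytext and dplyr. This is a minimal illustration, not the package's own implementation: the toy `subjects` data frame, the `stop_df` stopword table, and its `word` column label are assumptions made for this example only.

```r
# Illustrative sketch only: toy data and names are assumptions,
# not taken from the wpa package.
library(dplyr)
library(tidytext)

# A stand-in for the Subject column of a Meeting Query
subjects <- data.frame(
  line = 1:2,
  Subject = c("Weekly sync on the budget", "Budget review for the team"),
  stringsAsFactors = FALSE
)

# Hypothetical stopword table with a single column named `word`
stop_df <- data.frame(
  word = c("on", "the", "for"),
  stringsAsFactors = FALSE
)

tokens <- subjects %>%
  # Split each subject line into lower-cased single-word tokens
  unnest_tokens(output = word, input = Subject, token = "words") %>%
  # Drop every token that appears in the stopword table
  anti_join(stop_df, by = "word")

# One row per remaining token, with columns `line` and `word`
print(tokens)
```

The result has the same two-column shape as the value documented above, which is why it can be passed on to counting or co-occurrence steps without further reshaping.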