profanity_by {sentimentr}    R Documentation
Profanity Rate By Groups
Description
Approximate the profanity of text by grouping variable(s). See profanity
for a full description of the profanity detection algorithm, the
profanity/valence shifter keys that can be passed into the function, and
the other arguments that can be passed.
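The grouped statistics are built from sentence-level profanity rates. As a rough illustration only, the base R sketch below splits text naively into sentences, counts words found in a small word list, and divides by each sentence's word count. The word list, the regex sentence splitter, and the function name toy_sentence_rates are hypothetical. sentimentr uses its own sentence boundary disambiguation and lexicon-based profanity lists. The rate definition used here (profane words / total words) is also an assumption.

```r
# Hypothetical profanity key for illustration only (not a lexicon list).
toy_profanity_list <- c("darn", "heck")

# Toy sentence-level profanity rate: profane words / total words (assumed).
toy_sentence_rates <- function(text) {
  # Naive split after ., ! or ? followed by whitespace; sentimentr's
  # get_sentences() performs real sentence boundary disambiguation instead.
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  lower <- tolower(sentences)
  # Extract word tokens made of letters and apostrophes.
  words <- regmatches(lower, gregexpr("[a-z']+", lower))
  word_count <- lengths(words)
  # Number of tokens per sentence that appear in the toy key.
  profanity_count <- vapply(words, function(w) sum(w %in% toy_profanity_list),
                            integer(1))
  data.frame(
    sentence        = sentences,
    word_count      = word_count,
    profanity_count = profanity_count,
    profanity       = ifelse(word_count > 0, profanity_count / word_count, 0)
  )
}

rates <- toy_sentence_rates("What the heck is this? It is darn good. I like it.")
rates
```

Each row is one sentence. The grouped functions then summarise rows like these.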
Usage
profanity_by(text.var, by = NULL, group.names, ...)
Arguments
text.var - The text variable. Also takes a …
by - The grouping variable(s). Default NULL. Also takes a list of grouping variables, e.g. by = list(source, belichick) in the Examples.
group.names - A vector of names that corresponds to group. Generally for internal use.
... - Other arguments passed to profanity.
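When by is a list of grouping variables, each observed combination of their values forms one group. The base R sketch below shows that idea with made-up sentence-level rows and interaction(). profanity_by itself returns a data.table, so it is an assumption that it forms groups in exactly this way.

```r
# Made-up sentence-level rows with two grouping variables.
rows <- data.frame(
  source    = c("OP", "OP", "retweet", "retweet", "OP"),
  belichick = c(TRUE, FALSE, TRUE, TRUE, TRUE),
  profanity = c(0.1, 0, 0.2, 0, 0.3)
)

# A list of grouping variables, analogous to by = list(source, belichick).
by_vars <- list(rows$source, rows$belichick)

# One group per observed combination; drop = TRUE discards combinations
# that never occur (here retweet|FALSE).
groups <- interaction(by_vars, drop = TRUE, sep = "|")

table(groups)
group_means <- tapply(rows$profanity, groups, mean)
group_means
```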
Value
Returns a data.table with grouping variables plus:
element_id - The id number of the original vector passed to profanity
sentence_id - The id number of the sentences within each element_id
word_count - Word count summed by grouping variable
profanity_count - The number of profanities used by grouping variable
sd - Standard deviation (sd) of the sentence level profanity rate by grouping variable
ave_profanity - Profanity rate
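The columns above can be approximated from sentence-level rows. The sketch below groups made-up rows by element_id. It assumes this is what the default by = NULL does. For each group it computes the summed counts, the sd of the sentence rates, and a profanity rate. It also assumes ave_profanity is total profanities divided by total words. Averaging the sentence rates would give a different value: 0.225 rather than 2/9 (about 0.222) for element 1 here. Check the package source before relying on either definition. The helper summarise_group is hypothetical.

```r
# Made-up sentence-level rows: element 1 has two sentences, element 2 has one.
sent <- data.frame(
  element_id      = c(1L, 1L, 2L),
  sentence_id     = c(1L, 2L, 1L),
  word_count      = c(5L, 4L, 3L),
  profanity_count = c(1L, 1L, 0L)
)
# Sentence-level profanity rate (assumed: profane words / total words).
sent$profanity <- sent$profanity_count / sent$word_count

# Hypothetical helper: collapse one group's sentences into a summary row.
summarise_group <- function(d) {
  data.frame(
    element_id      = d$element_id[1],
    word_count      = sum(d$word_count),
    profanity_count = sum(d$profanity_count),
    # sd of sentence-level rates; NA for a group with a single sentence.
    sd              = stats::sd(d$profanity),
    # Assumed definition: pooled rate over the whole group.
    ave_profanity   = sum(d$profanity_count) / sum(d$word_count)
  )
}

by_element <- do.call(rbind, lapply(split(sent, sent$element_id), summarise_group))
rownames(by_element) <- NULL
by_element
```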
Chaining
See sentiment_by for details about sentimentr chaining.
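Chaining avoids paying for sentence boundary disambiguation more than once, as the Examples note. The idea can be sketched in base R as follows. Already-split text is tagged with a class and returned unchanged on a second call. The function split_once, the class split_sentences, and the regex splitter are hypothetical. This is not sentimentr's implementation.

```r
# Counts how often the hypothetical splitter actually does work.
split_calls <- 0

# Hypothetical helper: split text into sentences once and tag the result
# so that a later call can reuse it instead of splitting again.
split_once <- function(text) {
  if (inherits(text, "split_sentences")) return(text)  # already split: reuse
  split_calls <<- split_calls + 1
  out <- lapply(text, function(x) {
    if (is.na(x)) return(NA_character_)               # keep missing elements
    strsplit(x, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  })
  class(out) <- c("split_sentences", class(out))
  out
}

s1 <- split_once(c("I am the best friend.", NA, "Do you really like it? I'm not happy"))
s2 <- split_once(s1)   # chained call: no second split
```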
See Also
Other profanity functions:
profanity()
Examples
## Not run:
library(sentimentr)
bw <- sample(lexicon::profanity_alvarez, 4)
mytext <- c(
sprintf('do you like this %s? It is %s. But I hate really bad dogs', bw[1], bw[2]),
'I am the best friend.',
NA,
sprintf('I %s hate this %s', bw[3], bw[4]),
"Do you really like it? I'm not happy"
)
## Works on a character vector, but this is not the preferred method: sentence
## boundary disambiguation is repeated every time `profanity` is run
profanity(mytext)
profanity_by(mytext)
## Preferred method: split into sentences once with get_sentences() and reuse
mytext <- get_sentences(mytext)
profanity_by(mytext)
get_sentences(profanity_by(mytext))
(myprofanity <- profanity_by(mytext))
stats::setNames(get_sentences(profanity_by(mytext)),
round(myprofanity[["ave_profanity"]], 3))
brady <- get_sentences(crowdflower_deflategate)
library(data.table)
bp <- profanity_by(brady)
crowdflower_deflategate[bp[ave_profanity > 0,]$element_id, ]
vulgars <- bp[["ave_profanity"]] > 0
stats::setNames(get_sentences(bp)[vulgars],
round(bp[["ave_profanity"]][vulgars], 3))
bt <- data.table(crowdflower_deflategate)[,
source := ifelse(grepl('^RT', text), 'retweet', 'OP')][,
belichick := grepl('\\bb[A-Za-z]+l[A-Za-z]*ch', text, ignore.case = TRUE)][]
prof_bel <- with(bt, profanity_by(text, by = list(source, belichick)))
plot(prof_bel)
## End(Not run)
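The last example does two things. It flags tweets that mention Belichick using a loose regular expression, and it labels tweets starting with RT as retweets. The sketch below checks both rules on made-up strings. The pattern is a heuristic. It tolerates misspellings such as Bellichick, but it also matches unrelated words such as belch.

```r
# Pattern copied from the example above; ignore.case lets "b" match "B".
belichick_pattern <- "\\bb[A-Za-z]+l[A-Za-z]*ch"

# Made-up tweets for illustration.
tweets <- c("Bill Belichick knew", "Bellichick again",
            "Tom Brady", "RT: what a belch")

mentions_belichick <- grepl(belichick_pattern, tweets, ignore.case = TRUE)
tweet_source <- ifelse(grepl("^RT", tweets), "retweet", "OP")

data.frame(tweets, tweet_source, mentions_belichick)
```

The last row shows the false positive: "belch" is flagged even though it is unrelated.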