profanity {sentimentr} | R Documentation
Compute Profanity Rate
Description
Detect the rate of profanity at the sentence level. This method uses a simple
dictionary lookup to find profane words and then compute the rate per sentence.
The profanity score ranges between 0 (no profanity used) and 1 (all words used
were profane). Note that a single profane phrase counts only once in the
profanity_count column, while each of its words counts separately in the
word_count column (a two-word phrase adds one to profanity_count and two to
word_count).
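To make the counting rule concrete, here is a minimal base-R sketch of a dictionary-lookup rate for a single sentence. It is an illustration only and not the sentimentr implementation: the helper toy_rate and the placeholder terms are hypothetical, the tokenizer is a simple regular expression, and the terms are assumed to contain no regex metacharacters.

```r
## Hypothetical helper: NOT sentimentr code, just an illustration of the idea
toy_rate <- function(sentence, profanity_list) {
    sentence <- tolower(sentence)
    ## word_count: number of runs of letters/apostrophes in the sentence
    words <- regmatches(sentence, gregexpr("[a-z']+", sentence))[[1]]
    ## profanity_count: each dictionary entry (word or phrase) counts once per match
    hits <- vapply(profanity_list, function(term) {
        m <- gregexpr(paste0("\\b", term, "\\b"), sentence)[[1]]
        sum(m > 0)  # gregexpr returns -1 when there is no match
    }, integer(1))
    word_count <- length(words)
    profanity_count <- sum(hits)
    data.frame(
        word_count = word_count,
        profanity_count = profanity_count,
        profanity = if (word_count > 0) profanity_count / word_count else NA_real_
    )
}

toy_list <- c("darn", "holy cow")  # placeholder terms, one word and one phrase
toy_rate("Holy cow that is darn loud", toy_list)
```

In this toy case the sentence has six words and two dictionary matches, so the rate is 2/6: the phrase "holy cow" adds one to profanity_count but two to word_count.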
Usage
profanity(
text.var,
profanity_list = unique(tolower(lexicon::profanity_alvarez)),
...
)
Arguments
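text.var - The text variable. Can be a character vector or, preferably, the output of get_sentences, which avoids repeating sentence boundary disambiguation on every call (see Examples).
profanity_list - An atomic character vector of profane words. The lexicon package has lists that can be used, including lexicon::profanity_alvarez (lowercased and de-duplicated for the default) and lexicon::profanity_racist.
... - Ignored.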
Value
Returns a data.table of:
element_id - The id number of the original vector passed to profanity
sentence_id - The id number of the sentences within each element_id
word_count - Word count
profanity_count - Count of the number of profane words
profanity - The proportion of words that are profane, from 0 to 1
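Because the result has one row per sentence, it can be rolled up to one row per original element. The sketch below shows one way to do that in base R on a made-up data frame with the same columns; the object toy_out and its numbers are hypothetical, and this is not necessarily how profanity_by() (see See Also) summarizes groups.

```r
## Hypothetical sentence-level output with the columns listed above;
## the numbers are made up for illustration
toy_out <- data.frame(
    element_id      = c(1, 1, 2),
    sentence_id     = c(1, 2, 1),
    word_count      = c(6, 4, 5),
    profanity_count = c(1, 1, 0)
)
toy_out$profanity <- toy_out$profanity_count / toy_out$word_count

## Roll sentences up to one row per element by summing the counts,
## then recompute the rate from the summed counts
by_element <- aggregate(
    cbind(word_count, profanity_count) ~ element_id,
    data = toy_out, FUN = sum
)
by_element$profanity <- by_element$profanity_count / by_element$word_count
by_element
```

Recomputing the rate from summed counts weights each sentence by its length; averaging the sentence-level profanity column instead would weight every sentence equally.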
See Also
Other profanity functions:
profanity_by()
Examples
## Not run:
bw <- sample(unique(tolower(lexicon::profanity_alvarez)), 4)
mytext <- c(
    sprintf('do you like this %s? It is %s. But I hate really bad dogs', bw[1], bw[2]),
    'I am the best friend.',
    NA,
    sprintf('I %s hate this %s', bw[3], bw[4]),
    "Do you really like it? I'm not happy"
)
## Works on a character vector, but this is not the preferred method because
## it repeats the cost of sentence boundary disambiguation every time
## `profanity` is run
profanity(mytext)
## Preferred method: split into sentences once with `get_sentences` and
## reuse the result
mytext2 <- get_sentences(mytext)
profanity(mytext2)
plot(profanity(mytext2))
brady <- get_sentences(crowdflower_deflategate)
brady_swears <- profanity(brady)
brady_swears
## Distribution of profanity proportion for all comments
hist(brady_swears$profanity)
sum(brady_swears$profanity > 0)
## Distribution of proportions for those profane comments
hist(brady_swears$profanity[brady_swears$profanity > 0])
combo <- combine_data()
combo_sentences <- get_sentences(crowdflower_deflategate)
racist <- profanity(combo_sentences, profanity_list = lexicon::profanity_racist)
combo_sentences[racist$profanity > 0, ]$text
extract_profanity_terms(
    combo_sentences[racist$profanity > 0, ]$text,
    profanity_list = lexicon::profanity_racist
)
## Remove jerry, que, and illegal from the list
library(textclean)
racist2 <- profanity(
    combo_sentences,
    profanity_list = textclean::drop_element_fixed(
        lexicon::profanity_racist,
        c('jerry', 'illegal', 'que')
    )
)
combo_sentences[racist2$profanity > 0, ]$text
## End(Not run)