txt_sentiment {udpipe}    R Documentation
Perform dictionary-based sentiment analysis on a tokenised data frame
Description
This function identifies words which have a positive or negative meaning, with some basic logic to handle occurrences of amplifiers/deamplifiers and negators in the neighbourhood of the word which has a positive/negative meaning.
If a negator occurs in the neighbourhood, positive becomes negative or vice versa.
If amplifiers/deamplifiers occur in the neighbourhood, the amplifier weight is added to the sentiment polarity score.
This function took inspiration from qdap::polarity but was completely re-engineered to allow similar scores to be calculated on
a udpipe-tokenised dataset. It works at the sentence level and the negator/amplifier logic cannot cross a boundary defined
by the PUNCT upos part-of-speech tag.
Note that if you prefer to build a supervised model to perform sentiment scoring, you might be interested in the ruimtehol R package (https://github.com/bnosac/ruimtehol) instead.
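To make this neighbourhood logic concrete, the sketch below scores a single matched term by hand. It is an illustration of the idea only, not the exact internal computation of txt_sentiment, and the variable names are purely for the example.
polarity         <- 1      # polarity of the matched term (e.g. "like") in polarity_terms
amplifier_weight <- 0.8    # the default amplifier weight
negated          <- TRUE   # a negator such as "not" occurs within n_before words
amplified        <- TRUE   # an amplifier such as "whatsoever" occurs in the neighbourhood
score <- polarity
if (amplified) score <- score + sign(score) * amplifier_weight   # strengthen the polarity
if (negated)   score <- -score                                   # the negator flips the sign
score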
Usage
txt_sentiment(
  x,
  term = "lemma",
  polarity_terms,
  polarity_negators = character(),
  polarity_amplifiers = character(),
  polarity_deamplifiers = character(),
  amplifier_weight = 0.8,
  n_before = 4,
  n_after = 2,
  constrain = FALSE
)
Arguments
x
a data.frame with the columns doc_id, paragraph_id, sentence_id, upos and the column as indicated in term.
term
a character string with the name of a column of x on which you want to apply the sentiment scoring.
polarity_terms
data.frame containing terms which have a positive or negative meaning. This data frame should contain the columns term and polarity, where term is of type character and polarity can either be 1 or -1.
polarity_negators
a character vector of words which will invert the meaning of the polarity_terms.
polarity_amplifiers
a character vector of words which amplify the polarity_terms.
polarity_deamplifiers
a character vector of words which deamplify the polarity_terms.
amplifier_weight
weight which is added to the polarity score if an amplifier occurs in the neighbourhood.
n_before
integer indicating how many words before the polarity_terms word to look for negators/amplifiers/deamplifiers.
n_after
integer indicating how many words after the polarity_terms word to look for negators/amplifiers/deamplifiers.
constrain
logical indicating whether to make sure the aggregated sentiment score is between -1 and 1.
Value
a list containing

data: the x data.frame with 2 columns added: polarity and sentiment_polarity. The polarity column is simply the polarity value from the polarity_terms dataset corresponding to the term on which you apply the sentiment scoring. The sentiment_polarity column is the value after the amplifier/deamplifier/negator logic has been applied.

overall: a data.frame with one row per doc_id containing the columns doc_id, sentences, terms, sentiment_polarity, terms_positive, terms_negative, terms_negation and terms_amplification, providing the aggregate sentiment_polarity score of the dataset x by doc_id as well as the terminology causing the sentiment, the number of sentences and the number of non-punctuation terms in the document.
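For illustration, assuming a result object called scores as created in the Examples below, the two list elements could be inspected as follows (column names as documented above):
str(scores, max.level = 1)    # a list with the elements $data and $overall
head(scores$data[, c("doc_id", "sentence_id", "polarity", "sentiment_polarity")])
scores$overall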
Examples
x <- c("I do not like whatsoever when an R package has soo many dependencies.",
       "Making other people install java is annoying,
        as it is a really painful experience in classrooms.")
## Not run:
## Do the annotation to get the data.frame needed as input to txt_sentiment
anno <- udpipe(x, "english-gum")
## End(Not run)
anno <- data.frame(doc_id = c(rep("doc1", 14), rep("doc2", 18)),
                   paragraph_id = 1,
                   sentence_id = 1,
                   lemma = c("I", "do", "not", "like", "whatsoever",
                             "when", "an", "R", "package",
                             "has", "soo", "many", "dependencies", ".",
                             "Making", "other", "people", "install",
                             "java", "is", "annoying", ",", "as",
                             "it", "is", "a", "really", "painful",
                             "experience", "in", "classrooms", "."),
                   upos = c("PRON", "AUX", "PART", "VERB", "PRON",
                            "SCONJ", "DET", "PROPN", "NOUN", "VERB",
                            "ADV", "ADJ", "NOUN", "PUNCT",
                            "VERB", "ADJ", "NOUN", "ADJ", "NOUN",
                            "AUX", "VERB", "PUNCT", "SCONJ", "PRON",
                            "AUX", "DET", "ADV", "ADJ", "NOUN",
                            "ADP", "NOUN", "PUNCT"),
                   stringsAsFactors = FALSE)
scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                                                    polarity = c(-1, 1, -1)),
                        polarity_negators = c("not", "neither"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"))
scores$overall
scores$data
scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                                                    polarity = c(-1, 1, -1)),
                        polarity_negators = c("not", "neither"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"),
                        constrain = TRUE, n_before = 4,
                        n_after = 2, amplifier_weight = .8)
scores$overall
scores$data
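## As a possible follow-up, the token-level scores in scores$data can be
## aggregated per sentence with base R. This is an illustrative sketch, not
## functionality of txt_sentiment itself; sentences without any matched
## polarity term are dropped by aggregate because their sentiment_polarity is missing.
sentence_scores <- aggregate(sentiment_polarity ~ doc_id + sentence_id,
                             data = scores$data, FUN = sum)
sentence_scores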