txt_sentiment {udpipe}    R Documentation
Perform dictionary-based sentiment analysis on a tokenised data frame
Description
This function identifies words which have a positive or negative meaning, with some basic logic to handle occurrences of amplifiers/deamplifiers and negators in the neighbourhood of the word which has a positive/negative meaning.
If a negator occurs in the neighbourhood, positive becomes negative or vice versa.
If amplifiers/deamplifiers occur in the neighbourhood, the amplifier weight is added to the sentiment polarity score.
This function took inspiration from qdap::polarity but was completely re-engineered to allow similar scores to be calculated on
a udpipe-tokenised dataset. It works at the sentence level and the negator/amplifier logic cannot cross a boundary defined
by the PUNCT upos part-of-speech tag.
Note that if you prefer to build a supervised model to perform sentiment scoring, you might be interested in the ruimtehol R package (https://github.com/bnosac/ruimtehol) instead.
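To make this neighbourhood logic concrete, the sketch below scores a single matched term by hand. It is an illustration of the idea only, not the exact internal computation of txt_sentiment, and the variable names are purely for the example.
polarity         <- 1      # polarity of the matched term (e.g. "like") in polarity_terms
amplifier_weight <- 0.8    # the default amplifier weight
negated          <- TRUE   # a negator such as "not" occurs within n_before words
amplified        <- TRUE   # an amplifier such as "whatsoever" occurs in the neighbourhood
score <- polarity
if (amplified) score <- score + sign(score) * amplifier_weight   # strengthen the polarity
if (negated)   score <- -score                                   # the negator flips the sign
score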
Usage
txt_sentiment(
  x,
  term = "lemma",
  polarity_terms,
  polarity_negators = character(),
  polarity_amplifiers = character(),
  polarity_deamplifiers = character(),
  amplifier_weight = 0.8,
  n_before = 4,
  n_after = 2,
  constrain = FALSE
)
Arguments
x
a data.frame with the columns doc_id, paragraph_id, sentence_id, upos and the column as indicated in term.
term
a character string with the name of a column of x on which you want to apply the sentiment scoring.
polarity_terms
data.frame containing terms which have a positive or negative meaning. This data frame should contain the columns term and polarity, where term is of type character and polarity can either be 1 or -1.
polarity_negators
a character vector of words which will invert the meaning of the polarity_terms.
polarity_amplifiers
a character vector of words which amplify the polarity_terms.
polarity_deamplifiers
a character vector of words which deamplify the polarity_terms.
amplifier_weight
weight which is added to the polarity score if an amplifier occurs in the neighbourhood.
n_before
integer indicating how many words before the polarity_terms word to look for negators/amplifiers/deamplifiers.
n_after
integer indicating how many words after the polarity_terms word to look for negators/amplifiers/deamplifiers.
constrain
logical indicating whether to make sure the aggregated sentiment score is between -1 and 1.
Value
a list containing

data: the x data.frame with 2 columns added: polarity and sentiment_polarity. The polarity column is simply the polarity value from the polarity_terms dataset corresponding to the term on which you apply the sentiment scoring. The sentiment_polarity column is the value after the amplifier/deamplifier/negator logic has been applied.

overall: a data.frame with one row per doc_id containing the columns doc_id, sentences, terms, sentiment_polarity, terms_positive, terms_negative, terms_negation and terms_amplification, providing the aggregate sentiment_polarity score of the dataset x by doc_id as well as the terminology causing the sentiment, the number of sentences and the number of non-punctuation terms in the document.
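For illustration, assuming a result object called scores as created in the Examples below, the two list elements could be inspected as follows (column names as documented above):
str(scores, max.level = 1)    # a list with the elements $data and $overall
head(scores$data[, c("doc_id", "sentence_id", "polarity", "sentiment_polarity")])
scores$overall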
Examples
x <- c("I do not like whatsoever when an R package has soo many dependencies.",
       "Making other people install java is annoying,
        as it is a really painful experience in classrooms.")
## Not run:
## Do the annotation to get the data.frame needed as input to txt_sentiment
anno <- udpipe(x, "english-gum")
## End(Not run)
anno <- data.frame(doc_id = c(rep("doc1", 14), rep("doc2", 18)),
                   paragraph_id = 1,
                   sentence_id = 1,
                   lemma = c("I", "do", "not", "like", "whatsoever",
                             "when", "an", "R", "package",
                             "has", "soo", "many", "dependencies", ".",
                             "Making", "other", "people", "install",
                             "java", "is", "annoying", ",", "as",
                             "it", "is", "a", "really", "painful",
                             "experience", "in", "classrooms", "."),
                   upos = c("PRON", "AUX", "PART", "VERB", "PRON",
                            "SCONJ", "DET", "PROPN", "NOUN", "VERB",
                            "ADV", "ADJ", "NOUN", "PUNCT",
                            "VERB", "ADJ", "NOUN", "ADJ", "NOUN",
                            "AUX", "VERB", "PUNCT", "SCONJ", "PRON",
                            "AUX", "DET", "ADV", "ADJ", "NOUN",
                            "ADP", "NOUN", "PUNCT"),
                   stringsAsFactors = FALSE)
scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                                                    polarity = c(-1, 1, -1)),
                        polarity_negators = c("not", "neither"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"))
scores$overall
scores$data
scores <- txt_sentiment(x = anno,
                        term = "lemma",
                        polarity_terms = data.frame(term = c("annoy", "like", "painful"),
                                                    polarity = c(-1, 1, -1)),
                        polarity_negators = c("not", "neither"),
                        polarity_amplifiers = c("pretty", "many", "really", "whatsoever"),
                        polarity_deamplifiers = c("slightly", "somewhat"),
                        constrain = TRUE, n_before = 4,
                        n_after = 2, amplifier_weight = .8)
scores$overall
scores$data
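## As a possible follow-up, the token-level scores in scores$data can be
## aggregated per sentence with base R. This is an illustrative sketch, not
## functionality of txt_sentiment itself; sentences without any matched
## polarity term are dropped by aggregate because their sentiment_polarity is missing.
sentence_scores <- aggregate(sentiment_polarity ~ doc_id + sentence_id,
                             data = scores$data, FUN = sum)
sentence_scores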