
tardis {tardis}

R Documentation

Text Analysis with Rules and Dictionaries for Inferring Sentiment (TARDIS)

Description

This function uses dictionaries (either the included defaults or user-supplied custom dictionaries) and simple rules to measure the sentiment of supplied text. "Sentiment" here means, roughly, the emotion expressed in the text, with emotions collapsed into positive (e.g. happy) or negative (e.g. sad, angry).

Usage

tardis(
  input_text = c("I am happy.", "I am VERY happy!!", ":)", "Not sad.", "Bad.",
    "Not bad.", "A happy sentence! And a sad one. In the same text."),
  text_column = NA,
  dict_sentiments = NA,
  dict_modifiers = NA,
  dict_negations = NA,
  sigmoid_factor = 15,
  negation_factor = 0.75,
  allcaps_factor = 1.25,
  punctuation_factor = 1.15,
  use_punctuation = TRUE,
  summary_function = c("mean", "median", "max", "min", "sum"),
  simple_count = FALSE,
  verbose = FALSE
)

Arguments

`input_text`	Text to analyze, either a character vector or a data.frame with a column of text.
`text_column`	If using data.frame input, the name of the column of text to analyze.
`dict_sentiments`	Optional sentiment dictionary, defaults to internal tardis dictionary. A data.frame with two columns: `word` and `value`.
`dict_modifiers`	Optional modifiers dictionary, or "none" to disable modifiers. Defaults to internal tardis dictionary. A data.frame with two columns: `word` and `value`.
`dict_negations`	Optional negation dictionary, or "none" to disable negations. Defaults to internal tardis dictionary. A data.frame with one column: `word`.
`sigmoid_factor`	Numeric, default 15. Factor for scaling sentence scores to -1/+1 using a sigmoid function. Set to NA to disable the sigmoid function and just return sums of scores, adjusted by any applicable negators, modifiers, or punctuation/caps effects.
`negation_factor`	Numeric, default 0.75. Multiplier for damping effects of sentiment-bearing terms after negations. Stacks multiplicatively. Should probably be less than 1.
`allcaps_factor`	Numeric, default 1.25. Multiplier for scaling the effects of sentiment-bearing terms written in ALL CAPS. Should probably be more than 1, to increase effects.
`punctuation_factor`	Numeric, default 1.15. Multiplier for scaling the effects of punctuation. A single question mark has no effect, but one or more exclamation marks do, and question marks have an effect when exclamation marks are also present. At most three punctuation marks are counted.
`use_punctuation`	Boolean, default TRUE. Should we consider sentence-level punctuation?
`summary_function`	For multi-sentence texts, how should we summarise sentence scores into a text score? Default "mean", also accepts "median", "max", "min", and "sum".
`simple_count`	Boolean, default FALSE. Convenience parameter that overrides many other parameters to enable simple counts of dictionary words: no modifiers, negations, capitalization, or punctuation effects are considered and no sigmoid function is applied.
`verbose`	Boolean, default FALSE. For debugging: should it print lots of messages to the console?
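This page does not state the exact sigmoid used with `sigmoid_factor`. The R sketch below assumes a VADER-style normalization, x / sqrt(x^2 + sigmoid_factor), which maps any summed score into the open interval (-1, 1) (VADER uses a default constant of 15 for the same purpose). The package's actual formula may differ, and `scale_score` is a hypothetical name.

```r
# Hypothetical helper, NOT part of the tardis API.
# Assumes a VADER-style normalization; tardis's exact formula may differ.
scale_score <- function(x, sigmoid_factor = 15) {
  # NA disables scaling, mirroring the documented behaviour:
  # the adjusted sum of scores is returned unchanged.
  if (is.na(sigmoid_factor)) return(x)
  # x / sqrt(x^2 + k) is odd, increasing, and bounded by -1 and +1.
  x / sqrt(x^2 + sigmoid_factor)
}

scale_score(c(-4, 0, 2, 40))
scale_score(2, sigmoid_factor = NA)   # returns 2
```

Under this assumption, a larger `sigmoid_factor` flattens the curve, so more summed sentiment is needed before a sentence score approaches -1 or +1.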

Details

Roughly, each word's sentiment is a function of its dictionary-given sentiment, whether it is written in all caps, and the three preceding words. A preceding negation (e.g. "not") will reverse and reduce the sentiment, turning a positive into a slightly less extreme negative or vice versa, and a preceding modifier can either increase or decrease the sentiment (e.g. "very" will increase it, "somewhat" will decrease it).
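The word-level rules above can be sketched in R as follows. This is a simplified illustration, not the package's implementation: the toy dictionaries and their values are hypothetical, each negation in the three-word window is assumed to multiply the score by `-negation_factor`, and each modifier is assumed to multiply it by `1 + value`.

```r
# Hypothetical toy dictionaries; NOT tardis's internal dictionaries.
toy_sentiments <- data.frame(word = c("happy", "sad", "bad", "good"),
                             value = c(2, -2, -2, 1.5))
toy_modifiers <- data.frame(word = c("very", "somewhat"),
                            value = c(0.3, -0.3))
toy_negations <- data.frame(word = c("not", "never"))

# Sum of adjusted word scores for one sentence (hypothetical helper).
score_words <- function(sentence,
                        sentiments = toy_sentiments,
                        modifiers = toy_modifiers,
                        negations = toy_negations,
                        negation_factor = 0.75,
                        allcaps_factor = 1.25) {
  tokens <- strsplit(sentence, "[^A-Za-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  lower <- tolower(tokens)
  total <- 0
  for (i in seq_along(tokens)) {
    idx <- match(lower[i], sentiments$word)
    if (is.na(idx)) next                 # not a sentiment-bearing word
    s <- sentiments$value[idx]
    # ALL-CAPS words longer than one letter are amplified.
    if (nchar(tokens[i]) > 1 && tokens[i] == toupper(tokens[i])) {
      s <- s * allcaps_factor
    }
    # Up to three preceding words can negate or modify the score.
    window <- if (i > 1) lower[max(1, i - 3):(i - 1)] else character(0)
    for (w in window) {
      if (w %in% negations$word) s <- s * -negation_factor
      m <- match(w, modifiers$word)
      if (!is.na(m)) s <- s * (1 + modifiers$value[m])
    }
    total <- total + s
  }
  total
}

score_words("I am happy.")      # 2
score_words("Not sad.")         # -2 * -0.75 = 1.5
score_words("I am VERY happy")  # 2 * 1.3 = 2.6
```

This sketch ignores emoticons such as ":)" and applies the caps multiplier only to the sentiment-bearing word itself.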

Sentences are scored based on their words and the presence of exclamation or question marks.
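Read together with the `punctuation_factor` description in Arguments, the punctuation rule might look like the R sketch below. It assumes (this page does not say) that only the trailing run of `!` and `?` counts, that question marks count only when at least one `!` is present, and that the multiplier is `punctuation_factor` raised to the number of counted marks. `punctuation_multiplier` is a hypothetical name.

```r
# Hypothetical helper, NOT part of the tardis API.
# Returns the factor by which a sentence's summed word score is multiplied.
punctuation_multiplier <- function(sentence, punctuation_factor = 1.15) {
  # Trailing run of ! and ? (e.g. "!!" or "?!"), ignoring final whitespace.
  trailing <- regmatches(sentence, regexpr("[!?]+[[:space:]]*$", sentence))
  if (length(trailing) == 0) return(1)
  marks <- gsub("[^!?]", "", trailing)
  # Question marks without any "!" have no effect.
  if (!grepl("!", marks, fixed = TRUE)) return(1)
  # Count at most three marks in total.
  punctuation_factor ^ min(nchar(marks), 3)
}

# An assumed word-score sum of 2, scaled by the sentence's punctuation.
2 * punctuation_multiplier("I am VERY happy!!")
```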

If a supplied text string has more than one sentence, the function scores each sentence separately and also returns the mean, standard deviation, and range of those sentence sentiments. The rationale is that sentence-level rules don't make sense when applied to whole paragraphs, especially in online communication, where people may use quick swings in sentiment to express irony.
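The multi-sentence summary can be sketched in R as below. The sentence splitter (a break after `.`, `!` or `?` followed by whitespace) and the example sentence scores are assumptions for illustration, `split_sentences` and `summarise_sentences` are hypothetical helpers, and "range" is taken to mean maximum minus minimum.

```r
# Hypothetical helpers, NOT part of the tardis API.
split_sentences <- function(text) {
  # Break after ., ! or ? when followed by whitespace (a simplification).
  pieces <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
  pieces[nzchar(pieces)]
}

summarise_sentences <- function(sentence_scores) {
  c(sentiment_mean  = mean(sentence_scores),
    sentiment_sd    = sd(sentence_scores),   # NA for a single sentence
    sentiment_range = max(sentence_scores) - min(sentence_scores))
}

text <- "A happy sentence! And a sad one. In the same text."
sentences <- split_sentences(text)   # three sentences
# Assumed per-sentence scores, e.g. after sigmoid scaling.
summarise_sentences(c(0.6, -0.6, 0))
```

Here a positive and a negative sentence cancel out in the mean, while the standard deviation and range still reveal the swing in sentiment.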

Input can be supplied in a data.frame or character vector.
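Handling both input types can be sketched as below. The helper name `get_texts` is hypothetical, and raising an error when a data.frame arrives without `text_column` is an assumption about behaviour this page does not describe.

```r
# Hypothetical helper, NOT part of the tardis API.
get_texts <- function(input_text, text_column = NA) {
  if (is.data.frame(input_text)) {
    # Assumed: a data.frame needs text_column to say which column to analyze.
    if (is.na(text_column)) {
      stop("text_column must name a column when input_text is a data.frame")
    }
    return(as.character(input_text[[text_column]]))
  }
  as.character(input_text)
}

df <- data.frame(id = 1:2, body = c("Not sad.", "Bad."))
get_texts(df, text_column = "body")
get_texts(c("I am happy.", ":)"))
```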

Value

A tbl_df with one row for each input text and three new columns:

`sentiment_mean`	The average of the sentence sentiments in each text.
`sentiment_sd`	The standard deviation of the sentence sentiments in each text.
`sentiment_range`	The range of the sentence sentiments in each text.

[Package tardis version 0.1.4 Index]