R: Validate Sentiment Score Sign Against Known Results

validate_sentiment {sentimentr}

R Documentation

Validate Sentiment Score Sign Against Known Results

Description

Provides multiclass macroaveraged and microaveraged precision, recall, accuracy, and F-score for the sign of the predicted sentiment against known sentiment scores. Sentiment analysis generally predicts three classes: positive (> 0), negative (< 0), and neutral (= 0). In assessing model performance one can use macro- or microaveraging across classes. Macroaveraging gives every class an equal say. Microaveraging gives a larger say to larger classes.
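To make the macro/micro distinction concrete, the following base R sketch computes both averages of precision by hand on the toy vectors from the Examples section. It is an illustration under stated assumptions, not the package's implementation: the variable names are hypothetical, and `validate_sentiment` may treat edge cases (such as a class that never occurs in `actual`) differently, so its output need not match these numbers.

```r
# Toy data from the Examples section; only the sign of each score matters.
actual    <- c(1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1)
predicted <- c(1, 0, 1, -1, 1, 0, -1, -1, -1, -1, 0, 1, -1)

classes <- c(-1, 0, 1)   # negative, neutral, positive
a <- sign(actual)
p <- sign(predicted)

# One-vs-all true positive and false positive counts for each class.
tp <- sapply(classes, function(k) sum(p == k & a == k))
fp <- sapply(classes, function(k) sum(p == k & a != k))

# Per-class precision; a class that is never predicted gives 0/0 = NaN.
class_precision <- tp / (tp + fp)

# Macroaverage: unweighted mean of the per-class values (equal say per class).
# Assumption of this sketch: never-predicted classes are dropped via na.rm.
macro_precision <- mean(class_precision, na.rm = TRUE)

# Microaverage: pool the counts across classes, then compute precision once,
# so classes with more predictions contribute more.
micro_precision <- sum(tp) / sum(tp + fp)

round(c(macro = macro_precision, micro = micro_precision), 3)
```

Here the neutral class is predicted three times but never correct, which pulls the macroaverage down more than the microaverage because each class counts equally regardless of size.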

Usage

validate_sentiment(predicted, actual, ...)

Arguments

`predicted`	A numeric vector of predicted sentiment scores or a sentimentr object that returns sentiment scores.
`actual`	A numeric vector of known sentiment ratings.
`...`	ignored.

Value

Returns a data.frame with macroaveraged and microaveraged model validation scores. Additionally, the data.frame has the following attributes:

`confusion_matrix`	A confusion matrix of all classes
`class_confusion_matrices`	A `list` of class level (class vs. all) confusion matrices
`macro_stats`	A `data.frame` of the macroaveraged class level stats before averaging
`mda`	Mean Directional Accuracy
`mare`	Mean Absolute Rescaled Error

Note

Mean Absolute Rescaled Error (MARE) is defined as $\frac{\sum |actual - predicted|}{2n}$, where n is the number of observations. It gives a sense of how far off, on average, the rescaled predicted values (-1 to 1) were from the rescaled actual values (-1 to 1). A value of 0 means perfect accuracy. A value of 1 means perfectly wrong every time. A value of .5 represents the expected value for random guessing. This measure is related to Mean Absolute Error.
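The formula can be sketched in a few lines of base R. `mare_sketch` is a hypothetical helper, not a sentimentr function, and it assumes both vectors have already been rescaled to the range -1 to 1; how the package performs that rescaling is not shown here.

```r
# Hypothetical helper (not part of sentimentr): MARE for two numeric
# vectors that are assumed to be already rescaled to [-1, 1].
mare_sketch <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), length(actual) > 0)
  # The largest possible absolute difference is 2 (from -1 to 1), so
  # dividing the summed differences by 2n bounds the result in [0, 1].
  sum(abs(actual - predicted)) / (2 * length(actual))
}

mare_sketch(c(1, -1, 0.5), c(1, -1, 0.5))  # perfect agreement: 0
mare_sketch(c(1, -1, 1), c(-1, 1, -1))     # opposite extremes every time: 1
```

The two calls reproduce the endpoints described above: identical vectors give 0, and predictions at the opposite extreme of every actual value give 1.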

References

https://www.youtube.com/watch?v=OwwdYHWRB5E&index=31&list=PL6397E4B26D00A269
https://en.wikipedia.org/wiki/Mean_Directional_Accuracy_(MDA)

Examples

library(sentimentr)
## Sign-only toy example: 1 = positive, -1 = negative, 0 = neutral
actual <- c(1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1)
predicted <- c(1, 0, 1, -1, 1, 0, -1, -1, -1, -1, 0, 1, -1)
validate_sentiment(predicted, actual)

scores <- hu_liu_cannon_reviews$sentiment
mod <- sentiment_by(get_sentences(hu_liu_cannon_reviews$text))

validate_sentiment(mod$ave_sentiment, scores)
validate_sentiment(mod, scores)

x <- validate_sentiment(mod, scores)
attributes(x)$confusion_matrix
attributes(x)$class_confusion_matrices
attributes(x)$macro_stats

## Annie Swafford Example
swafford <- data.frame(
    text = c(
        "I haven't been sad in a long time.",
        "I am extremely happy today.",
        "It's a good day.",
        "But suddenly I'm only a little bit happy.",
        "Then I'm not happy at all.",
        "In fact, I am now the least happy person on the planet.",
        "There is no happiness left in me.",
        "Wait, it's returned!",
        "I don't feel so bad after all!"
    ), 
    actual = c(.8, 1, .8, -.1, -.5, -1, -1, .5, .6), 
    stringsAsFactors = FALSE
)

pred <- sentiment_by(swafford$text) 
validate_sentiment(
    pred,
    actual = swafford$actual
)

[Package sentimentr version 2.9.0 Index]