transformer_scores {transforEmotion}R Documentation

Sentiment Analysis Scores

Description

Uses zero-shot sentiment analysis pipelines from Hugging Face to compute the probability that the text corresponds to each of the specified classes

Usage

transformer_scores(
  text,
  classes,
  multiple_classes = FALSE,
  transformer = c("cross-encoder-roberta", "cross-encoder-distilroberta",
    "facebook-bart"),
  preprocess = FALSE,
  keep_in_env = TRUE,
  envir = 1
)

Arguments

text

Character vector or list. Text to be scored, in a vector or list format

classes

Character vector. Classes to score the text against

multiple_classes

Boolean. Whether the text can belong to more than one true class. Defaults to FALSE. Set to TRUE to score each class independently (scores will no longer sum to 1 across classes)

transformer

Character. Specific zero-shot sentiment analysis transformer to be used. Default options:

"cross-encoder-roberta"

Uses Cross-Encoder's Natural Language Inference RoBERTa Base zero-shot classification model trained on the Stanford Natural Language Inference (SNLI) corpus and MultiNLI datasets

"cross-encoder-distilroberta"

Uses Cross-Encoder's Natural Language Inference DistilRoBERTa Base zero-shot classification model trained on the Stanford Natural Language Inference (SNLI) corpus and MultiNLI datasets. DistilRoBERTa is intended to be a smaller, more lightweight version of "cross-encoder-roberta" that sacrifices some accuracy for considerably faster speed (see https://www.sbert.net/docs/pretrained_cross-encoders.html#nli)

"facebook-bart"

Uses Facebook's BART Large zero-shot classification model trained on the Multi-Genre Natural Language Inference (MultiNLI) dataset

Defaults to "cross-encoder-distilroberta"

Also allows any zero-shot classification model with a pipeline on Hugging Face to be used by specifying its name (e.g., "typeform/distilbert-base-uncased-mnli"; see Examples)

preprocess

Boolean. Should basic preprocessing be applied? Includes converting to lowercase, keeping only alphanumeric characters, removing escape characters, removing repeated characters, and removing extra white space. Defaults to FALSE. Transformers generally perform well without preprocessing and handle many of these steps internally, so setting this to TRUE is unlikely to change performance much

keep_in_env

Boolean. Whether the classifier should be kept in your global environment. Defaults to TRUE. Keeping the classifier in your environment lets you skip re-loading it each time you run this function. TRUE is recommended

envir

Numeric. Environment for the classifier to be saved for repeated use. Defaults to the global environment

Value

Returns probabilities of the text belonging to each of the specified classes

Author(s)

Alexander P. Christensen <alexpaulchristensen@gmail.com>

References

# BART
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

# RoBERTa
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

# Zero-shot classification
Yin, W., Hay, J., & Roth, D. (2019). Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach. arXiv preprint arXiv:1909.00161.

# MultiNLI dataset
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.

Examples

# Load data
data(neo_ipip_extraversion)

# Example text 
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run: 
# Cross-Encoder DistilRoBERTa
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 )
)

# Facebook BART Large
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 transformer = "facebook-bart"
)

# Directly from huggingface: typeform/distilbert-base-uncased-mnli
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 transformer = "typeform/distilbert-base-uncased-mnli"
)
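The arguments beyond `transformer` can also be combined. The sketch below (using the same example text and classes; output not verified here) sets multiple_classes = TRUE so each class is scored independently rather than the scores summing to 1, and turns on the optional basic preprocessing:

```r
# Multiple true classes with basic preprocessing (a sketch)
transformer_scores(
 text = text,
 classes = c(
   "friendly", "gregarious", "assertive",
   "active", "excitement", "cheerful"
 ),
 multiple_classes = TRUE, # independent per-class scores
 preprocess = TRUE        # lowercase, strip non-alphanumerics, trim spaces
)
```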

## End(Not run)


[Package transforEmotion version 0.1.4 Index]