sento_lexicons {sentometrics}    R Documentation
Set up lexicons (and valence word list) for use in sentiment analysis
Description
Structures the provided lexicon(s) and, optionally, valence words. One can, for example, combine (part of) the built-in lexicons from data("list_lexicons") with other lexicons, and add one of the built-in valence word lists from data("list_valence_shifters"). This function makes the output coherent by converting all words to lowercase and checking for duplicates. All entries consisting of more than one word are discarded, as required for bag-of-words sentiment analysis.
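The cleaning steps named above (lowercasing, dropping multi-word entries, handling duplicates) can be illustrated with a small base-R sketch. The helper clean_lexicon() is hypothetical and not part of sentometrics; it assumes the first column holds the words and the second the scores, and it assumes that only the first occurrence of a duplicated word is kept, which may differ from what sento_lexicons() actually does.

```r
# Hypothetical helper (not part of sentometrics): a base-R sketch of the
# per-lexicon cleaning described above.
clean_lexicon <- function(lex) {
  # Assumption: first column holds the words, second column the scores.
  words <- trimws(tolower(as.character(lex[[1]])))
  scores <- as.numeric(lex[[2]])
  # Discard multi-word entries (any internal whitespace), as bag-of-words requires.
  single <- !grepl("[[:space:]]", words)
  words <- words[single]
  scores <- scores[single]
  # Assumption: of duplicated words, only the first occurrence is kept.
  keep <- !duplicated(words)
  data.frame(x = words[keep], y = scores[keep], stringsAsFactors = FALSE)
}

raw <- data.frame(w = c("Nice", "nice", "not good", "Boring"),
                  s = c(2, 1, -1, -1), stringsAsFactors = FALSE)
res <- clean_lexicon(raw)  # two rows: "nice" (2) and "boring" (-1)
```

Here "Nice" and "nice" collapse to one entry after lowercasing, and "not good" is dropped because it consists of two words.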
Usage
sento_lexicons(lexiconsIn, valenceIn = NULL, do.split = FALSE)
Arguments
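lexiconsIn: a named list of lexicons, each a data.table with a column of words and a column of polarity scores.
valenceIn: a single valence word list as a data.table, with an "x" column of words plus a "y" (scores) or "t" (types) column.
do.split: a logical; if TRUE, every lexicon is split into a separate positive polarity and negative polarity lexicon (names suffixed with "_POS" and "_NEG").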
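What do.split = TRUE does to the lexicon names can be sketched in base R. The helper split_lexicons() is hypothetical and not the package's implementation; it assumes cleaned lexicons with an "x" (words) and "y" (scores) column, and it assumes positive scores go to the "_POS" lexicon, negative scores to the "_NEG" lexicon, and zero scores are dropped.

```r
# Hypothetical helper (not part of sentometrics): split each cleaned lexicon
# (columns "x" = words, "y" = scores) into a positive and a negative lexicon.
split_lexicons <- function(lexicons) {
  out <- list()
  for (nm in names(lexicons)) {
    lex <- lexicons[[nm]]
    # Assumption: positive scores go to _POS, negative to _NEG, zeros are dropped.
    out[[paste0(nm, "_POS")]] <- lex[lex$y > 0, , drop = FALSE]
    out[[paste0(nm, "_NEG")]] <- lex[lex$y < 0, , drop = FALSE]
  }
  out
}

lexs <- list(myLexicon = data.frame(x = c("nice", "boring"), y = c(2, -1),
                                    stringsAsFactors = FALSE))
s <- split_lexicons(lexs)
names(s)  # "myLexicon_POS" "myLexicon_NEG"
```

The naming mirrors the "GI_en_POS" and "GI_en_NEG" elements used in the Examples.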
Value
A list of class sento_lexicons with each lexicon as a separate element according to its name, as a data.table, and optionally an element named valence that comprises the valence words. Every "x" column contains the words, every "y" column contains the scores. The "t" column for valence shifters contains the different types.
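To make the "x"/"y" layout concrete, the following base-R sketch builds a plain list that mimics one lexicon element (it is not a real sento_lexicons object, and the optional valence element is left out) and looks up scores for a few tokens. The helper score_tokens() is hypothetical; it assumes words absent from the lexicon contribute a score of 0.

```r
# Plain list mimicking the layout described under Value (not a real
# sento_lexicons object): "x" holds the words, "y" the scores.
lex <- list(
  myLexicon = data.frame(x = c("nice", "boring"), y = c(2, -1),
                         stringsAsFactors = FALSE)
)

# Hypothetical helper: look up each token's score in one lexicon;
# assumption: tokens not in the lexicon get a score of 0.
score_tokens <- function(tokens, lexicon) {
  s <- lexicon$y[match(tokens, lexicon$x)]
  s[is.na(s)] <- 0
  s
}

score_tokens(c("nice", "movie", "boring"), lex$myLexicon)  # 2 0 -1
```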
Author(s)
Samuel Borms
Examples
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")
# lexicons straight from built-in word lists
l1 <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")])
# including a self-made lexicon, with and without valence shifters
lexIn <- c(list(myLexicon = data.table::data.table(w = c("nice", "boring"), s = c(2, -1))),
           list_lexicons[c("GI_en")])
valIn <- list_valence_shifters[["en"]]
l2 <- sento_lexicons(lexIn)
l3 <- sento_lexicons(lexIn, valIn)
l4 <- sento_lexicons(lexIn, valIn[, c("x", "y")], do.split = TRUE)
l5 <- sento_lexicons(lexIn, valIn[, c("x", "t")], do.split = TRUE)
l6 <- l5[c("GI_en_POS", "valence")] # preserves sento_lexicons class
## Not run:
# include lexicons from lexicon package
lexIn2 <- list(hul = lexicon::hash_sentiment_huliu, joc = lexicon::hash_sentiment_jockers)
l7 <- sento_lexicons(c(lexIn, lexIn2), valIn)
## End(Not run)
## Not run:
# faulty extraction
l5["valence"]
l2[0]
l3[22]
# no replacement allowed
l4[1] <- l2[1]
l4[[1]] <- l2[[1]]
l4$GI_en_NEG <- l2$myLexicon
## End(Not run)