add_features {sentometrics} | R Documentation |
Add feature columns to a (sento_)corpus object
Description
Adds new feature columns, either user-supplied or based on keyword(s)/regex pattern search, to a provided sento_corpus or a quanteda corpus object.
Usage
add_features(
  corpus,
  featuresdf = NULL,
  keywords = NULL,
  do.binary = TRUE,
  do.regex = FALSE
)
Arguments
corpus | a sento_corpus or a quanteda corpus object
featuresdf | a named data.frame
keywords | a named list
do.binary | a logical
do.regex | a logical
Details
If a provided feature name is already part of the corpus, it will be replaced. The featuresdf and keywords arguments can be provided at the same time, or only one of them, leaving the other at NULL. We use the stringi package for searching the keywords.
The do.regex argument points to the corresponding elements in keywords. For FALSE, we transform the keywords into a simple regex, involving "\b" for exact word boundary matching and (if multiple keywords) | as OR operator. The elements associated with TRUE do not undergo this transformation, and are evaluated as given, if the corresponding keywords vector consists of only one expression.
For a large corpus and/or complex regex patterns, this function may require some patience. Scaling between 0 and 1 is performed via min-max normalization, per column.
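The keyword handling described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper names keywords_to_regex(), count_matches() and min_max() are hypothetical, base gregexpr() with perl = TRUE stands in for the stringi search, the toy texts are made up, and the 0/1 indicator versus scaled count is only how this sketch reads the do.binary switch.

```r
# Hypothetical helper: wrap each plain keyword in word boundaries and join
# them with "|", as the text describes for elements where do.regex is FALSE.
keywords_to_regex <- function(kw) {
  paste0("\\b", kw, "\\b", collapse = "|")
}

# Hypothetical helper: count pattern matches per text.
# Base R regex is used here as a stand-in for stringi (assumption).
count_matches <- function(pattern, x) {
  m <- gregexpr(pattern, x, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive hits only
  vapply(m, function(hit) sum(hit > 0), integer(1))
}

# Hypothetical helper: min-max normalization of a single feature column.
min_max <- function(x) {
  rng <- range(x)
  # assumption of this sketch: a constant column is mapped to 0
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up example texts
texts <- c(
  "Obama met the US president of the senate.",
  "Obama spoke about the war.",
  "Nothing relevant here."
)

pattern <- keywords_to_regex(c("Obama", "US president"))
counts  <- count_matches(pattern, texts)   # 2 1 0
binary  <- as.numeric(counts > 0)          # 1 1 0
scaled  <- min_max(counts)                 # 1.0 0.5 0.0
```

The generated pattern is the same string as the hand-written pres2 regex in the Examples, which is why a keyword vector without regex and its explicit regex counterpart are expected to flag the same documents.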
Value
An updated corpus object.
Author(s)
Samuel Borms
Examples
set.seed(505)

# construct a corpus and add (a) feature(s) to it
corpus <- quanteda::corpus_sample(
  sento_corpus(corpusdf = sentometrics::usnews), 500
)
corpus1 <- add_features(corpus,
                        featuresdf = data.frame(random = runif(quanteda::ndoc(corpus))))
corpus2 <- add_features(corpus,
                        keywords = list(pres = "president", war = "war"),
                        do.binary = FALSE)
corpus3 <- add_features(corpus,
                        keywords = list(pres = c("Obama", "US president")))
corpus4 <- add_features(corpus,
                        featuresdf = data.frame(all = 1),
                        keywords = list(pres1 = "Obama|US [p|P]resident",
                                        pres2 = "\\bObama\\b|\\bUS president\\b",
                                        war = "war"),
                        do.regex = c(TRUE, TRUE, FALSE))

sum(quanteda::docvars(corpus3, "pres")) ==
  sum(quanteda::docvars(corpus4, "pres2")) # TRUE

# adding a complementary feature
nonpres <- data.frame(nonpres = as.numeric(!quanteda::docvars(corpus3, "pres")))
corpus3 <- add_features(corpus3, featuresdf = nonpres)
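
The complementary-feature step and the replace-by-name behaviour noted in Details can be mimicked on a plain data.frame, without any corpus object. This is a rough sketch only: docvars_toy is a made-up stand-in for the document-level variables of a corpus, and add_columns() is a hypothetical helper, not a sentometrics function.

```r
# Toy stand-in for the feature columns of a three-document corpus (assumption)
docvars_toy <- data.frame(pres = c(1, 0, 1), war = c(0, 0, 1))

# Hypothetical helper: add columns by name, overwriting any column
# that already exists under the same name.
add_columns <- function(df, new) {
  df[names(new)] <- new
  df
}

# Complementary feature: 1 where "pres" is 0, and 0 where "pres" is 1
nonpres <- data.frame(nonpres = as.numeric(!docvars_toy$pres))
docvars_toy <- add_columns(docvars_toy, nonpres)

# Supplying "war" again replaces the existing column instead of duplicating it
docvars_toy <- add_columns(docvars_toy, data.frame(war = c(1, 1, 0)))
```

After these steps the toy table still has exactly one war column, and pres and nonpres sum to 1 in every row.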