lma_termcat {lingmatch} | R Documentation |
Document-Term Matrix Categorization
Description
Reduces the dimensions of a document-term matrix by dictionary-based categorization.
Usage
lma_termcat(dtm, dict, term.weights = NULL, bias = NULL,
bias.name = "_intercept", escape = TRUE, partial = FALSE,
glob = TRUE, term.filter = NULL, term.break = 20000,
to.lower = FALSE, dir = getOption("lingmatch.dict.dir"),
coverage = FALSE)
Arguments
dtm |
A matrix with terms as column names. |
dict |
The name of a provided dictionary (osf.io/y6g5b/wiki) or of a file found in |
term.weights |
A |
bias |
A list or named vector specifying a constant to add to the named category. If a term matching |
bias.name |
A character specifying a term to be used as a category bias; default is |
escape |
Logical indicating whether the terms in |
partial |
Logical; if |
glob |
Logical; if |
term.filter |
A regular expression string used to format the text of each term (passed to |
term.break |
If a category has more than |
to.lower |
Logical; if |
dir |
Path to a folder in which to look for |
coverage |
Logical; if |
Value
A matrix with a row per dtm row and columns per dictionary category (with added coverage_ versions if coverage is TRUE), and a WC attribute with original word counts.
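As a rough illustration of the output shape described above, the following base R sketch collapses a toy document-term matrix into one column per dictionary category and attaches a WC attribute. It is not the package's implementation: the `categorize()` helper and the toy data are hypothetical, and glob handling is approximated with `utils::glob2rx()`.

```r
# Toy document-term matrix: rows are documents, columns are terms.
dtm <- matrix(
  c(
    2, 1, 0, 1,
    0, 0, 3, 1
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("cat", "dog", "petal", "the"))
)

# Dictionary: a named list of term patterns, where `*` is a glob wildcard.
dict <- list(animal = c("cat", "dog"), pet_like = "pet*")

# Hypothetical helper (an assumption for illustration, not part of lingmatch):
# for each category, sum the counts of every column matching its patterns.
categorize <- function(dtm, dict) {
  scores <- matrix(
    0, nrow(dtm), length(dict),
    dimnames = list(rownames(dtm), names(dict))
  )
  for (category in names(dict)) {
    # glob2rx() turns globs such as "pet*" into anchored regular expressions
    regex <- paste(utils::glob2rx(dict[[category]]), collapse = "|")
    hits <- grepl(regex, colnames(dtm))
    scores[, category] <- rowSums(dtm[, hits, drop = FALSE])
  }
  # Keep the original per-document word counts alongside the category scores
  attr(scores, "WC") <- rowSums(dtm)
  scores
}

scores <- categorize(dtm, dict)
print(scores)
```

The result has one row per document and one column per category, so a four-term matrix is reduced to two columns here, while the WC attribute preserves the totals needed to convert counts to proportions later.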
See Also
For applying pattern-based dictionaries (to raw text) see lma_patcat().
Other Dictionary functions: dictionary_meta(), download.dict(), lma_patcat(), read.dic(), report_term_matches(), select.dict()
Examples
dict <- list(category = c("cat", "dog", "pet*"))
lma_termcat(c(
  "cat, cat, cat, cat, cat, cat, cat, cat",
  "a cat, dog, or anything petlike, really",
  "petite petrochemical petitioned petty peter for petrified petunia petals"
), dict, coverage = TRUE)
## Not run:
# Score texts with the NRC Affect Intensity Lexicon
dict <- readLines("https://saifmohammad.com/WebDocs/NRC-AffectIntensity-Lexicon.txt")
dict <- read.table(
  text = dict[-seq_len(grep("term\tscore", dict, fixed = TRUE)[[1]])],
  col.names = c("term", "weight", "category")
)
text <- c(
  angry = paste(
    "We are outraged by their hateful brutality,",
    "and by the way they terrorize us with their hatred."
  ),
  fearful = paste(
    "The horrific torture of that terrorist was tantamount",
    "to the terrorism of terrorists."
  ),
  joyous = "I am jubilant to be celebrating the bliss of this happiest happiness.",
  sad = paste(
    "They are nearly suicidal in their mourning after",
    "the tragic and heartbreaking holocaust."
  )
)
emotion_scores <- lma_termcat(text, dict)
if (require("splot")) splot(emotion_scores ~ names(text), leg = "out")
## or use the standardized version (which includes more categories)
emotion_scores <- lma_termcat(text, "nrc_eil", dir = "~/Dictionaries")
emotion_scores <- emotion_scores[, c("anger", "fear", "joy", "sadness")]
if (require("splot")) splot(emotion_scores ~ names(text), leg = "out")
## End(Not run)
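The weighted dictionary in the example above is a data frame with term, weight, and category columns. One plausible way to think about scoring with such a dictionary is as a matrix product between term counts and a term-by-category weight matrix. The base R sketch below works through that idea on made-up data; the weights are invented for illustration (they are not values from the NRC lexicon), and whether lma_termcat computes scores exactly this way is an assumption.

```r
# Made-up weighted dictionary with the same columns as the example above.
dict <- data.frame(
  term = c("outraged", "hateful", "jubilant", "bliss"),
  weight = c(0.5, 0.25, 1, 0.5),
  category = c("anger", "anger", "joy", "joy"),
  stringsAsFactors = FALSE
)

# Toy document-term matrix of raw counts.
dtm <- matrix(
  c(
    1, 2, 0, 0,
    0, 0, 1, 1
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("angry", "joyous"), dict$term)
)

# Build a term-by-category weight matrix; unlisted pairs stay at zero.
categories <- unique(dict$category)
weights <- matrix(
  0, nrow(dict), length(categories),
  dimnames = list(dict$term, categories)
)
weights[cbind(dict$term, dict$category)] <- dict$weight

# Score only the terms present in both the matrix and the dictionary.
shared <- intersect(colnames(dtm), rownames(weights))
scores <- dtm[, shared, drop = FALSE] %*% weights[shared, , drop = FALSE]
print(scores)
```

Each cell of the result is the sum of count times weight over a category's terms, so a document that repeats a low-weight term can score the same as one that uses a high-weight term once.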