keyness {DramaAnalysis} | R Documentation |
Keywords
Description
Given a frequency table (with texts as rows and words as columns),
this function calculates the log-likelihood or the log ratio of one set of rows
against the other rows. The return value is a list containing scores for each
word. If the method is loglikelihood, the returned scores are unsigned G2
values. To estimate the direction of the keyness, the log ratio is more
informative. A nice introduction to log ratio can be found here.
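The difference between the two scores can be illustrated for a single word. The following is a minimal sketch in base R, not the package's internal code: the counts are made up, the variable names are hypothetical, and it assumes the usual formulation (G2 computed from observed and expected frequencies, log ratio as the binary logarithm of the ratio of relative frequencies, and significance taken from a chi-squared distribution with one degree of freedom).

```r
# Hypothetical counts for one word (not taken from rksp.0)
a  <- 30    # occurrences of the word in the target rows
b  <- 10    # occurrences of the word in the remaining rows
c1 <- 1000  # total number of tokens in the target rows
c2 <- 1000  # total number of tokens in the remaining rows

# Expected frequencies if the word were equally frequent in both sets
e1 <- c1 * (a + b) / (c1 + c2)
e2 <- c2 * (a + b) / (c1 + c2)

# Log-likelihood (G2): unsigned, so it says nothing about direction
g2 <- 2 * (a * log(a / e1) + b * log(b / e2))

# Log ratio: positive if the word is relatively more frequent in the target rows
lr <- log2((a / c1) / (b / c2))

# Assumption: p-value from a chi-squared distribution with 1 degree of freedom
p <- pchisq(g2, df = 1, lower.tail = FALSE)
```

Swapping a and b leaves g2 unchanged but flips the sign of lr, which is why the log ratio is the better indicator of direction.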
Usage
keyness(
  ft,
  categories = c(1, rep(2, nrow(ft) - 1)),
  epsilon = 1e-100,
  siglevel = 0.05,
  method = c("loglikelihood", "logratio"),
  minimalFrequency = 10
)
Arguments
ft | The frequency table, with texts as rows and words as columns. |
categories | A factor or numeric vector that assigns each row of ft to a category. |
epsilon | Zero values are replaced by this value in order to avoid division by zero. |
siglevel | Only keywords that are significant at this level are returned. Set to 1 to get all words. |
method | Either "logratio" or "loglikelihood" (default). |
minimalFrequency | Words less frequent than this value are not considered at all. |
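To make the categories argument more concrete, here is a small sketch in base R with a made-up frequency table (ft_toy and its row and column names are hypothetical). It shows what the default value evaluates to and one way to build the vector from row names. That category 1 marks the set of rows compared against the rest is an assumption inferred from the default value.

```r
# Toy frequency table: three texts (rows), two words (columns); values are made up
ft_toy <- matrix(c(12, 3, 5,
                   4, 9, 7),
                 nrow = 3,
                 dimnames = list(c("textA", "textB", "textC"),
                                 c("liebe", "tod")))

# The default assignment: first row in category 1, all other rows in category 2
default_categories <- c(1, rep(2, nrow(ft_toy) - 1))

# A name-based assignment that puts textB alone in category 1
by_name <- ifelse(rownames(ft_toy) == "textB", 1, 2)
```

The vector needs one entry per row of the frequency table, in row order.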
Value
A list of keywords, sorted by their log-likelihood or log ratio value, calculated according to http://ucrel.lancs.ac.uk/llwizard.html.
Examples
library(DramaAnalysis)
data("rksp.0")
ft <- frequencytable(rksp.0, byCharacter = TRUE, normalize = FALSE)
# Calculate log ratio for all words
genders <- factor(c("m", "m", "m", "m", "f", "m", "m", "m", "f", "m", "m", "f", "m"))
keywords <- keyness(ft, method = "logratio",
                    categories = genders,
                    minimalFrequency = 5)
# Remove words that are not significantly different between the same two groups
significant <- keyness(ft, categories = genders, siglevel = 0.01, minimalFrequency = 5)
keywords <- keywords[names(keywords) %in% names(significant)]