| keyness {DramaAnalysis} | R Documentation | 
Keywords
Description
Given a frequency table (with texts as rows and words as columns),
this function calculates log-likelihood and log ratio of one set of rows against the other rows. 
The return value is a list containing scores for each word. If the method 
is "loglikelihood", the returned scores are unsigned G2 values, so they do not 
show which set of rows a word is characteristic of. To determine the 
direction of the keyness, the log ratio is more informative. A nice introduction 
to the log ratio can be found here.
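The difference between the two scores can be illustrated for a single word. The following is a minimal sketch in base R, not the package's implementation: the helper names `g2` and `log_ratio` are hypothetical, and the formulas follow the two-corpus log-likelihood calculation described at http://ucrel.lancs.ac.uk/llwizard.html. Here `a` and `b` are the word's frequencies in the two sets of rows, and `c` and `d` are the total word counts of those sets.

```r
# Hypothetical helpers for illustration only; not part of DramaAnalysis.

# Unsigned log-likelihood (G2) for one word in a two-corpus comparison.
g2 <- function(a, b, c, d) {
  e1 <- c * (a + b) / (c + d)  # expected frequency in set 1
  e2 <- d * (a + b) / (c + d)  # expected frequency in set 2
  # A zero observed frequency contributes nothing to the sum.
  term <- function(o, e) ifelse(o > 0, o * log(o / e), 0)
  2 * (term(a, e1) + term(b, e2))
}

# Signed log ratio: binary log of the ratio of relative frequencies.
# Zero frequencies are replaced by epsilon to avoid division by zero.
log_ratio <- function(a, b, c, d, epsilon = 1e-100) {
  a <- ifelse(a == 0, epsilon, a)
  b <- ifelse(b == 0, epsilon, b)
  log2((a / c) / (b / d))
}

# A word used 30 times in set 1 and 10 times in set 2 (1000 words each)
g2_forward  <- g2(30, 10, 1000, 1000)
g2_backward <- g2(10, 30, 1000, 1000)         # same value: G2 is unsigned
lr_forward  <- log_ratio(30, 10, 1000, 1000)  # positive: typical of set 1
lr_backward <- log_ratio(10, 30, 1000, 1000)  # negative: typical of set 2

# p-value of G2 under a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(g2_forward, df = 1, lower.tail = FALSE)
```

Swapping the two sets leaves G2 unchanged but flips the sign of the log ratio, which is why the log ratio is the better indicator of direction.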
Usage
keyness(
  ft,
  categories = c(1, rep(2, nrow(ft) - 1)),
  epsilon = 1e-100,
  siglevel = 0.05,
  method = c("loglikelihood", "logratio"),
  minimalFrequency = 10
)
Arguments
| ft | The frequency table | 
| categories | A factor or numeric vector that represents an assignment of categories. | 
| epsilon | Zero values are replaced by this value to avoid division by zero | 
| siglevel | Return only the keywords that are significant at this level. Set to 1 to get all words | 
| method | Either "logratio" or "loglikelihood" (default) | 
| minimalFrequency | Words less frequent than this value are not considered at all | 
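The default for `categories` puts the first row into one category and all remaining rows into the other. The sketch below uses a made-up frequency table to show what that grouping and the `minimalFrequency` filter amount to. It relies only on base R (`rowsum`, `colSums`) and assumes, following the wording above, that a word is kept when its total frequency is at least `minimalFrequency`; the package's internal steps may differ.

```r
# Toy frequency table (invented data): texts as rows, words as columns.
ft <- matrix(c(12, 0, 3,
                4, 7, 1,
                9, 2, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("textA", "textB", "textC"),
                             c("love", "sword", "letter")))

# Default assignment: first row is category 1, all others category 2.
categories <- c(1, rep(2, nrow(ft) - 1))

# Sum the word counts of all rows that share a category.
grouped <- rowsum(ft, group = categories)

# Assumed filter: drop words whose total frequency is below the threshold.
minimalFrequency <- 5
kept <- grouped[, colSums(grouped) >= minimalFrequency, drop = FALSE]
```

With this toy data, `grouped` has one row per category, and `kept` retains only the columns `love` and `sword`, because `letter` occurs just four times in total.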
Value
A list of keywords, sorted by their log-likelihood or log ratio value, calculated according to http://ucrel.lancs.ac.uk/llwizard.html.
Examples
data("rksp.0")
ft <- frequencytable(rksp.0, byCharacter = TRUE, normalize = FALSE)
# Calculate log ratio for all words
genders <- factor(c("m", "m", "m", "m", "f", "m", "m", "m", "f", "m", "m", "f", "m"))
keywords <- keyness(ft, method = "logratio", 
                    categories = genders, 
                    minimalFrequency = 5)
# Remove words that are not significantly different between the same two groups
significant <- keyness(ft, categories = genders, siglevel = 0.01, minimalFrequency = 5)
keywords <- keywords[names(keywords) %in% names(significant)]