dictionaryStatistics {DramaAnalysis}    R Documentation
Dictionary Use
Description
These methods count the number of occurrences of the words in the dictionaries
across different speakers and/or segments.
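The counting idea can be pictured with a minimal base-R sketch. This is not the package's implementation: the token table (with hypothetical columns act, speaker and lemma) and the small dictionary are made up for illustration. Summing matches over the whole play, per act, or per act and speaker mirrors, in spirit, the segment and byCharacter options.

```r
# Hypothetical token table: one row per spoken token.
# The column names are illustrative only, not the DramaAnalysis layout.
tokens <- data.frame(
  act     = c(1, 1, 1, 1, 2, 2, 2),
  speaker = c("A", "A", "B", "B", "A", "B", "B"),
  lemma   = c("Krieg", "und", "Krieg", "Frieden", "Sohn", "Vater", "der"),
  stringsAsFactors = FALSE
)

# A small hypothetical dictionary (word list)
dict <- c("Krieg", "Frieden")

# Flag every token that belongs to the dictionary
tokens$hit <- tokens$lemma %in% dict

# Whole play, no split by speaker
per_play <- sum(tokens$hit)

# Per segment (here: act), not split by speaker
per_act <- aggregate(hit ~ act, data = tokens, FUN = sum)

# Per segment and per speaker
per_act_speaker <- aggregate(hit ~ act + speaker, data = tokens, FUN = sum)
```

Here speaker B produces two of the three matches, both in act 1.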
The function dictionaryStatistics() calculates statistics for
dictionaries with multiple entries; dictionaryStatisticsSingle() does so
only for a single word list.
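The relation between the two functions can be sketched in base R: a single-list count per speaker, and a multi-dictionary count that applies it to each word list and binds the results into columns. The helpers count_single() and count_multi() and the toy data are hypothetical and only illustrate the idea; the ci flag imitates case-insensitive matching.

```r
# Hypothetical toy tokens
tokens <- data.frame(
  speaker = c("A", "A", "A", "B", "B"),
  lemma   = c("Krieg", "schlacht", "Vater", "Sohn", "und"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count tokens per speaker that match one word list
count_single <- function(tokens, wordfield, ci = TRUE) {
  words <- tokens$lemma
  if (ci) {
    # Ignore case on both sides of the comparison
    words <- tolower(words)
    wordfield <- tolower(wordfield)
  }
  # One count per speaker (sum of a logical vector counts TRUE values)
  tapply(words %in% wordfield, tokens$speaker, sum)
}

# Hypothetical helper: apply the single-list count to several word lists;
# the result is a matrix with speakers as rows and dictionaries as columns
count_multi <- function(tokens, fields, ci = TRUE) {
  sapply(fields, function(wf) count_single(tokens, wf, ci = ci))
}

fields <- list(Krieg = c("Krieg", "Schlacht"), Familie = c("Vater", "Sohn"))
m      <- count_multi(tokens, fields)              # case ignored
m_cs   <- count_multi(tokens, fields, ci = FALSE)  # case-sensitive
single <- count_single(tokens, c("Krieg"))
```

With case ignored, "schlacht" matches "Schlacht"; with ci = FALSE it does not, so speaker A's "Krieg" count drops from 2 to 1.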
The as.matrix() method extracts the numeric part of a
QDDictionaryStatistics table as a matrix.
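What "extracting the numeric part" means can be shown on a generic data frame. The table below is hypothetical (identifier columns plus one numeric column per dictionary); the actual QDDictionaryStatistics layout may differ, and this is not the package's as.matrix() method.

```r
# Hypothetical result table: identifier columns plus numeric counts
stats <- data.frame(
  drama     = c("example", "example"),
  character = c("A", "B"),
  Krieg     = c(2, 0),
  Familie   = c(1, 1),
  stringsAsFactors = FALSE
)

# Select only the numeric columns and convert them to a matrix
num_cols <- vapply(stats, is.numeric, logical(1))
mat <- as.matrix(stats[, num_cols, drop = FALSE])

# Keep the rows identifiable by using the character column as row names
rownames(mat) <- stats$character
```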
Usage
dictionaryStatistics(
  drama,
  fields = DramaAnalysis::base_dictionary[fieldnames],
  fieldnames = c("Liebe"),
  segment = c("Drama", "Act", "Scene"),
  normalizeByCharacter = FALSE,
  normalizeByField = FALSE,
  byCharacter = TRUE,
  column = "Token.lemma",
  ci = TRUE
)

dictionaryStatisticsSingle(
  drama,
  wordfield = c(),
  segment = c("Drama", "Act", "Scene"),
  normalizeByCharacter = FALSE,
  normalizeByField = FALSE,
  byCharacter = TRUE,
  fieldNormalizer = length(wordfield),
  column = "Token.lemma",
  ci = TRUE,
  colnames = NULL
)
## S3 method for class 'QDDictionaryStatistics'
as.matrix(x, ...)
Arguments
drama: A QDDrama object.
fields: A list of lists that contains the actual word fields. By default, the fields named in fieldnames are loaded from DramaAnalysis::base_dictionary.
fieldnames: A list of names for the dictionaries.
segment: The segment level that should be used. By default, the entire play is used. Possible values are "Drama" (default), "Act" or "Scene".
normalizeByCharacter: Logical. Whether to normalize by character speech length.
normalizeByField: Logical. Whether to normalize by dictionary size. You usually want this.
byCharacter: Logical, defaults to TRUE. If FALSE, values are calculated for the entire segment (play, act, or scene) rather than for individual characters.
column: The table column the dictionary is applied to. Should be either "Token.surface" or "Token.lemma"; the latter is the default.
ci: Logical. Whether to ignore case. Defaults to TRUE, i.e., case is ignored.
wordfield: A character vector containing the words or lemmas to be counted (only for dictionaryStatisticsSingle()).
fieldNormalizer: Defaults to the length of wordfield. If normalizeByField is TRUE, the absolute counts are divided by this number.
colnames: The column names to be used in the output table.
x: An object of the type QDDictionaryStatistics.
...: All other parameters are passed to
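The two normalizations can be illustrated with plain arithmetic. This sketch assumes that normalizing by character divides a character's count by the number of tokens that character speaks, and that normalizing by field divides by fieldNormalizer (by default the number of words in the list). All counts and the word list are hypothetical.

```r
# Hypothetical absolute counts of one word list per character
counts <- c(A = 6, B = 2)
# Hypothetical speech length (number of spoken tokens) per character
tokens_spoken <- c(A = 300, B = 50)
# Hypothetical word list; its length plays the role of fieldNormalizer
wordfield <- c("Krieg", "Schlacht", "Soldat", "Waffe")
field_normalizer <- length(wordfield)

# Normalize by dictionary size
by_field <- counts / field_normalizer
# Normalize by character speech length (assumed: divide by spoken tokens)
by_character <- counts / tokens_spoken
# Both normalizations applied together
both <- counts / tokens_spoken / field_normalizer
```

Character B has fewer absolute matches than A (2 vs. 6) but a higher relative frequency (0.04 vs. 0.02), which is why normalizing by speech length matters when characters speak very different amounts.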
Value
A numeric matrix that contains the frequency with which the words of a dictionary occur in a subset of tokens.
Examples
library(DramaAnalysis)
data(rksp.0)
# Check multiple dictionaries
dstat <- dictionaryStatistics(rksp.0, fieldnames = c("Krieg", "Familie"))
# Check a single word list
fstat <- dictionaryStatisticsSingle(rksp.0, wordfield = c("der"))
# Extract the numeric part as a matrix
mat <- as.matrix(dstat)