get_dispersion {dhlabR} | R Documentation
Dispersion of tokens in a text
Description
This function wraps a call to the dispersion service, which calculates how a list of tokens is dispersed throughout a text in the National Library of Norway's collection. The text is identified by its URN. The service divides the text into chunks and returns the count of each token in each chunk.
Usage
get_dispersion(urn = NULL, words = list(".", ","), window = 500, pr = 100)
Arguments
urn
The URN of a text object in the National Library of Norway's collection.
words
A list or vector of words (tokens) to analyze for dispersion.
window
The size of each text chunk within which the tokens are counted.
pr
The step size used to advance from one chunk to the next. If 'pr' is equal to 'window', the text is divided into non-overlapping chunks of size 'window'. If 'pr' is smaller than 'window', consecutive chunks overlap, which produces a smoother curve.
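The interaction between 'window' and 'pr' can be illustrated locally with a minimal sketch. The helper local_dispersion() below is hypothetical and not part of dhlabR. It assumes that both sizes are measured in tokens, that the text is already tokenized, and that a shorter final chunk is kept. The dispersion service itself may tokenize the text and handle the end of the text differently.

```r
# Hypothetical helper, not part of dhlabR: counts each word in successive
# chunks of `window` tokens, moving forward `pr` tokens per step.
local_dispersion <- function(tokens, words, window = 500, pr = 100) {
  words <- unlist(words)  # accept a list or a character vector
  stopifnot(length(tokens) > 0, window > 0, pr > 0)

  # First token position of each chunk (assumption: a shorter final
  # chunk is kept rather than dropped).
  starts <- seq(1, length(tokens), by = pr)

  counts <- lapply(starts, function(s) {
    chunk <- tokens[s:min(s + window - 1, length(tokens))]
    vapply(words, function(w) sum(chunk == w), integer(1))
  })

  out <- as.data.frame(do.call(rbind, counts))
  names(out) <- words
  cbind(start = starts, out)
}

tokens <- c("a", "b", "a", "c", "a", "b")

# pr equal to window: two non-overlapping chunks, starting at tokens 1 and 4
non_overlapping <- local_dispersion(tokens, c("a", "b"), window = 3, pr = 3)

# pr smaller than window: one chunk per token position, so chunks overlap
overlapping <- local_dispersion(tokens, c("a", "b"), window = 3, pr = 1)
```

With 'pr' equal to 'window', each token falls into exactly one chunk. With a smaller 'pr', the same token is counted in several neighbouring chunks, which is why the resulting curve changes more gradually.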
Value
A data frame containing the count of each requested token in each chunk.
Examples
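# Count three names in successive chunks of size 1000
urn <- "URN:NBN:no-nb_digibok_2013060406055"
words <- c("Dracula", "Mina", "Helsing")
window <- 1000
pr <- 1000  # equal to window, so the chunks do not overlap
dispersion_result <- get_dispersion(urn = urn, words = words, window = window, pr = pr)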
[Package dhlabR version 1.0.6 Index]