get_dispersion {dhlabR}    R Documentation

Dispersion of tokens in a text

Description

This function wraps a call to the dispersion service, which calculates the dispersion of a list of tokens throughout a text in the National Library of Norway's collection, given by the URN. The text is divided into chunks, and the count of tokens in each chunk is returned.

Usage

get_dispersion(urn = NULL, words = list(".", ","), window = 500, pr = 100)

Arguments

urn

A National Library of Norway URN to a text object.

words

A list or vector of words (tokens) to analyze for dispersion.

window

The size of each text chunk within which the tokens are counted.

pr

Step size ('per'): the distance to move forward to reach the start of the next chunk. If 'pr' equals 'window', the text is divided into non-overlapping chunks of size 'window'. If 'pr' is smaller than 'window', consecutive chunks overlap, which produces a smoother curve.
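
The interaction between 'window' and 'pr' can be illustrated locally with a small sketch. The helper count_in_chunks below is hypothetical and not part of dhlabR. It assumes the text is already a character vector of tokens and that chunks start at positions 1, 1 + pr, 1 + 2 * pr, and so on. The dispersion service may handle chunk boundaries and the end of the text differently.

```r
# Hypothetical helper (not part of dhlabR): count target words in sliding chunks.
# Assumption: chunk i starts at token 1 + (i - 1) * pr and spans `window` tokens.
count_in_chunks <- function(tokens, words, window, pr) {
  n <- length(tokens)
  # Start positions of the chunks; always at least one chunk
  starts <- seq(1, max(1, n - window + 1), by = pr)
  counts <- lapply(words, function(w) {
    vapply(starts, function(s) {
      chunk <- tokens[s:min(s + window - 1, n)]
      sum(chunk == w)
    }, integer(1))
  })
  names(counts) <- words
  data.frame(start = starts, counts, check.names = FALSE)
}

# Toy token sequence, invented for illustration
tokens <- c("a", "b", "a", "c", "a", "b", "b", "a")

# pr equal to window: non-overlapping chunks
non_overlap <- count_in_chunks(tokens, c("a", "b"), window = 4, pr = 4)

# pr smaller than window: overlapping chunks, hence more rows
overlap <- count_in_chunks(tokens, c("a", "b"), window = 4, pr = 2)
```

With 'pr' equal to 'window', each token falls in exactly one chunk. With a smaller 'pr', the same tokens are counted in several neighbouring chunks, which is why the resulting curve changes more gradually from row to row.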

Value

A data frame with the count of tokens in each chunk.

Examples

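library(dhlabR)

# URN of a text object in the National Library of Norway's collection
urn <- "URN:NBN:no-nb_digibok_2013060406055"
words <- c("Dracula", "Mina", "Helsing")
# Equal 'window' and 'pr' give non-overlapping chunks
window <- 1000
pr <- 1000
dispersion_result <- get_dispersion(urn, words, window, pr)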

[Package dhlabR version 1.0.2 Index]