get_document_frequencies {dhlabR}    R Documentation
Retrieve Token Frequencies in Documents
Description
Retrieves the frequencies of tokens (words and punctuation marks) in the specified documents.
Usage
get_document_frequencies(pids, cutoff = 0, words = NULL)
Arguments
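pids
Document identifiers, given as a vector or a data frame (for example, the URN strings used in the Examples).
cutoff
A numeric frequency cutoff: the minimum frequency a token must have to be counted. Defaults to 0.
words
A character vector of words (tokens) whose frequencies should be retrieved. Defaults to NULL.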
Value
A list containing the following elements for each document:
Document ID
Token
Token frequency in the document
Total tokens in the document
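Because each row pairs a token count with the total number of tokens in its document, counts can be normalised for document length. The sketch below shows one way to do this in base R. It does not call the API: it uses a small made-up data frame, and its column names (urn, token, freq, total) are hypothetical stand-ins, since the exact column names of the returned object are not documented on this page.

```r
# Hypothetical stand-in for a result of get_document_frequencies().
# The column names and values are assumptions made for illustration only.
result <- data.frame(
  urn   = c("doc_a", "doc_a", "doc_b"),
  token = c(".", "men", "."),
  freq  = c(120, 15, 80),
  total = c(4000, 4000, 2000)
)

# Relative frequency: token count divided by the document's total token count
result$rel_freq <- result$freq / result$total

# The same value expressed per 1,000 tokens, which is often easier to read
result$per_thousand <- 1000 * result$rel_freq

# Pooled relative frequency of "." across documents: sum the counts and the
# document totals (each document contributes one row for this token)
dots <- result[result$token == ".", ]
pooled <- sum(dots$freq) / sum(dots$total)

print(result)
print(pooled)
```

Dividing by each document's own total keeps long and short documents comparable; pooling sums the raw counts first, so longer documents weigh more in the combined figure.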
Examples
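library(dhlabR)

# Two digitised books, identified by their URNs (requires an internet connection)
document_ids <- c("URN:NBN:no-nb_digibok_2008051404065", "URN:NBN:no-nb_digibok_2010092120011")
frequency_cutoff <- 10        # frequency cutoff for tokens
tokens <- c(".", ",", "men")  # two punctuation marks and the Norwegian word "men" ("but")
result <- get_document_frequencies(document_ids, cutoff = frequency_cutoff, words = tokens)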
[Package dhlabR version 1.0.6 Index]