num_tokens_file {TheOpenAIR}    R Documentation
Compute total number of tokens in a text file
Description
Computes the total number of tokens in a text file by reading and tokenizing the file in batches of lines rather than all at once. This makes the function suitable for very large text files. Returns a numeric value giving the total number of tokens in the file.
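The batch-wise approach can be illustrated with a minimal, self-contained sketch in base R. This is not TheOpenAIR's implementation: `count_tokens_batched` and its `tokenize` argument are hypothetical. The default whitespace tokenizer is only a stand-in for the "cl100k_base" byte-pair encoding, so its counts will differ from those returned by `num_tokens_file`.

```r
# Hypothetical sketch of batch-wise token counting (not TheOpenAIR's code).
# `tokenize` is a placeholder: by default it splits on whitespace, which is far
# cruder than the "cl100k_base" byte-pair encoding used by num_tokens_file().
count_tokens_batched <- function(filename, batch_size = 1000,
                                 tokenize = function(x) {
                                   # Split on runs of whitespace, drop empty strings
                                   toks <- unlist(strsplit(x, "[[:space:]]+"))
                                   toks[nzchar(toks)]
                                 }) {
  con <- file(filename, open = "r")
  on.exit(close(con))  # release the connection even if an error occurs
  total <- 0
  repeat {
    # Read at most `batch_size` lines; only the current batch is held in memory
    lines <- readLines(con, n = batch_size, warn = FALSE)
    if (length(lines) == 0) break  # end of file reached
    total <- total + length(tokenize(lines))
  }
  total
}

# Example: three lines, read two lines per batch
tmp <- tempfile(fileext = ".txt")
writeLines(c("hello world", "", "  one two three  "), tmp)
count_tokens_batched(tmp, batch_size = 2)  # 5
```

With this placeholder tokenizer the total does not depend on `batch_size`. The argument only trades the amount of text held in memory against the number of `readLines()` calls.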
Usage
num_tokens_file(filename, batch_size = 1000, encoding = "cl100k_base")
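A typical call might look like the following usage sketch. It assumes TheOpenAIR is installed and that whatever tokenizer back end it relies on is available. The sample file is made up for illustration, and no particular token count is implied. The `requireNamespace()` guard lets the snippet run harmlessly when the package is absent.

```r
# Usage sketch for num_tokens_file(); the sample text is illustrative only.
if (requireNamespace("TheOpenAIR", quietly = TRUE)) {
  # Write a sample file of 5,000 identical lines to a temporary location
  tmp <- tempfile(fileext = ".txt")
  writeLines(rep("The quick brown fox jumps over the lazy dog.", 5000), tmp)

  # Default settings: 1000 lines per batch, "cl100k_base" encoding
  n <- TheOpenAIR::num_tokens_file(tmp)

  # Explicit arguments: smaller batches, same encoding
  n_small <- TheOpenAIR::num_tokens_file(tmp, batch_size = 250,
                                         encoding = "cl100k_base")
  print(c(default = n, small_batches = n_small))
}
```

Smaller values of `batch_size` should reduce how much text is held in memory at once, at the cost of more read calls.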
Arguments
filename | character string giving the path of the text file to read in
batch_size | integer giving the number of lines to read in per batch (default 1000)
encoding | character string naming the tokenizer encoding used to count tokens, not the character encoding of the file (default "cl100k_base")
Value
a numeric value indicating the total number of tokens in the text file
[Package TheOpenAIR version 0.1.0 Index]