num_tokens_file {TheOpenAIR}    R Documentation

Compute total number of tokens in a text file

Description

Computes the total number of tokens in a text file. The file is read in batches of batch_size lines, which allows the function to be used on very large text files. The total is returned as a numeric value.

Usage

num_tokens_file(filename, batch_size = 1000, encoding = "cl100k_base")

Arguments

filename

character string indicating the name of the text file to read in

batch_size

integer indicating the number of lines to read in per batch (default is 1000)

encoding

character string indicating the name of the token encoding to use for counting (default is "cl100k_base")

Value

a numeric value indicating the total number of tokens in the text file
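
Examples

The batch-wise counting described above can be illustrated with a minimal base R sketch. This is not the package's implementation: count_tokens_ws and count_tokens_batched are hypothetical helpers, and a simple whitespace split stands in for the "cl100k_base" encoding, so the resulting counts will differ from what num_tokens_file returns.

```r
# Hypothetical stand-in tokenizer: splits each line on whitespace.
# The real function counts tokens with the encoding named by `encoding`.
count_tokens_ws <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]          # skip empty lines
  if (length(lines) == 0) return(0)
  sum(lengths(strsplit(lines, "[[:space:]]+")))
}

# Read `filename` in chunks of `batch_size` lines and accumulate the count,
# so that only one batch of lines is held in memory at a time.
count_tokens_batched <- function(filename, batch_size = 1000) {
  con <- file(filename, open = "r")
  on.exit(close(con))
  total <- 0
  repeat {
    batch <- readLines(con, n = batch_size, warn = FALSE)
    if (length(batch) == 0) break        # end of file reached
    total <- total + count_tokens_ws(batch)
  }
  total
}

# Small demonstration on a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("the quick brown fox", "", "jumps over", "the lazy dog"), tmp)
count_tokens_batched(tmp, batch_size = 2)
```

With the package installed, the documented call takes the same form, for example num_tokens_file("corpus.txt", batch_size = 5000), where "corpus.txt" is a placeholder file name. A larger batch_size means fewer read operations at the cost of holding more lines in memory per batch.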


[Package TheOpenAIR version 0.1.0 Index]