encoding {tok}    R Documentation

Encoding

Description

Represents the output of a tokenizer.

Value

An encoding object containing encoding information such as attention masks and token ids.

Public fields

.encoding

The underlying implementation pointer.

Active bindings

ids

The token IDs are the main input to a language model (LM). They are the token indices: the numerical representation of the text that an LM understands.

attention_mask

The attention mask used as input for transformer models.

Methods

Public methods


Method new()

Initializes an encoding object. Not intended to be called directly.

Usage
encoding$new(encoding)
Arguments
encoding

an encoding implementation object


Method clone()

The objects of this class are cloneable with this method.

Usage
encoding$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

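withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  try({
    # Load a pretrained tokenizer; downloaded files are cached in a temporary directory
    tok <- tokenizer$from_pretrained("gpt2")
    # Encode a string; the result is an encoding object
    encoding <- tok$encode("Hello world")
    encoding
  })
})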
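
The active bindings described above can be read directly from the object returned by tok$encode(). The following is a minimal sketch, not taken from the package sources: it assumes the "gpt2" tokenizer files can be downloaded, and the variable name enc is only illustrative.

```r
library(tok)

withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  try({
    tok <- tokenizer$from_pretrained("gpt2")
    enc <- tok$encode("Hello world")

    # Token indices that would be passed to the model
    print(enc$ids)

    # Attention mask that accompanies the token IDs
    print(enc$attention_mask)

    # Assumption: the mask has one entry per token ID
    print(length(enc$ids) == length(enc$attention_mask))
  })
})
```

The try() wrapper mirrors the example above, so a failed download reports an error without stopping the session.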

[Package tok version 0.1.3 Index]