encoding {tok}    R Documentation
Encoding
Description
Represents the output of a tokenizer.
Value
An encoding object holding the tokenizer's output, such as token IDs and the attention mask.
Public fields
.encoding
The underlying implementation pointer.
Active bindings
ids
The token IDs, the main input to a language model. They are the token indices: the numerical representations that a language model understands.
attention_mask
The attention mask used as input for transformer models.
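The two active bindings can be read directly from an encoding returned by a tokenizer. The following is a minimal sketch, assuming the tok and withr packages are installed and that the "gpt2" tokenizer can be downloaded; the variable names `gpt2_tok`, `enc`, and `same_length` are illustrative only. The length comparison reflects the usual expectation that the mask has one entry per token ID, which is not stated on this page.

```r
library(tok)

# Use a temporary cache directory so the download does not touch the user's cache.
withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  # try() keeps the script going if the tokenizer cannot be downloaded.
  try({
    gpt2_tok <- tokenizer$from_pretrained("gpt2")
    enc <- gpt2_tok$encode("Hello world")

    # Token IDs: the numerical input passed to the model.
    print(enc$ids)

    # Attention mask for the same sequence.
    print(enc$attention_mask)

    # Assumption: one mask entry per token ID.
    same_length <- length(enc$ids) == length(enc$attention_mask)
    print(same_length)
  })
})
```

Both bindings are read-only views of the tokenizer output, so they are accessed with `$` and without parentheses.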
Methods
Public methods
Method new()
Initializes an encoding object. Not intended to be called directly.
Usage
encoding$new(encoding)
Arguments
encoding
An encoding implementation object.
Method clone()
The objects of this class are cloneable with this method.
Usage
encoding$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Examples
withr::with_envvar(c(HUGGINGFACE_HUB_CACHE = tempdir()), {
  try({
    tok <- tokenizer$from_pretrained("gpt2")
    encoding <- tok$encode("Hello world")
    encoding
  })
})
[Package tok version 0.1.3 Index]