BERT_vocab {FMAT}    R Documentation
Check if mask words are in the model vocabulary.
Description
Check if mask words are in the model vocabulary.
Usage
BERT_vocab(
models,
mask.words,
add.tokens = FALSE,
add.method = c("sum", "mean")
)
Arguments
models
Model names at HuggingFace.
mask.words
Option words filling in the mask.
add.tokens
Add new tokens (for out-of-vocabulary words or even phrases) to the model vocabulary?
Defaults to FALSE.
add.method
Method used to produce the token embeddings of newly added tokens.
Can be "sum" (default) or "mean".
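To make the two add.method options concrete, here is a minimal base R sketch. It assumes that a new token's embedding is derived from the embeddings of the subword pieces the word would otherwise be split into; that reading is an assumption, and the toy matrix emb, the piece names, and the helper combine_embeddings() are hypothetical and not part of FMAT.

```r
# Toy embedding matrix: one row per subword piece, 4 dimensions (made-up values).
emb <- rbind(
  "individual" = c(0.2, -0.1,  0.4, 0.0),
  "##ism"      = c(0.6,  0.3, -0.2, 0.1)
)

# Hypothetical helper (not an FMAT function): combine subword embeddings
# into a single vector for a newly added token.
combine_embeddings <- function(emb, method = c("sum", "mean")) {
  method <- match.arg(method)  # the first choice, "sum", is the default
  switch(method,
         sum  = colSums(emb),
         mean = colMeans(emb))
}

new_sum  <- combine_embeddings(emb)          # method = "sum"
new_mean <- combine_embeddings(emb, "mean")  # method = "mean"
```

Under this assumption the two results point in the same direction and differ only in scale: with two pieces, the "sum" vector is exactly twice the "mean" vector.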
Value
A data.table of model name, mask word, real token (replaced if out of vocabulary), and token id (0 to N).
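The kind of lookup this table summarizes can be mimicked in base R. The sketch below is only an illustration: the toy vocabulary is made up, the column names are hypothetical rather than those returned by BERT_vocab(), and it reads the token id range as zero-based, which is an assumption. FMAT itself queries the model's tokenizer rather than a character vector.

```r
# Toy vocabulary (made up); position 1 in R corresponds to token id 0.
vocab <- c("[PAD]", "[UNK]", "bruce", "2020", "2021")
mask.words <- c("bruce", "Bruce", "2020", "2025")

# match() returns 1-based positions (NA if absent); subtract 1 for 0-based ids.
lookup <- data.frame(
  mask.word = mask.words,
  in.vocab  = mask.words %in% vocab,
  token.id  = match(mask.words, vocab) - 1L
)
print(lookup)
```

Note that the match is case-sensitive here, which mirrors why the same word can be in the vocabulary of an uncased model but missing from a cased one, or vice versa.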
See Also
Examples
## Not run:
models = c("bert-base-uncased", "bert-base-cased")
BERT_info(models)
BERT_vocab(models, c("bruce", "Bruce"))
BERT_vocab(models, 2020:2025) # some are out-of-vocabulary
BERT_vocab(models, 2020:2025, add.tokens=TRUE) # add vocab
BERT_vocab(models,
           c("individualism", "artificial intelligence"),
           add.tokens=TRUE)
## End(Not run)
[Package FMAT version 2024.7 Index]