label_topics {topiclabels} | R Documentation |
Automatically label topics using language models based on top terms
Description
Automatically labels topics from topic models using language models. The labels are derived from the top terms of each topic and, optionally, a short context description.
Usage
label_topics(
  terms,
  model = "mistralai/Mixtral-8x7B-Instruct-v0.1",
  params = list(),
  token = NA_character_,
  context = "",
  sep_terms = "; ",
  max_length_label = 5L,
  prompt_type = c("json", "plain", "json-roles"),
  max_wait = 0L,
  progress = TRUE
)
Arguments
terms |
[ |
model |
[ |
params |
[ |
token |
[ |
context |
[ |
sep_terms |
[ |
max_length_label |
[ |
prompt_type |
[ |
max_wait |
[ |
progress |
[ |
Details
The function builds prompts from the top terms and sends these prompts to
language models hosted on Hugging Face. The output is then post-processed so
that the label for each topic is extracted automatically. If the automatically
extracted labels contain errors, they can instead be extracted using custom
functions or manually from the original output of the model, which is stored
in the model_output entry of the lm_topic_labels object.
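A custom extraction function of the kind mentioned above could look roughly like the following sketch. It assumes the raw model outputs are available as a character vector; the strings in raw_output are made up for illustration, and extract_label is a hypothetical helper, not part of the package.

```r
# Hypothetical raw model outputs; real outputs depend on the model used.
raw_output <- c(
  'Sure! {"label": "Football Players"}',
  '{"label":"Energy Sources"} I hope this helps.',
  'no structured answer'
)

# Hypothetical helper: return the value of the first "label" field in each
# string, or NA if no such field is found.
extract_label <- function(x) {
  pattern <- '"label"[[:space:]]*:[[:space:]]*"([^"]*)"'
  hits <- regmatches(x, regexec(pattern, x))
  vapply(
    hits,
    function(hit) if (length(hit) >= 2L) hit[2L] else NA_character_,
    character(1)
  )
}

labels <- extract_label(raw_output)
print(labels)
```

Entries that come back as NA are the ones worth inspecting by hand in the original model output.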
Implemented default parameters for the models HuggingFaceH4/zephyr-7b-beta, tiiuae/falcon-7b-instruct, and mistralai/Mixtral-8x7B-Instruct-v0.1 are:
max_new_tokens
300
return_full_text
FALSE
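To illustrate how user-supplied params might interact with these defaults, the sketch below merges two lists with utils::modifyList. Whether label_topics combines them in exactly this way is an assumption; temperature is used only as an illustrative extra generation parameter.

```r
# Defaults as listed above.
default_params <- list(max_new_tokens = 300, return_full_text = FALSE)

# Hypothetical overrides, as they might be passed via the params argument.
user_params <- list(max_new_tokens = 100, temperature = 0.7)

# Assumption: user values replace defaults of the same name, and any
# additional entries are appended.
merged <- utils::modifyList(default_params, user_params)
str(merged)
```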
Implemented prompt types are:
json
the language model is asked to respond in JSON format with a single field called 'label', specifying the best label for the topic
plain
the language model is asked to return an answer that should only consist of the best label for the topic
json-roles
the language model is asked to respond in JSON format with a single field called 'label', specifying the best label for the topic; in addition, the model is queried using identifiers for <|user|> input and the beginning of the <|assistant|> output
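The following sketch shows how a prompt of the json and json-roles types could be assembled from the top terms. The prompt wording is invented for illustration and is not the text used by the package; build_prompt is a hypothetical helper, and max_length_label is assumed here to be a word limit.

```r
# Hypothetical helper: assemble a JSON-style prompt from the top terms.
build_prompt <- function(terms, context = "", sep_terms = "; ",
                         max_length_label = 5L) {
  # Collapse the top terms into one string using the separator.
  term_string <- paste(terms, collapse = sep_terms)
  # Only mention the context if one was supplied.
  context_part <- if (nzchar(context)) {
    paste0(" Context: ", context, ".")
  } else {
    ""
  }
  paste0(
    "Find a label of at most ", max_length_label,
    " words for the topic with these top terms: ", term_string, ".",
    context_part,
    " Respond in JSON format with a single field called 'label'."
  )
}

prompt <- build_prompt(c("gas", "power", "wind"), context = "Energy")

# A json-roles variant additionally wraps the prompt in role identifiers.
prompt_roles <- paste0("<|user|>\n", prompt, "\n<|assistant|>\n")
cat(prompt_roles)
```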
Value
[named list] lm_topic_labels object.
Examples
## Not run:
token = "" # please insert your hf token here
topwords_matrix = matrix(c("zidane", "figo", "kroos",
                           "gas", "power", "wind"), ncol = 2)
label_topics(topwords_matrix, token = token)
label_topics(list(c("zidane", "figo", "kroos"),
                  c("gas", "power", "wind")),
             token = token)
label_topics(list(c("zidane", "figo", "ronaldo"),
                  c("gas", "power", "wind")),
             token = token)
label_topics(list("wind", "greta", "hambach"),
             token = token)
label_topics(list("wind", "fire", "air"),
             token = token)
label_topics(list("wind", "feuer", "luft"),
             token = token)
label_topics(list("wind", "feuer", "luft"),
             context = "Elements of the Earth",
             token = token)
## End(Not run)