EmbeddedText {aifeducation} | R Documentation |
Embedded text
Description
Object of class R6 which stores the text embeddings generated by an object of class TextEmbeddingModel via the method embed().
Value
Returns an object of class EmbeddedText. These objects are used
for storing and managing the text embeddings created with objects of class TextEmbeddingModel.
Objects of class EmbeddedText serve as input for classifiers of class
TextEmbeddingClassifierNeuralNet. The main aim of this class is to provide a structured link between
embedding models and classifiers. Because objects of this class save information about
the text embedding model that created the embeddings, they ensure that only
embeddings generated with the same embedding model are combined. Furthermore, the stored information allows
classifiers to check whether embeddings from the correct text embedding model are used for
training and prediction.
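The compatibility check described above can be sketched in plain R. This is only an illustration of the idea, not the package's implementation: the helper same_embedding_model() and the field names in the lists are hypothetical, chosen to mirror the arguments of EmbeddedText$new().

```r
# Hypothetical helper (not part of aifeducation): compares the stored
# model information of two embedding sets before they are combined.
same_embedding_model <- function(info_a, info_b,
                                 keys = c("model_name", "model_method",
                                          "model_version", "model_language")) {
  all(vapply(
    keys,
    function(k) identical(info_a[[k]], info_b[[k]]),
    logical(1)
  ))
}

# Toy model information; all values are made up for illustration.
info_bert_1 <- list(model_name = "bert_demo", model_method = "bert",
                    model_version = "0.1.0", model_language = "english")
info_bert_2 <- info_bert_1
info_other  <- utils::modifyList(info_bert_1, list(model_name = "other_demo"))

# TRUE only when every compared field matches.
ok_same  <- same_embedding_model(info_bert_1, info_bert_2)
ok_mixed <- same_embedding_model(info_bert_1, info_other)
```

A classifier could apply the same kind of comparison to reject embeddings that come from a different model than the one it was trained with.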
Public fields
embeddings
('data.frame()')
data.frame containing the text embeddings for all chunks. Documents are in the rows. Embedding dimensions are in the columns.
Methods
Public methods
Method new()
Creates a new object representing text embeddings.
Usage
EmbeddedText$new(
  model_name = NA,
  model_label = NA,
  model_date = NA,
  model_method = NA,
  model_version = NA,
  model_language = NA,
  param_seq_length = NA,
  param_chunks = NULL,
  param_overlap = NULL,
  param_emb_layer_min = NULL,
  param_emb_layer_max = NULL,
  param_emb_pool_type = NULL,
  param_aggregation = NULL,
  embeddings
)
Arguments
model_name
string
Name of the model that generated this embedding.
model_label
string
Label of the model that generated this embedding.
model_date
string
Date when the model that generated this embedding was created.
model_method
string
Method of the underlying embedding model.
model_version
string
Version of the model that generated this embedding.
model_language
string
Language of the model that generated this embedding.
param_seq_length
int
Maximum number of tokens that the generating model processes per chunk.
param_chunks
int
Maximum number of chunks supported by the generating model.
param_overlap
int
Number of tokens that this model added at the beginning of the sequence for the next chunk.
param_emb_layer_min
int or string
Determines the first layer to be included in the creation of embeddings.
param_emb_layer_max
int or string
Determines the last layer to be included in the creation of embeddings.
param_emb_pool_type
string
Determines the method for pooling the token embeddings within each layer.
param_aggregation
string
Aggregation method of the hidden states. Deprecated. Only included for backward compatibility.
embeddings
data.frame
Contains the text embeddings.
Returns
Returns an object of class EmbeddedText which stores the text embeddings produced by an object of class TextEmbeddingModel. The object serves as input for objects of class TextEmbeddingClassifierNeuralNet.
Method get_model_info()
Method for retrieving information about the model that generated this embedding.
Usage
EmbeddedText$get_model_info()
Returns
list
Contains all saved information about the underlying text embedding model.
Method get_model_label()
Method for retrieving the label of the model that generated this embedding.
Usage
EmbeddedText$get_model_label()
Returns
string
Label of the corresponding text embedding model.
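The constructor and the two accessor methods can be tried together with a small, made-up embedding table. The following is a sketch under stated assumptions: it only runs its body if aifeducation is installed and its EmbeddedText$new() has the signature documented on this page (other versions of the package may differ). All model values and the toy data.frame are invented for illustration. In practice such objects are produced by the embed() method of a TextEmbeddingModel rather than built by hand.

```r
# Guard: proceed only if the package is installed and the constructor
# accepts the arguments documented on this page.
pkg_ok <- requireNamespace("aifeducation", quietly = TRUE)
sig_ok <- pkg_ok &&
  "model_name" %in% names(formals(
    aifeducation::EmbeddedText$public_methods$initialize
  ))

if (sig_ok) {
  # Toy embeddings: 2 documents (rows) x 3 embedding dimensions (columns).
  toy_embeddings <- data.frame(
    dim_1 = c(0.12, -0.40),
    dim_2 = c(0.55, 0.08),
    dim_3 = c(-0.31, 0.27),
    row.names = c("doc_1", "doc_2")
  )

  # All model values below are made up for illustration.
  embedded <- aifeducation::EmbeddedText$new(
    model_name = "bert_demo",
    model_label = "Demo BERT model",
    model_date = "2024-01-01",
    model_method = "bert",
    model_version = "0.1.0",
    model_language = "english",
    param_seq_length = 512L,
    param_chunks = 1L,
    param_overlap = 0L,
    embeddings = toy_embeddings
  )

  label <- embedded$get_model_label()  # label of the generating model
  info  <- embedded$get_model_info()   # list with the saved model information
  str(info)
}
```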
Method clone()
The objects of this class are cloneable with this method.
Usage
EmbeddedText$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
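The difference between assigning an R6 object and cloning it can be illustrated with a toy class. This is a minimal sketch that assumes the R6 package is installed; ToyEmbedding is a hypothetical stand-in and not the real EmbeddedText class. Because a data.frame has ordinary value semantics in R, a shallow clone (deep = FALSE) already gets an independent copy of such a field. deep = TRUE only matters for fields that are themselves reference objects, such as other R6 objects.

```r
library(R6)

# Hypothetical stand-in for an embedding container; NOT the real EmbeddedText.
ToyEmbedding <- R6Class(
  "ToyEmbedding",
  public = list(
    embeddings = NULL,
    initialize = function(embeddings) {
      self$embeddings <- embeddings
    }
  )
)

original <- ToyEmbedding$new(data.frame(dim_1 = c(0.1, 0.2)))

alias <- original           # plain assignment: same object, no copy
copy  <- original$clone()   # shallow clone (deep = FALSE)

# Modify the clone only.
copy$embeddings$dim_1[1] <- 9

original_value <- original$embeddings$dim_1[1]  # not affected by the edit
copy_value     <- copy$embeddings$dim_1[1]
alias_is_same  <- identical(alias, original)    # alias refers to the original
```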
See Also
Other Text Embedding: TextEmbeddingModel, combine_embeddings()