starspace_embedding {ruimtehol}    R Documentation

Get the document or ngram embeddings

Description

Get the document embeddings or the ngram embeddings of text from a trained StarSpace model.

Usage

starspace_embedding(object, x, type = c("document", "ngram"))

Arguments

object

an object of class textspace as returned by starspace or starspace_load_model

x

a character vector with the texts for which to get the embeddings

  • If type is set to 'document', the features (e.g. words) within each element of x are assumed to be separated by a tab or a space.

  • If type is set to 'ngram', the words within each element of x are assumed to be separated by a space.

type

the type of embedding requested: either 'document' or 'ngram'. For 'document', the function returns the document embedding; for 'ngram', it returns the embedding of the provided ngram term. See the Details section.

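Details

  • For type = 'document', the embeddings of the features (e.g. words) present in x are summed. The sum is then divided by size^p if the model was trained with dot similarity, or by its Euclidean norm if it was trained with cosine similarity, where size is the number of features in x. With p = 1 this is the average of the feature embeddings; with p = 0 it is their plain sum. Both p and similarity are set when the model is trained with starspace.

  • For type = 'ngram', StarSpace uses a hashing trick to find the bucket in which the ngram falls and returns the embedding of that bucket. Make sure that x does not contain more words than the ngram value used when the model was trained with starspace.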

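The document-embedding rule can be reproduced in base R on a toy dictionary. This is a minimal sketch, not ruimtehol's implementation: the helper document_embedding and the dictionary values are hypothetical, features are assumed to be separated by spaces or tabs, and features missing from the dictionary are assumed to be ignored. Under dot similarity the summed embedding is divided by size^p; under cosine similarity it is divided by its Euclidean norm, which is what the examples check against as.matrix(model).

```r
# Toy word-embedding matrix (hypothetical values), one row per feature,
# standing in for as.matrix(model) in the examples.
dictionary <- rbind(
  federale = c(0.1, 0.4, -0.2),
  politie  = c(0.3, -0.1, 0.5),
  brussel  = c(-0.2, 0.2, 0.1)
)

# Hypothetical helper: sum the feature embeddings, then divide by
# size^p (dot similarity) or by the Euclidean norm of the sum (cosine).
document_embedding <- function(text, dictionary,
                               similarity = c("dot", "cosine"), p = 0.5) {
  similarity <- match.arg(similarity)
  # Assumption: a run of spaces or tabs separates features
  words <- strsplit(text, "[ \t]+")[[1]]
  # Assumption: features not in the dictionary are ignored
  words <- words[words %in% rownames(dictionary)]
  total <- colSums(dictionary[words, , drop = FALSE])
  if (similarity == "dot") {
    total / length(words)^p
  } else {
    total / sqrt(sum(total^2))
  }
}

document_embedding("federale politie", dictionary, "dot", p = 0.5)
document_embedding("federale politie", dictionary, "cosine")
```

With p = 1 the dot result equals colMeans of the two dictionary rows, and with p = 0 it equals their colSums, matching the averaging and summation cases described for the p parameter.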
Value

a matrix of embeddings, with one row per element of x

Examples

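data(dekamer, package = "ruimtehol")
# Tokenise the questions on non-word characters, drop empty tokens
# and rejoin the remaining tokens with single spaces
dekamer$text <- strsplit(dekamer$question, "\\W")
dekamer$text <- lapply(dekamer$text, FUN = function(x) x[x != ""])
dekamer$text <- sapply(dekamer$text, 
                       FUN = function(x) paste(x, collapse = " "))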

set.seed(123456789)
# Train a tagspace model with dot similarity and p = 0.5
model <- embed_tagspace(x = tolower(dekamer$text), 
                        y = dekamer$question_theme_main, 
                        similarity = "dot",
                        early_stopping = 0.8, ngram = 1, p = 0.5,
                        dim = 10, minCount = 5)
embedding <- starspace_embedding(model, "federale politie", type = "document")
embedding_dictionary <- as.matrix(model)
embedding
# Manual check: sum of the two word embeddings divided by size^p = 2^0.5
colSums(embedding_dictionary[c("federale", "politie"), ]) / 2^0.5

## Not run: 
set.seed(123456789)
# Same setup, but trained with cosine similarity
model <- embed_tagspace(x = tolower(dekamer$text), 
                        y = dekamer$question_theme_main, 
                        similarity = "cosine",
                        early_stopping = 0.8, ngram = 1, 
                        dim = 10, minCount = 5)
embedding <- starspace_embedding(model, "federale politie", type = "document")
embedding_dictionary <- as.matrix(model)
euclidean_norm <- function(x) sqrt(sum(x^2))
# Manual check: sum of the word embeddings divided by its Euclidean norm
manual <- colSums(embedding_dictionary[c("federale", "politie"), ])
manual / euclidean_norm(manual)
embedding

set.seed(123456789)
# Word ngrams up to length 3, p = 0 (plain sum) and a single ngram bucket
model <- embed_tagspace(x = tolower(dekamer$text), 
                        y = dekamer$question_theme_main, 
                        similarity = "dot",
                        early_stopping = 0.8, ngram = 3, p = 0,
                        dim = 10, minCount = 5, bucket = 1)
# Compare the document embedding with the embedding of the ngram term
starspace_embedding(model, "federale politie", type = "document")
starspace_embedding(model, "federale politie", type = "ngram")

## End(Not run)
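The last example trains with bucket = 1, so, assuming ngrams are hashed modulo the number of buckets, every ngram shares a single bucket embedding. The lookup behind type = 'ngram' follows the general hashing trick, sketched below in base R under loudly stated assumptions: toy_hash is a simple polynomial string hash chosen for illustration and is not the hash function StarSpace uses, and ngram_table is a hypothetical stand-in for the bucket rows of a trained model.

```r
# Hypothetical toy hash (NOT StarSpace's hash): polynomial hash over the
# character codes, reduced modulo the prime 2^31 - 1 so every intermediate
# value stays exactly representable as a double.
toy_hash <- function(ngram) {
  h <- 0
  for (code in utf8ToInt(ngram)) {
    h <- (h * 31 + code) %% 2147483647
  }
  h
}

# Map an ngram to a 0-based bucket index
ngram_bucket <- function(ngram, bucket) {
  toy_hash(ngram) %% bucket
}

# Hypothetical bucket embedding table: one row per bucket, 3 dimensions
set.seed(1)
bucket <- 4
ngram_table <- matrix(rnorm(bucket * 3), nrow = bucket)

# Look up the embedding row of the bucket the ngram hashes to
ngram_embedding <- function(ngram, table) {
  table[ngram_bucket(ngram, nrow(table)) + 1, ]
}

ngram_embedding("federale politie", ngram_table)
# With a single bucket, every ngram maps to bucket 0
ngram_bucket("federale politie", 1)
```

Different ngrams can collide in the same bucket; a larger bucket value makes collisions rarer at the cost of a larger embedding table.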

[Package ruimtehol version 0.3.2 Index]