as.matrix.paragraph2vec {doc2vec}

R Documentation

Get the document or word vectors of a paragraph2vec model

Description

Get the document or word vectors of a paragraph2vec model as a dense matrix.

Usage

## S3 method for class 'paragraph2vec'
as.matrix(
  x,
  which = c("docs", "words"),
  normalize = TRUE,
  encoding = "UTF-8",
  ...
)

Arguments

`x`	a paragraph2vec model as returned by `paragraph2vec` or `read.paragraph2vec`
`which`	either 'docs' to return the document vectors or 'words' to return the word vectors. Defaults to 'docs'.
`normalize`	logical indicating whether to normalize the embeddings. Defaults to `TRUE`.
`encoding`	the encoding to assign to the row names (the document identifiers or words). Defaults to 'UTF-8'.
`...`	not used
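A minimal sketch of what `normalize = TRUE` is assumed to do: scale every row to unit Euclidean (L2) length. The exact normalization used by the package is an assumption here, as are the toy matrix `emb` (a stand-in for `as.matrix(model, normalize = FALSE)`) and the helper `l2_normalize`, which are hypothetical and only illustrate the idea.

```r
# Hypothetical toy matrix standing in for as.matrix(model, normalize = FALSE):
# one row per document, one column per embedding dimension.
emb <- matrix(c(3, 4,
                1, 0,
                0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("doc_1", "doc_2", "doc_3"), NULL))

# Hypothetical helper; assumes normalize = TRUE means dividing each row by its L2 norm.
# The length-3 norm vector recycles down the columns, so row i is divided by norm i.
l2_normalize <- function(m) {
  m / sqrt(rowSums(m^2))
}

emb_norm <- l2_normalize(emb)
print(emb_norm)
print(sqrt(rowSums(emb_norm^2)))  # every row now has length 1
```

Row names are preserved, so each normalized vector can still be looked up by document identifier.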

Value

a matrix of document or word vectors, with one row per document or word and one column per embedding dimension. The row names are the documents or words on which the model was trained.
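Because the result is an ordinary named matrix, it can be used directly with base R. The sketch below assumes unit-length rows (as with `normalize = TRUE`, under the assumption that this means L2 normalization), in which case cosine similarity reduces to a dot product. The toy matrix `emb` and its row names are hypothetical stand-ins for `as.matrix(model, which = "words")`.

```r
# Hypothetical stand-in for as.matrix(model, which = "words", normalize = TRUE):
# each row is a unit-length word vector.
emb <- rbind(
  parlement = c(0.6, 0.8),
  chambre   = c(0.8, 0.6),
  budget    = c(1.0, 0.0)
)

# Row names identify the word (or document) behind each vector.
emb["parlement", ]

# For unit-length rows, cosine similarity is the dot product,
# so tcrossprod() (emb %*% t(emb)) gives the full similarity matrix.
sim <- tcrossprod(emb)
round(sim, 2)

# Most similar other word to "parlement": drop the self-similarity first.
scores <- sim["parlement", setdiff(rownames(sim), "parlement")]
names(which.max(scores))
```

With unnormalized vectors (`normalize = FALSE`), the rows would need to be scaled first for the dot product to equal cosine similarity.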

Examples

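library(doc2vec)
library(tokenizers.bpe)
data(belgium_parliament, package = "tokenizers.bpe")
## keep non-empty French-language documents with fewer than 1000 words
x <- subset(belgium_parliament, language %in% "french")
x <- subset(x, nchar(text) > 0 & txt_count_words(text) < 1000)

## fit a PV-DM model with 15 dimensions ...
model <- paragraph2vec(x = x, type = "PV-DM",   dim = 15,  iter = 5)
## ... or a PV-DBOW model with 100 dimensions (this overwrites the model above)
model <- paragraph2vec(x = x, type = "PV-DBOW", dim = 100, iter = 20)

## normalized document and word vectors (the default)
embedding <- as.matrix(model, which = "docs")
embedding <- as.matrix(model, which = "words")
## unnormalized document and word vectors
embedding <- as.matrix(model, which = "docs", normalize = FALSE)
embedding <- as.matrix(model, which = "words", normalize = FALSE)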