as.matrix.paragraph2vec {doc2vec}    R Documentation

Get the document or word vectors of a paragraph2vec model

Description

Get the document or word vectors of a paragraph2vec model as a dense matrix.

Usage

## S3 method for class 'paragraph2vec'
as.matrix(
  x,
  which = c("docs", "words"),
  normalize = TRUE,
  encoding = "UTF-8",
  ...
)

Arguments

x

a paragraph2vec model as returned by paragraph2vec or read.paragraph2vec

which

either 'docs' or 'words', indicating whether to return the document vectors or the word vectors. Defaults to 'docs'.

normalize

logical indicating whether to normalize the embeddings. Defaults to TRUE.

encoding

set the encoding of the row names to the specified encoding. Defaults to 'UTF-8'.

...

not used

Value

a matrix with one row per document or word, where the row names are the documents or words on which the model was trained and each row holds the corresponding embedding vector
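
Because the result is an ordinary numeric matrix with row names, it can be indexed and combined with base R matrix functions. The following is a minimal sketch, not part of the package: the small hand-made matrix and the names doc_1, doc_2 and doc_3 are hypothetical stand-ins for real output, and the row scaling shown is plain Euclidean scaling, which is not necessarily what normalize = TRUE does internally.

```r
# Hypothetical stand-in for as.matrix(model, which = "docs", normalize = FALSE):
# three documents, four dimensions, document ids as row names.
embedding <- matrix(
  c(1, 0, 2, 2,
    2, 0, 4, 4,
    0, 3, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_1", "doc_2", "doc_3"), NULL)
)

# Look up the vector of a single document by its row name
embedding["doc_1", ]

# Scale every row to unit Euclidean length
unit <- embedding / sqrt(rowSums(embedding^2))

# Pairwise cosine similarity: dot products between unit-length rows
similarity <- tcrossprod(unit)
round(similarity, 2)
```

In this toy matrix doc_1 and doc_2 point in the same direction, so their cosine similarity is 1, while doc_3 is orthogonal to both. With a real model, the same lookup by row name works for words when which = "words".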

See Also

paragraph2vec, read.paragraph2vec

Examples


library(doc2vec)
library(tokenizers.bpe)
data(belgium_parliament, package = "tokenizers.bpe")
# Keep the French speeches that are non-empty and shorter than 1000 words
x <- subset(belgium_parliament, language %in% "french")
x <- subset(x, nchar(text) > 0 & txt_count_words(text) < 1000)

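# Small PV-DM model that trains quickly
model <- paragraph2vec(x = x, type = "PV-DM",   dim = 15,  iter = 5)

# Larger PV-DBOW model; it replaces the model above and takes longer to train
model <- paragraph2vec(x = x, type = "PV-DBOW", dim = 100, iter = 20)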


# Document and word vectors, normalized (the default)
embedding <- as.matrix(model, which = "docs")
embedding <- as.matrix(model, which = "words")
# The same vectors without normalization
embedding <- as.matrix(model, which = "docs", normalize = FALSE)
embedding <- as.matrix(model, which = "words", normalize = FALSE)


[Package doc2vec version 0.2.0 Index]