predict.word2vec {word2vec}    R Documentation

Predict functionalities for a word2vec model

Description

Get either the embedding of words, or the nearest words which are similar to either a word or a word vector.

Usage

## S3 method for class 'word2vec'
predict(
  object,
  newdata,
  type = c("nearest", "embedding"),
  top_n = 10L,
  encoding = "UTF-8",
  ...
)

Arguments

object

a word2vec model as returned by word2vec or read.word2vec

newdata

For type 'embedding', newdata should be a character vector of words.
For type 'nearest', newdata should be a character vector of words, or a matrix (or a single numeric vector) in the embedding space.

type

either 'embedding' or 'nearest'. Defaults to 'nearest'.
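The 'nearest' default follows the usual R idiom for a choice argument: the signature lists the allowed values, and match.arg() picks the first one when the caller passes nothing. That predict.word2vec resolves type this way is an assumption based on its signature. The toy function `pick_type()` below is hypothetical and only demonstrates the idiom.

```r
# Hypothetical function (not part of word2vec) illustrating the match.arg() idiom
pick_type <- function(type = c("nearest", "embedding")) {
  # With the default, match.arg() returns the first allowed value;
  # otherwise it validates the value (partial matching allowed) and errors on anything else
  match.arg(type)
}

pick_type()             # returns "nearest"
pick_type("embedding")  # returns "embedding"
pick_type("emb")        # partial match, returns "embedding"
```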

top_n

show only the top n nearest neighbours. Defaults to 10.
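Conceptually, type 'nearest' ranks the words in the vocabulary by their similarity to each query and keeps the top_n best. The base-R sketch below illustrates that idea on a small made-up embedding matrix. It assumes cosine similarity as the measure and uses a hypothetical helper `nearest_sketch()` with hypothetical output columns `term`, `similarity` and `rank`; it is not the package's implementation, and its output need not match what predict() returns.

```r
# Hypothetical helper: NOT part of the word2vec package.
# Ranks the rows of an embedding matrix by cosine similarity to each query.
nearest_sketch <- function(emb, query, top_n = 10L) {
  # Accept a single vector or a matrix with one query per row
  if (is.null(dim(query))) query <- matrix(query, nrow = 1)
  # Scale rows to unit length so that a dot product equals cosine similarity
  unit <- function(m) m / sqrt(rowSums(m^2))
  sims <- unit(query) %*% t(unit(emb))
  # One data.frame per query row, holding the top_n most similar terms
  lapply(seq_len(nrow(sims)), function(i) {
    s <- sims[i, ]
    idx <- head(order(s, decreasing = TRUE), top_n)
    data.frame(term = rownames(emb)[idx], similarity = unname(s[idx]),
               rank = seq_along(idx), stringsAsFactors = FALSE)
  })
}

# Toy 4-word vocabulary in a 3-dimensional embedding space (made-up numbers)
emb <- rbind(bus    = c(1, 0, 0),
             tram   = c(0.9, 0.1, 0),
             toilet = c(0, 1, 0),
             sink   = c(0, 0.8, 0.2))
nn <- nearest_sketch(emb, emb["bus", ], top_n = 2)
nn[[1]]
```

In this sketch the query word itself ranks first because it is compared against the whole vocabulary; whether predict() keeps or drops the query word is not something the sketch tries to reproduce.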

encoding

set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.

...

not used

Value

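Depending on the type, you get a different result back. For type 'embedding', a matrix of word vectors for the words in newdata. For type 'nearest', the top_n nearest words, together with their similarity, for each word or vector in newdata.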

See Also

word2vec, read.word2vec

Examples

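# Load the example model that ships with the package
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
# Embeddings of the given words
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
# The 5 nearest neighbours of each word
nn  <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn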

# Do arithmetic on the word vectors and find the terms closest to the result
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)

[Package word2vec version 0.4.0 Index]