predict.word2vec {word2vec} | R Documentation |
Predict functionalities for a word2vec model
Description
Get either the embedding of words, or the nearest words which are similar to either a word or a word vector.
Usage
## S3 method for class 'word2vec'
predict(
object,
newdata,
type = c("nearest", "embedding"),
top_n = 10L,
encoding = "UTF-8",
...
)
Arguments
object |
a word2vec model as returned by |
newdata |
for type 'embedding', |
type |
either 'embedding' or 'nearest'. Defaults to 'nearest'. |
top_n |
show only the top n nearest neighbours. Defaults to 10. |
encoding |
set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'. |
... |
not used |
Value
Depending on the type, you get a different result back:
for type nearest: a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is just one vector instead of a matrix, it returns a data.frame.
for type embedding: a matrix of word vectors of the words provided in newdata.
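To make the two return shapes for type nearest concrete, here is a minimal base R sketch on a made-up embedding matrix. The helper `nearest_sketch` is hypothetical and not part of the word2vec package, and it assumes cosine similarity as the ranking measure, which may differ from what the package computes internally.

```r
# Toy embedding matrix: rows are terms, columns are dimensions (made-up values).
emb <- rbind(
  bus   = c(1.0, 0.0, 0.0),
  tram  = c(0.9, 0.1, 0.0),
  metro = c(0.7, 0.3, 0.0),
  hotel = c(0.0, 1.0, 0.0)
)

# Hypothetical helper (an assumption, not the package's implementation):
# rank all terms by cosine similarity to a query vector and keep the top_n.
nearest_sketch <- function(emb, query, top_n = 10L) {
  norms <- unname(sqrt(rowSums(emb^2)))
  sims  <- as.vector(emb %*% query) / (norms * sqrt(sum(query^2)))
  ord   <- head(order(sims, decreasing = TRUE), top_n)
  data.frame(term = rownames(emb)[ord],
             similarity = sims[ord],
             rank = seq_along(ord),
             stringsAsFactors = FALSE)
}

# A single query vector gives a single data.frame.
# The query term itself ranks first here because it is part of the matrix.
one <- nearest_sketch(emb, emb["bus", ], top_n = 3)

# Several query vectors (rows of a matrix) give a named list of data.frames.
queries <- emb[c("bus", "hotel"), , drop = FALSE]
many <- sapply(rownames(queries),
               function(w) nearest_sketch(emb, queries[w, ], top_n = 3),
               simplify = FALSE)
```

With a real model, `predict(model, newdata, type = "nearest")` takes the place of the helper; the sketch only mirrors the documented columns term, similarity and rank.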
See Also
Examples
path <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
nn <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn
# Do some calculations with the vectors and find similar terms to these
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)
vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)
vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
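When several words or vectors are passed, the result for type nearest is described above as a list of data.frames. The sketch below shows one way to stack such a list into a single long table in base R. The list `nn_toy` is made-up stand-in data with the documented columns, not real model output, and the `query` column is a name introduced here for illustration.

```r
# Made-up stand-in for a list of data.frames with columns term, similarity, rank.
nn_toy <- list(
  bus    = data.frame(term = c("tram", "metro"), similarity = c(0.99, 0.92),
                      rank = 1:2, stringsAsFactors = FALSE),
  toilet = data.frame(term = c("douche", "bad"), similarity = c(0.95, 0.90),
                      rank = 1:2, stringsAsFactors = FALSE)
)

# Prefix each data.frame with the name of its list element, then stack them.
long <- do.call(rbind, Map(
  function(query, d) data.frame(query = query, d, stringsAsFactors = FALSE),
  names(nn_toy), nn_toy
))
rownames(long) <- NULL
long
```

Keeping the query in its own column makes it easy to filter or plot neighbours per word afterwards.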