most_similar {PsychWordVec} | R Documentation
Find the Top-N most similar words.
Description
Find the Top-N most similar words, which replicates the results produced by the Python gensim module most_similar() function. (Exact replication of gensim requires the same word vectors data, not the demodata used here in examples.)
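As a rough illustration of the idea behind a Top-N lookup, the base R sketch below ranks words by cosine similarity to a query word. It is not the package's implementation: the toy matrix `emb`, its values, and the helper `top_similar()` are hypothetical and exist only for this example. Like gensim, the sketch leaves the query word out of the results.

```r
# Toy embedding matrix: one row per word (values are made up for illustration).
emb <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.8, 0.9, 0.2),
  apple = c(0.1, 0.2, 0.9),
  pear  = c(0.2, 0.1, 0.8)
)

# Hypothetical helper, not part of PsychWordVec.
top_similar <- function(emb, word, topn = 10) {
  unit <- emb / sqrt(rowSums(emb^2))    # L2-normalize each row
  sims <- drop(unit %*% unit[word, ])   # cosine similarity to the query word
  sims <- sims[names(sims) != word]     # drop the query word itself
  head(sort(sims, decreasing = TRUE), topn)
}

res <- top_similar(emb, "king", topn = 2)
res
```

Because every row is scaled to unit length first, the matrix product directly yields cosine similarities.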
Usage
most_similar(
  data,
  x = NULL,
  topn = 10,
  above = NULL,
  keep = FALSE,
  row.id = TRUE,
  verbose = TRUE
)
Arguments
data |
A |
x |
Can be:
|
topn |
Top-N most similar words. Defaults to 10. |
above |
Defaults to NULL.
If both |
keep |
Keep words specified in x in the results? Defaults to FALSE. |
row.id |
Return the row number of each word? Defaults to TRUE. |
verbose |
Print information to the console? Defaults to TRUE. |
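The examples on this page pass either a number or a word to above. The base R sketch below shows one plausible reading of that argument, under the assumption that a number acts as a cosine-similarity threshold and a word acts as a reference whose own similarity becomes the threshold. The toy data and the helper `filter_above()` are hypothetical and not part of PsychWordVec.

```r
# Toy embedding matrix (made-up values, for illustration only).
emb <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.8, 0.9, 0.2),
  apple = c(0.1, 0.2, 0.9),
  pear  = c(0.2, 0.1, 0.8)
)
unit <- emb / sqrt(rowSums(emb^2))      # L2-normalize each row
sims <- drop(unit %*% unit["king", ])   # cosine similarity to "king"
sims <- sims[names(sims) != "king"]     # leave out the query word

# Hypothetical helper: keep words strictly above a cutoff.
# ASSUMPTION: a character `above` means "more similar than this word".
filter_above <- function(sims, above) {
  cutoff <- if (is.character(above)) sims[[above]] else above
  sort(sims[sims > cutoff], decreasing = TRUE)
}

by_value <- filter_above(sims, 0.5)      # numeric threshold
by_word  <- filter_above(sims, "apple")  # threshold taken from a word
```

With a threshold, the number of returned words depends on the data rather than on a fixed N.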
Value
A data.table
with the most similar words and their cosine similarities.
Download
Download pre-trained word vectors data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
See Also
Examples
library(PsychWordVec)

# normalize the demo word vectors once, up front
d = as_embed(demodata, normalize=TRUE)

# default: x = NULL
most_similar(d)

# a single word or a set of words
most_similar(d, "China")
most_similar(d, c("king", "queen"))
most_similar(d, cc(" king , queen ; man | woman "))

# the same as above:
most_similar(d, ~ China)
most_similar(d, ~ king + queen)
most_similar(d, ~ king + queen + man + woman)

# word analogies: formula terms with + and -
most_similar(d, ~ boy - he + she)
most_similar(d, ~ Jack - he + she)
most_similar(d, ~ Rose - she + he)
most_similar(d, ~ king - man + woman)
most_similar(d, ~ Tokyo - Japan + China)
most_similar(d, ~ Beijing - China + Japan)

# `above` given as a number or as a word
most_similar(d, "China", above=0.7)
most_similar(d, "China", above="Shanghai")

# automatically normalized for more accurate results
ms = most_similar(demodata, ~ king - man + woman)
ms
str(ms)
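
The formula examples such as ~ king - man + woman can be read as vector arithmetic. The base R sketch below illustrates that reading on a toy matrix, assuming unit-normalized vectors are added for positive terms and subtracted for negative terms before ranking by cosine similarity. The matrix values and variable names are hypothetical. This is not the package's code, and the results will differ from those on demodata.

```r
# Toy embedding matrix (made-up values, for illustration only).
emb <- rbind(
  king  = c(1.0, 1, 0),
  queen = c(1.0, 0, 1),
  man   = c(0.0, 1, 0),
  woman = c(0.0, 0, 1),
  boy   = c(0.2, 1, 0)
)
unit <- emb / sqrt(rowSums(emb^2))   # L2-normalize each row

pos <- c("king", "woman")            # terms entering with "+"
neg <- "man"                         # terms entering with "-"

# Combine the normalized vectors, then normalize the result.
target <- colSums(unit[pos, , drop = FALSE]) -
  colSums(unit[neg, , drop = FALSE])
target <- target / sqrt(sum(target^2))

sims <- drop(unit %*% target)                 # cosine similarity to target
sims <- sims[!names(sims) %in% c(pos, neg)]   # leave out the input words
best <- names(sort(sims, decreasing = TRUE))[1]
best
```

Leaving the input words out of the ranking matters here: otherwise one of them, such as woman, could be returned ahead of the intended answer.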