analogy {LSAfun} | R Documentation |
Analogy
Description
Implements the king - man + woman = queen analogy-solving algorithm.
Usage
analogy(x1,x2,y1=NA,n,tvectors=tvectors)
Arguments
x1 |
a character vector specifying the first word of the first pair (man in man : king = woman : ?) |
x2 |
a character vector specifying the second word of the first pair (king in man : king = woman : ?) |
y1 |
a character vector specifying the first word of the second pair (woman in man : king = woman : ?) |
n |
the number of neighbors to be computed |
tvectors |
the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector) |
Details
The analogy task is a popular benchmark for vector space models of meaning/word embeddings.
It is based on the rationale that proportional analogies of the form x1 is to x2 as y1 is to y2, like man : king = woman : ? (correct answer: queen), can be solved via the following operation on the respective word vectors (all normalized to unit norm): king - man + woman = queen (that is, the nearest vector to king - man + woman should be queen) (Mikolov et al., 2013).
The analogy() function comes in two variants, taking as input either three words (x1, x2, and y1) or two words (x1 and x2).
The variant with three input words (x1, x2, and y1) implements the standard analogy solving algorithm for analogies of the type x1 : x2 = y1 : ?, searching the n nearest neighbors for x2 - x1 + y1 (all normalized to unit norm) as the best-fitting candidates for y2.
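The three-word computation can be sketched by hand in base R. This is a minimal illustration under stated assumptions, not the package's own code: the matrix toy_space and the helpers unit_norm(), cosine_to_rows() and sketch_analogy() are hypothetical names, and the vector values are invented. The sketch ranks all rows of the space by cosine, including the input words themselves.

```r
# Toy semantic space: every row is a word vector (values invented for illustration)
toy_space <- rbind(
  man   = c(1.0,  0.0, 0.2),
  king  = c(1.0,  1.0, 0.2),
  woman = c(0.0,  0.0, 1.0),
  queen = c(0.0,  1.0, 1.0),
  apple = c(0.5, -1.0, 0.1)
)

# Hypothetical helper: scale a vector to unit (Euclidean) norm
unit_norm <- function(v) v / sqrt(sum(v^2))

# Hypothetical helper: cosine between v and every row of the space
cosine_to_rows <- function(v, space) {
  cosines <- as.vector(space %*% v) / (sqrt(rowSums(space^2)) * sqrt(sum(v^2)))
  names(cosines) <- rownames(space)
  cosines
}

# Hypothetical sketch of the three-word variant: x1 : x2 = y1 : ?
sketch_analogy <- function(x1, x2, y1, n, space) {
  # x2 - x1 + y1, with each word vector normalized to unit norm first
  y2_vec <- unit_norm(space[x2, ]) - unit_norm(space[x1, ]) + unit_norm(space[y1, ])
  cosines <- cosine_to_rows(y2_vec, space)
  # Keep the n rows with the highest cosine to y2_vec
  list(y2_vec = y2_vec,
       y2_neighbors = sort(cosines, decreasing = TRUE)[seq_len(n)])
}

res <- sketch_analogy("man", "king", "woman", n = 3, space = toy_space)
print(res$y2_neighbors)
```

In this toy space the highest-ranked row is queen. In a real semantic space the input words often rank near the top as well, so it can be worth inspecting more than the first neighbor.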
The variant with two input words (x1 and x2) only computes the difference between the two vectors (both normalized to unit norm) and the n nearest neighbors to the resulting difference vector.
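The two-word computation can likewise be sketched in base R. Again this is only an illustration under assumptions: size_space, unit_norm() and sketch_difference() are hypothetical names, the vector values are invented, and the package may compute its neighbors differently.

```r
# Toy semantic space: every row is a word vector (values invented for illustration)
size_space <- rbind(
  small = c(1.0, 0.1),
  big   = c(0.1, 1.0),
  tiny  = c(1.0, 0.0),
  huge  = c(0.0, 1.0)
)

# Hypothetical helper: scale a vector to unit (Euclidean) norm
unit_norm <- function(v) v / sqrt(sum(v^2))

# Hypothetical sketch of the two-word variant: neighbors of x2 - x1
sketch_difference <- function(x1, x2, n, space) {
  # Difference of the two word vectors, both normalized to unit norm first
  x_diff_vec <- unit_norm(space[x2, ]) - unit_norm(space[x1, ])
  # Cosine between the difference vector and every row of the space
  cosines <- as.vector(space %*% x_diff_vec) /
    (sqrt(rowSums(space^2)) * sqrt(sum(x_diff_vec^2)))
  names(cosines) <- rownames(space)
  list(x_diff_vec = x_diff_vec,
       x_diff_neighbors = sort(cosines, decreasing = TRUE)[seq_len(n)])
}

diff_res <- sketch_difference("small", "big", n = 2, space = size_space)
print(diff_res$x_diff_neighbors)
```

Here the difference vector points from small toward big, so the rows closest to it are the ones on the "large" side of the toy space.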
Value
Returns a list containing a numeric vector and the nearest neighbors to that vector:
In the variant with three input words (x1, x2, and y1), returns:
y2_vec |
The result of x2 - x1 + y1 (all normalized to unit norm) as a numeric vector |
y2_neighbors |
A named numeric vector of the n nearest neighbors to y2_vec. The neighbors are given as names of the vector, and their respective cosines to y2_vec as vector entries. |
In the variant with two input words (x1 and x2), returns:
x_diff_vec |
The result of x2 - x1 (both normalized to unit norm) as a numeric vector |
x_diff_neighbors |
A named numeric vector of the n nearest neighbors to x_diff_vec. The neighbors are given as names of the vector, and their respective cosines to x_diff_vec as vector entries. |
Author(s)
Fritz Guenther
References
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.
See Also
Examples
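data(wonderland)
# Three-word variant: hatter : mad = cat : ?
analogy(x1="hatter",x2="mad",y1="cat",n=10,tvectors=wonderland)
# Two-word variant: neighbors of the difference vector mad - hatter
analogy(x1="hatter",x2="mad",n=10,tvectors=wonderland)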