| analogy {LSAfun} | R Documentation |
Analogy
Description
Implements the king - man + woman = queen analogy solving algorithm
Usage
analogy(x1,x2,y1=NA,n,tvectors=tvectors)
Arguments
x1 |
a character vector specifying the first word of the first pair (man in man : king = woman : ?) |
x2 |
a character vector specifying the second word of the first pair (king in man : king = woman : ?) |
y1 |
a character vector specifying the first word of the second pair (woman in man : king = woman : ?) |
n |
the number of neighbors to be computed |
tvectors |
the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector) |
Details
The analogy task is a popular benchmark for vector space models of meaning/word embeddings.
It is based on the rationale that proportional analogies of the form x1 is to x2 as y1 is to y2, like man : king = woman : ? (correct answer: queen), can be solved via the following operation on the respective word vectors (all normalized to unit norm): king - man + woman = queen. That is, the nearest vector to king - man + woman should be queen (Mikolov et al., 2013).
The analogy() function comes in two variants, taking as input either three words (x1, x2, and y1) or two words (x1 and x2):
The variant with three input words (x1, x2, and y1) implements the standard analogy solving algorithm for analogies of the type x1 : x2 = y1 : ?, searching the n nearest neighbors for x2 - x1 + y1 (all normalized to unit norm) as the best-fitting candidates for y2.
The variant with two input words (x1 and x2) only computes the difference between the two vectors (both normalized to unit norm) and the n nearest neighbors to the resulting difference vector.
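The three-word variant can be illustrated with a minimal base R sketch. It is not the LSAfun implementation: the toy matrix toy_space (with made-up values) and the helpers unit(), cosine_to_rows(), and sketch_analogy() are hypothetical names introduced only for this illustration, and the sketch assumes that neighbors are ranked by cosine similarity.

```r
# Toy semantic space: every row is a word vector (values are made up).
toy_space <- rbind(
  man   = c(1, 0, 0),
  woman = c(0, 1, 0),
  king  = c(1, 0, 1),
  queen = c(0, 1, 1),
  apple = c(-1, -1, 0)
)

# Scale a vector to unit (Euclidean) norm.
unit <- function(v) v / sqrt(sum(v^2))

# Cosine between a target vector and every row of a matrix.
cosine_to_rows <- function(target, m) {
  as.vector(m %*% target) / (sqrt(rowSums(m^2)) * sqrt(sum(target^2)))
}

# Hypothetical helper mirroring the three-word variant:
# compute x2 - x1 + y1 on unit-norm vectors, then rank all rows by cosine.
sketch_analogy <- function(x1, x2, y1, n, space) {
  y2_vec <- unit(space[x2, ]) - unit(space[x1, ]) + unit(space[y1, ])
  cosines <- setNames(cosine_to_rows(y2_vec, space), rownames(space))
  list(
    y2_vec = y2_vec,
    y2_neighbors = sort(cosines, decreasing = TRUE)[seq_len(n)]
  )
}

# man : king = woman : ?
res <- sketch_analogy("man", "king", "woman", n = 3, space = toy_space)
names(res$y2_neighbors)
```

In this toy space the top-ranked neighbor is queen. The sketch does not exclude the input words from the candidate list, so they can also appear among the neighbors.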
Value
Returns a list containing a numeric vector and the nearest neighbors to that vector:
In the variant with three input words (x1, x2, and y1), returns:
y2_vec |
The result of x2 - x1 + y1 (all normalized to unit norm) as a numeric vector |
y2_neighbors |
A named numeric vector of the n nearest neighbors to y2_vec. The neighbors are given as names of the vector, and their respective cosines to y2_vec as vector entries. |
In the variant with two input words (x1 and x2), returns:
x_diff_vec |
The result of x2 - x1 (both normalized to unit norm) as a numeric vector |
x_diff_neighbors |
A named numeric vector of the n nearest neighbors to x_diff_vec. The neighbors are given as names of the vector, and their respective cosines to x_diff_vec as vector entries. |
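As a brief sketch of how the returned list might be inspected, the following assumes that the LSAfun package is installed and that the wonderland data set contains the words used. The object names res, candidates, cosines, diff_res, and diff_candidates are chosen here for illustration only.

```r
# Assumes LSAfun is installed; the guard skips the example otherwise.
if (requireNamespace("LSAfun", quietly = TRUE)) {
  library(LSAfun)
  data(wonderland)

  # Three-word variant: list elements y2_vec and y2_neighbors.
  res <- analogy(x1 = "hatter", x2 = "mad", y1 = "cat", n = 10,
                 tvectors = wonderland)

  # Candidate words for y2 are the names of the neighbor vector.
  candidates <- names(res$y2_neighbors)

  # Their cosines to y2_vec are the vector entries.
  cosines <- unname(res$y2_neighbors)

  # Two-word variant: list elements x_diff_vec and x_diff_neighbors.
  diff_res <- analogy(x1 = "hatter", x2 = "mad", n = 10,
                      tvectors = wonderland)
  diff_candidates <- names(diff_res$x_diff_neighbors)
}
```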
Author(s)
Fritz Guenther
References
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.
Examples
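data(wonderland)
# Three-word variant: hatter : mad = cat : ?
analogy(x1="hatter",x2="mad",y1="cat",n=10,tvectors=wonderland)
# Two-word variant: nearest neighbors to the difference vector mad - hatter
analogy(x1="hatter",x2="mad",n=10,tvectors=wonderland)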