get_direction {text2map}    R Documentation

Word embedding semantic direction extractor

Description

get_direction() outputs a vector corresponding to one pole of a "semantic direction" built from sets of antonyms or juxtaposed terms. The output can be used as an input to CMDist() and CoCA(). Anchors must be a two-column data.frame or a list of length 2.
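As a rough illustration of what a "paired" direction is, the base R sketch below assumes that each 'subtract' term is subtracted from its paired 'add' term and that the resulting differences are averaged into a single vector. It also imitates the missing = "stop" / "remove" behavior described under Arguments. The helper paired_direction() and the toy embeddings are hypothetical and are not text2map's own implementation.

```r
# Hypothetical helper, NOT text2map's code: a minimal sketch assuming a
# "paired" direction is the mean of row-wise (add - subtract) differences.
paired_direction <- function(anchors, wv, missing = c("stop", "remove")) {
  missing <- match.arg(missing)
  # works for a two-column data.frame or a list of length 2
  add <- as.character(anchors[[1]])
  sub <- as.character(anchors[[2]])
  # a pair is usable only if both of its terms have an embedding
  ok <- add %in% rownames(wv) & sub %in% rownames(wv)
  if (!all(ok)) {
    bad <- setdiff(c(add, sub), rownames(wv))
    if (missing == "stop") {
      stop("Terms missing from embeddings: ", paste(bad, collapse = ", "))
    }
    message("Removing pairs with missing terms: ", paste(bad, collapse = ", "))
    add <- add[ok]
    sub <- sub[ok]
  }
  if (length(add) == 0) stop("No complete anchor pairs remain.")
  # one difference vector per anchor pair
  diffs <- wv[add, , drop = FALSE] - wv[sub, , drop = FALSE]
  # average the differences into a one-row matrix
  matrix(colMeans(diffs), nrow = 1, dimnames = list(NULL, colnames(wv)))
}

# Toy 2-dimensional embeddings, made up for illustration
wv <- matrix(c(1, 0, 0, 1, 2, 1, 1, 2), ncol = 2, byrow = TRUE,
             dimnames = list(c("woman", "man", "queen", "king"), c("d1", "d2")))
anchors <- data.frame(add = c("woman", "queen"), subtract = c("man", "king"))
dir <- paired_direction(anchors, wv)
```

Here both pairs differ by (1, -1), so their average is also (1, -1).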

Usage

get_direction(anchors, wv, method = "paired", missing = "stop", n_dirs = 1L)

Arguments

anchors

A two-column data frame, or a list of length 2, of juxtaposed 'anchor' terms.

wv

Matrix of word embedding vectors (a.k.a. embedding model) with rows as terms.

method

The method used to generate the vector offset. Default is "paired". See Details.

missing

What action to take if terms are not in the embeddings. If missing = "stop" (default), the function stops and an error message states which terms are missing. If missing = "remove", missing terms, or rows with missing terms, are removed and the missing terms are printed as a message.

n_dirs

If method = "PCA", an integer indicating how many directions to return. Default = 1L, indicating a single bipolar direction.
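For method = "PCA", one plausible reading, following Bolukbasi et al. (2016), is that each anchor pair is centered on its own mean and the leading n_dirs principal components of the centered vectors are taken as the direction(s). The sketch below assumes that reading; pca_direction() and the toy embeddings are hypothetical, and text2map's own computation may differ in details such as sign or scaling.

```r
# Hypothetical helper, NOT text2map's code: a sketch assuming PCA directions
# are the principal components of pair-centered anchor vectors.
pca_direction <- function(anchors, wv, n_dirs = 1L) {
  add <- as.character(anchors[[1]])
  sub <- as.character(anchors[[2]])
  diffs <- wv[add, , drop = FALSE] - wv[sub, , drop = FALSE]
  # centering a pair (a, b) on its mean (a + b) / 2 yields +/- (a - b) / 2
  centered <- rbind(diffs / 2, -diffs / 2)
  # the stacked vectors already have mean zero, so skip re-centering
  pcs <- stats::prcomp(centered, center = FALSE, rank. = n_dirs)$rotation
  # principal component signs are arbitrary: orient each toward the "add" pole
  flip <- sign(colSums(diffs %*% pcs))
  flip[flip == 0] <- 1
  dirs <- t(pcs %*% diag(flip, nrow = length(flip)))
  dimnames(dirs) <- list(NULL, colnames(wv))
  dirs
}

# Toy 2-dimensional embeddings, made up for illustration
wv <- matrix(c(1, 0, 0, 1, 2, 1, 1, 2), ncol = 2, byrow = TRUE,
             dimnames = list(c("woman", "man", "queen", "king"), c("d1", "d2")))
anchors <- data.frame(add = c("woman", "queen"), subtract = c("man", "king"))
dir_pca <- pca_direction(anchors, wv, n_dirs = 1L)
```

Unlike the averaged differences, principal components are unit-length, so the result here is (1, -1) / sqrt(2) rather than (1, -1).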

Details

Semantic directions can be estimated using a few methods:

Value

Returns a one-row matrix.
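Because the result is an ordinary matrix with one row and one column per embedding dimension, it can be compared with term vectors directly. The sketch below computes the cosine similarity between each term and a direction; the toy embeddings, the direction values and the helper cos_sim() are hypothetical. In this toy example, terms nearer the "add" pole score positive and terms nearer the "subtract" pole score negative.

```r
# Toy embeddings and a one-row direction matrix, made up for illustration
wv <- matrix(c(1, 0, 0, 1, 2, 1, 1, 2), ncol = 2, byrow = TRUE,
             dimnames = list(c("woman", "man", "queen", "king"), c("d1", "d2")))
dir <- matrix(c(1, -1), nrow = 1, dimnames = list(NULL, colnames(wv)))

# Hypothetical helper: cosine similarity of every row of wv with the direction
cos_sim <- function(wv, dir) {
  num <- wv %*% t(dir)                                 # dot products, one per term
  den <- sqrt(rowSums(wv^2)) * sqrt(sum(dir^2))        # product of vector norms
  drop(num / den)                                      # named numeric vector
}

sims <- cos_sim(wv, dir)
```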

Author(s)

Dustin Stoltz

References

Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. (2016). 'Quantifying and Reducing Stereotypes in Word Embeddings.' arXiv preprint https://arxiv.org/abs/1606.06121v1.
Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. (2016). 'Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.' Proceedings of the 30th International Conference on Neural Information Processing Systems, 4356-4364. https://dl.acm.org/doi/10.5555/3157382.3157584.
Taylor, Marshall A., and Dustin S. Stoltz. (2020). 'Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts.' Sociological Science 7:544-569. doi:10.15195/v7.a23.
Taylor, Marshall A., and Dustin S. Stoltz. (2020). 'Integrating Semantic Directions with Concept Mover's Distance to Measure Binary Concept Engagement.' Journal of Computational Social Science 1-12. doi:10.1007/s42001-020-00075-8.
Kozlowski, Austin C., Matt Taddy, and James A. Evans. (2019). 'The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.' American Sociological Review 84(5):905-949. doi:10.1177/0003122419877135.
Arseniev-Koehler, Alina, and Jacob G. Foster. (2020). 'Machine Learning as a Model for Cultural Learning: Teaching an Algorithm What It Means to Be Fat.' arXiv preprint https://arxiv.org/abs/2003.12133v2.

Examples


# load example word embeddings
data(ft_wv_sample)

# create anchor data frame (one juxtaposed pair per row)
gen <- data.frame(
  add = c("woman"),
  subtract = c("man")
)

dir <- get_direction(anchors = gen, wv = ft_wv_sample)

dir <- get_direction(
  anchors = gen, wv = ft_wv_sample,
  method = "PCA", n_dirs = 1L
)

[Package text2map version 0.2.0 Index]