
test_anchors {text2map}

R Documentation

Evaluate anchor sets in defining semantic directions

Description

This function evaluates how well an anchor set defines a semantic direction. Anchors must be a two-column data.frame or a list of length 2. Currently, only the "PairDir" metric developed by Boutyline and Johnston (2023) is implemented.
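A minimal base-R sketch of the two accepted input shapes follows. `split_anchors()` is a hypothetical helper, not part of text2map; it only illustrates that a two-column data frame and a length-2 list carry the same information, since a data frame is itself a list whose length is its number of columns.

```r
# Hypothetical helper (not part of text2map): turn either accepted input
# shape into a named list of two character vectors.
split_anchors <- function(anchors) {
  # A data.frame is a list whose length() is its number of columns,
  # so one check covers both a two-column data.frame and a length-2 list
  if (!is.list(anchors) || length(anchors) != 2) {
    stop("anchors must be a two-column data.frame or a list of length 2")
  }
  sets <- lapply(anchors, as.character)
  names(sets) <- c("a", "z")
  sets
}

df_anchors   <- data.frame(a = c("rest", "stay"), z = c("coming", "move"))
list_anchors <- list(c("rest", "stay"), c("coming", "move"))

identical(split_anchors(df_anchors), split_anchors(list_anchors))  # TRUE
```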

Usage

test_anchors(anchors, wv, method = c("pairdir"), all = FALSE, summarize = TRUE)

Arguments

`anchors`	A two-column data frame or a list of length 2 containing the juxtaposed 'anchor' terms.
`wv`	Matrix of word embedding vectors (a.k.a. an embedding model) with rows as terms.
`method`	Which metric to use for evaluation (currently only "pairdir").
`all`	Logical (default `FALSE`). Whether to evaluate all possible pairwise combinations of terms across the two anchor sets. If `FALSE`, only the input pairs are used in evaluation and the anchor sets must be of equal length.
`summarize`	Logical (default `TRUE`). If `TRUE`, returns a data frame with the average score for the input pairs along with each pair's contribution. If `summarize = FALSE`, returns a list with each offset matrix, each contribution, and the average score.

Details

According to Boutyline and Johnston (2023):

"We find that PairDir – a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) – consistently outperforms other reliability metrics in explaining axis accuracy."
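The quotation describes PairDir as parallelism between offset vectors. The sketch below assumes PairDir is the mean cosine similarity between every pair of anchor-pair offset vectors (one offset per pair, computed as a minus z). `pair_dir()` and the toy matrix `wv` are hypothetical, and text2map's own implementation (including how it reports each pair's contribution) may differ.

```r
# Hypothetical helper, ASSUMING PairDir = mean pairwise cosine similarity
# between the offset vectors of the anchor pairs (not text2map's code).
pair_dir <- function(a, z, wv) {
  stopifnot(length(a) == length(z), length(a) >= 2)
  # One offset vector per anchor pair: row i is wv[a[i], ] - wv[z[i], ]
  offsets <- wv[a, , drop = FALSE] - wv[z, , drop = FALSE]
  # Scale each row to unit length so that dot products are cosines
  offsets <- offsets / sqrt(rowSums(offsets^2))
  # pairs x pairs matrix of cosine similarities between offsets
  sims <- tcrossprod(offsets)
  # Average over distinct pairs only (skip the diagonal of self-similarities)
  mean(sims[upper.tri(sims)])
}

# Toy 3-dimensional embeddings with terms as rownames (made up for illustration)
wv <- rbind(
  rest  = c(1, 0, 0), move = c(0, 1, 0),
  stay  = c(2, 1, 0), fast = c(1, 2, 0),
  stand = c(1, 0, 1), go   = c(0, 1, 1)
)

pair_dir(c("rest", "stay", "stand"), c("move", "fast", "go"), wv)  # parallel offsets
pair_dir(c("rest", "fast", "stand"), c("move", "stay", "go"), wv)  # one pair reversed
```

In the toy data all three offsets point the same way, so the first call scores 1; reversing the terms of one pair flips its offset and drops the score to -1/3.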

Boutyline and Johnston only consider analyst-specified pairs. If all = TRUE, however, every term in one set is paired with every term in the other, so anchor sets of unequal length are allowed. For sets of n and m terms this yields n × m pairs instead of n, which increases computational cost considerably.
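To make the difference between the two modes concrete, here is a hypothetical base-R sketch of how anchor pairs could be formed; `make_pairs()` is illustrative only and not part of text2map. Row-wise pairing needs equal-length sets, while `expand.grid()` pairs every term in one set with every term in the other.

```r
# Hypothetical helper (not text2map's code): build the anchor pairs to evaluate.
make_pairs <- function(a, z, all = FALSE) {
  if (!all) {
    # Row-wise pairing: a[i] is juxtaposed with z[i], so lengths must match
    stopifnot(length(a) == length(z))
    return(data.frame(a = a, z = z, stringsAsFactors = FALSE))
  }
  # Every term in `a` paired with every term in `z`: length(a) * length(z) rows
  expand.grid(a = a, z = z, stringsAsFactors = FALSE)
}

a <- c("rest", "rested", "stay", "stand")
z <- c("coming", "embarked", "fast")

nrow(make_pairs(a, z, all = TRUE))  # 12
nrow(make_pairs(a[1:3], z))         # 3
```

With four and three anchors, the all-pairs mode yields 12 pairs. If the score compares every pair of offset vectors, the work grows roughly with the square of that count: 12 offsets give 66 comparisons, against 3 comparisons for three row-wise pairs.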

Value

A data frame if `summarize = TRUE`, or a list if `summarize = FALSE`.

References

Boutyline, Andrei, and Ethan Johnston. 2023. “Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings.” doi:10.31235/osf.io/576h3

Examples

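library(text2map)

# load example word embeddings
data(ft_wv_sample)

# two juxtaposed sets of anchor terms, paired row by row
df_anchors <- data.frame(
  a = c("rest", "rested", "stay", "stand"),
  z = c("coming", "embarked", "fast", "move")
)

# evaluate only the input (row-wise) pairs
test_anchors(df_anchors, ft_wv_sample)

# evaluate all pairwise combinations of terms across the two sets
test_anchors(df_anchors, ft_wv_sample, all = TRUE)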

[Package text2map version 0.2.0 Index]