test_anchors {text2map}    R Documentation

Evaluate anchor sets in defining semantic directions

Description

This function evaluates how well an anchor set defines a semantic direction. Anchors must be supplied as a two-column data.frame or a list of length 2. Currently, the function implements only the "PairDir" metric developed by Boutyline and Johnston (2023).

Usage

test_anchors(anchors, wv, method = c("pairdir"), all = FALSE, summarize = TRUE)

Arguments

anchors

A data frame or list of juxtaposed 'anchor' terms

wv

Matrix of word embedding vectors (a.k.a. an embedding model), with rows as terms.

method

Which metric to use for evaluation (currently only "pairdir").

all

Logical (default FALSE). Whether to evaluate all possible pairwise combinations of the two sets of anchors. If FALSE, only the input pairs are used in the evaluation, and the two anchor sets must be of equal length.

summarize

Logical (default TRUE). If TRUE, returns a data frame with AVERAGE scores for the input pairs along with each pair's contribution. If summarize = FALSE, returns a list with each offset matrix, each contribution, and the average score.

Details

According to Boutyline and Johnston (2023):

"We find that PairDir – a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) – consistently outperforms other reliability metrics in explaining axis accuracy."

Boutyline and Johnston consider only analyst-specified pairs. However, if all = TRUE, all pairwise combinations of terms between the two sets are evaluated. This allows the anchor sets to be of unequal length, but it considerably increases computational cost.
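
To make the idea of "parallelism between the offset vectors" concrete, the following is a minimal sketch in base R. It assumes one plausible reading of the metric: the mean cosine similarity between every pair of offset vectors. The helper names (cosine, sketch_pairdir) and the toy matrix wv_toy are hypothetical and are not part of text2map; the package's own implementation may differ in its details.

```r
# Toy embedding matrix with made-up values: rows are terms, columns are dimensions.
wv_toy <- rbind(
  rest   = c( 1.0, 0.0, 0.2),
  stay   = c( 0.9, 0.1, 0.1),
  move   = c(-1.0, 0.1, 0.0),
  coming = c(-0.8, 0.0, 0.3)
)

# Cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical helper (NOT the text2map implementation): mean cosine
# similarity between all pairs of anchor offset vectors.
sketch_pairdir <- function(a, z, wv, all = FALSE) {
  if (all) {
    # Every term in `a` crossed with every term in `z`; lengths may differ.
    pairs <- expand.grid(a = a, z = z, stringsAsFactors = FALSE)
  } else {
    # Only the analyst-specified pairs; lengths must match.
    stopifnot(length(a) == length(z))
    pairs <- data.frame(a = a, z = z, stringsAsFactors = FALSE)
  }
  # At least two offsets are needed to compare directions.
  stopifnot(nrow(pairs) >= 2)

  # One offset vector per anchor pair (row of `a` minus row of `z`).
  offsets <- wv[pairs$a, , drop = FALSE] - wv[pairs$z, , drop = FALSE]

  # Cosine similarity for each unordered pair of offsets, then the mean.
  idx <- utils::combn(nrow(offsets), 2)
  sims <- apply(idx, 2, function(k) cosine(offsets[k[1], ], offsets[k[2], ]))
  mean(sims)
}

# Input pairs only (sets of equal length).
sketch_pairdir(c("rest", "stay"), c("move", "coming"), wv_toy)

# All pairwise combinations (sets of unequal length are allowed).
sketch_pairdir(c("rest", "stay"), "move", wv_toy, all = TRUE)
```

With all = TRUE, the number of offsets is the product of the two set sizes, and the number of offset comparisons grows roughly with the square of that product, which is where the added computational cost comes from.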

Value

A data frame (if summarize = TRUE) or a list (if summarize = FALSE).

References

Boutyline, Andrei, and Ethan Johnston. 2023. “Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings.” doi:10.31235/osf.io/576h3

Examples



# load example word embeddings
data(ft_wv_sample)

df_anchors <- data.frame(
  a = c("rest", "rested", "stay", "stand"),
  z = c("coming", "embarked", "fast", "move")
)

test_anchors(df_anchors, ft_wv_sample)

test_anchors(df_anchors, ft_wv_sample, all = TRUE)
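
The following sketch illustrates two options described above that the examples do not exercise: supplying anchors as a list of length 2 with sets of unequal length (which requires all = TRUE), and requesting the full list output with summarize = FALSE. The object name anchor_list is illustrative, the terms are a subset of those used above, and the call is guarded so that it runs only if text2map is installed.

```r
# Anchors as a list of length 2; the two sets differ in length,
# so all = TRUE is needed to evaluate every cross-set pair.
anchor_list <- list(
  a = c("rest", "stay", "stand"),
  z = c("coming", "move")
)

# Assumes text2map and its ft_wv_sample data are available; skipped otherwise.
if (requireNamespace("text2map", quietly = TRUE)) {
  data("ft_wv_sample", package = "text2map")

  res <- text2map::test_anchors(
    anchor_list, ft_wv_sample,
    all = TRUE, summarize = FALSE
  )

  # Inspect the top-level structure of the returned list.
  str(res, max.level = 1)
}
```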


[Package text2map version 0.2.0 Index]