test_anchors {text2map} | R Documentation |
Evaluate anchor sets in defining semantic directions
Description
This function evaluates how well an anchor set defines a semantic direction. Anchors must be supplied as a two-column data.frame or as a list of length 2. Currently, the function implements only the "PairDir" metric developed by Boutyline and Johnston (2023).
Usage
test_anchors(anchors, wv, method = c("pairdir"), all = FALSE, summarize = TRUE)
Arguments
anchors |
A data frame or list of juxtaposed 'anchor' terms |
wv |
Matrix of word embedding vectors (a.k.a. an embedding model) with rows as terms |
method |
Which metric to use for the evaluation (currently only "pairdir") |
all |
Logical (default FALSE). Whether to evaluate all pairwise combinations of terms between the two anchor sets |
summarize |
Logical (default TRUE) |
Details
According to Boutyline and Johnston (2023):
"We find that PairDir – a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) – consistently outperforms other reliability metrics in explaining axis accuracy."
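To make "parallelism between the offset vectors" concrete, here is a minimal base-R sketch on toy data. It assumes that parallelism is summarized as the mean pairwise cosine similarity between the offset vectors; this is an illustrative reading of the quote and not necessarily the exact computation performed by test_anchors(). The matrix toy_wv, its values, and the helpers cosine() and mean_offset_cosine() are hypothetical.

```r
# Toy embedding matrix with rows as terms (values are made up for illustration)
toy_wv <- rbind(
  rest   = c(1.0, 0.0, 0.2),
  stay   = c(0.9, 0.1, 0.0),
  stand  = c(0.8, 0.2, 0.1),
  move   = c(0.0, 1.0, 0.1),
  fast   = c(0.1, 0.9, 0.3),
  coming = c(0.2, 0.8, 0.0)
)

# Analyst-specified pairs: row i of column a is juxtaposed with row i of column z
toy_anchors <- data.frame(
  a = c("rest", "stay", "stand"),
  z = c("move", "fast", "coming")
)

# Cosine similarity between two numeric vectors
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Assumed summary: mean cosine similarity over all pairs of offset vectors (a - z)
mean_offset_cosine <- function(anchors, wv) {
  a_terms <- as.character(anchors[[1]])
  z_terms <- as.character(anchors[[2]])
  offsets <- wv[a_terms, , drop = FALSE] - wv[z_terms, , drop = FALSE]
  idx <- utils::combn(nrow(offsets), 2)  # each column indexes one pair of offsets
  sims <- apply(idx, 2, function(p) cosine(offsets[p[1], ], offsets[p[2], ]))
  mean(sims)
}

score <- mean_offset_cosine(toy_anchors, toy_wv)
```

Under this reading, a score close to 1 means the offsets point in nearly the same direction, so the anchor pairs agree on a single semantic axis; scores near 0 suggest the pairs capture unrelated contrasts.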
Boutyline and Johnston consider only analyst-specified pairs. However, if all = TRUE, all pairwise combinations of terms between the two sets are evaluated. This allows the anchor sets to be of unequal length, but it increases the computational cost considerably.
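The added cost can be seen by counting pairs. The following is a rough base-R sketch, assuming that "all pairwise combinations" means the cross product of the two term sets. The variable names are illustrative, and expand.grid() is used here only to enumerate the pairs; it is not a claim about how test_anchors() builds them internally.

```r
# Two anchor sets of unequal length, supplied as a list of length 2
anchor_list <- list(
  a = c("rest", "rested", "stay"),
  z = c("coming", "embarked", "fast", "move")
)

# Pair every term in the first set with every term in the second
all_pairs <- expand.grid(
  a = anchor_list$a,
  z = anchor_list$z,
  stringsAsFactors = FALSE
)

# Number of pairs equals length(a) * length(z)
n_pairs <- nrow(all_pairs)
```

With m terms in one set and n in the other, this gives m * n pairs rather than the handful specified by the analyst, so the number of offset vectors to compare grows multiplicatively with the size of the sets.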
Value
A data frame or a list.
References
Boutyline, Andrei, and Ethan Johnston. 2023. “Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings.” doi:10.31235/osf.io/576h3
Examples
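library(text2map)

# load example word embeddings
data(ft_wv_sample)

# two-column data frame of juxtaposed anchor terms (row i of a pairs with row i of z)
df_anchors <- data.frame(
  a = c("rest", "rested", "stay", "stand"),
  z = c("coming", "embarked", "fast", "move")
)

# evaluate only the analyst-specified pairs
test_anchors(df_anchors, ft_wv_sample)

# evaluate all pairwise combinations between the two sets
test_anchors(df_anchors, ft_wv_sample, all = TRUE)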