ect {sweater}    R Documentation
Embedding Coherence Test
Description
This function estimates the Embedding Coherence Test (ECT) of word embeddings (Dev & Phillips, 2019). If possible, please use query() instead.
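Following Dev & Phillips (2019), the test can be summarized as below. Here $\mathbf{w}_x$ is the row of `w` for word $x$, and $S$, $A$, $B$ stand for `S_words`, `A_words` and `B_words`. `ect()` itself returns $\mathbf{u}_a$ and $\mathbf{u}_b$ (as `u_a` and `u_b`). The final Spearman step is taken from the paper, and treating it as the effect size reported by `ect_es()` is an assumption.

```latex
\begin{aligned}
\bar{\mathbf{a}} &= \frac{1}{|A|} \sum_{a \in A} \mathbf{w}_a, \qquad
\bar{\mathbf{b}} = \frac{1}{|B|} \sum_{b \in B} \mathbf{w}_b, \\
u_{a,i} &= \cos\left(\mathbf{w}_{s_i}, \bar{\mathbf{a}}\right), \qquad
u_{b,i} = \cos\left(\mathbf{w}_{s_i}, \bar{\mathbf{b}}\right), \quad s_i \in S, \\
\mathrm{ECT} &= \rho_{\mathrm{Spearman}}\left(\mathbf{u}_a, \mathbf{u}_b\right),
\qquad \text{where } \cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert \, \lVert \mathbf{y} \rVert}.
\end{aligned}
```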
Usage
ect(w, S_words, A_words, B_words, verbose = FALSE)
Arguments
w
a numeric matrix of word embeddings, e.g. from
S_words
a character vector of the first set of target words. For example, when studying gender stereotypes, it can include occupations such as programmer, engineer, and scientist.
A_words
a character vector of the first set of attribute words. For example, when studying gender stereotypes, it can include words such as man, male, he, and his.
B_words
a character vector of the second set of attribute words. For example, when studying gender stereotypes, it can include words such as woman, female, she, and her.
verbose
logical, whether to display information (default FALSE)
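The arguments above assume that each word in `S_words`, `A_words` and `B_words` can be looked up in `w`. The following is a minimal sketch, assuming (as in the `googlenews` example) that each row of `w` is one word vector and the words are its row names. The toy matrix `toy_w` and the helper `lookup_words()` are hypothetical. This sketch simply drops words that are missing from `w`. How `ect()` itself handles such words is not stated on this page.

```r
# Hypothetical toy embedding: 4 words in 3 dimensions.
# Assumption: each row is one word vector and rownames(w) are the words.
toy_w <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    1, 1, 0,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("he", "she", "doctor", "nurse"), NULL)
)

# Hypothetical helper: keep only the words present in w and
# return their vectors as a matrix (one row per word).
lookup_words <- function(w, words) {
  found <- words[words %in% rownames(w)]
  w[found, , drop = FALSE]
}

# "him" is not a row of toy_w, so only "he" is kept.
A_vecs <- lookup_words(toy_w, c("he", "him"))
rownames(A_vecs)
```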
Value
A list with class "ect" containing the following components:
- $A_words: the input A_words
- $B_words: the input B_words
- $S_words: the input S_words
- $u_a: cosine similarity between each word vector of S_words and the average vector of A_words
- $u_b: cosine similarity between each word vector of S_words and the average vector of B_words
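The two similarity vectors described above can be reproduced in base R on a small example. The following is a minimal sketch on a hypothetical 2-dimensional toy matrix, assuming that words are the row names of `w`. `toy_w` and `cosine()` are illustrative names and are not part of sweater.

```r
# Hypothetical toy embedding; rows are words (assumption).
toy_w <- rbind(
  he     = c(1, 0),
  man    = c(1, 0.2),
  she    = c(0, 1),
  woman  = c(0.2, 1),
  doctor = c(0.8, 0.6),
  nurse  = c(0.6, 0.8)
)

# Cosine similarity between two numeric vectors.
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

S_words <- c("doctor", "nurse")
A_words <- c("he", "man")
B_words <- c("she", "woman")

# Average vector of each attribute set.
a_bar <- colMeans(toy_w[A_words, , drop = FALSE])
b_bar <- colMeans(toy_w[B_words, , drop = FALSE])

# Cosine similarity of each target word with each average vector;
# sapply() keeps the target words as names.
u_a <- sapply(S_words, function(s) cosine(toy_w[s, ], a_bar))
u_b <- sapply(S_words, function(s) cosine(toy_w[s, ], b_bar))

# In this toy data "doctor" is closer to the A average and "nurse" to the B average.
round(u_a, 3)
round(u_b, 3)
```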
References
Dev, S., & Phillips, J. (2019, April). Attenuating bias in word vectors. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 879-887). PMLR.
See Also
ect_es() can be used to obtain the effect size of the test.
plot_ect() can be used to visualize the result.
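Dev & Phillips (2019) summarize the test as the Spearman rank correlation between the two similarity vectors. A value of 1 means that the target words are ranked identically against both attribute sets. The following is a minimal sketch using base R's `cor()` with made-up similarity values. It is an assumption, not something stated on this page, that `ect_es()` reports exactly this statistic.

```r
# Made-up cosine similarities of five target words with the
# average A vector (u_a) and the average B vector (u_b).
u_a <- c(0.42, 0.31, 0.18, 0.25, 0.39)
u_b <- c(0.40, 0.29, 0.20, 0.33, 0.22)

# Spearman rank correlation between the two rankings.
rho <- cor(u_a, u_b, method = "spearman")

# 0.6: the rankings agree except that the 4th and 5th words swap places.
rho
```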
Examples
library(sweater)
data(googlenews)
# Target words: occupations
S1 <- c("janitor", "statistician", "midwife", "bailiff", "auctioneer",
"photographer", "geologist", "shoemaker", "athlete", "cashier", "dancer",
"housekeeper", "accountant", "physicist", "gardener", "dentist", "weaver",
"blacksmith", "psychologist", "supervisor", "mathematician", "surveyor",
"tailor", "designer", "economist", "mechanic", "laborer", "postmaster",
"broker", "chemist", "librarian", "attendant", "clerical", "musician",
"porter", "scientist", "carpenter", "sailor", "instructor", "sheriff",
"pilot", "inspector", "mason", "baker", "administrator", "architect",
"collector", "operator", "surgeon", "driver", "painter", "conductor",
"nurse", "cook", "engineer", "retired", "sales", "lawyer", "clergy",
"physician", "farmer", "clerk", "manager", "guard", "artist", "smith",
"official", "police", "doctor", "professor", "student", "judge",
"teacher", "author", "secretary", "soldier")
# Attribute words: male terms (A1) and female terms (B1)
A1 <- c("he", "son", "his", "him", "father", "man", "boy", "himself",
"male", "brother", "sons", "fathers", "men", "boys", "males", "brothers",
"uncle", "uncles", "nephew", "nephews")
B1 <- c("she", "daughter", "hers", "her", "mother", "woman", "girl",
"herself", "female", "sister", "daughters", "mothers", "women", "girls",
"females", "sisters", "aunt", "aunts", "niece", "nieces")
# Run the ECT on the googlenews embeddings, then plot the result
garg_f1 <- ect(googlenews, S1, A1, B1)
plot_ect(garg_f1)