test_WEAT {PsychWordVec} | R Documentation |
Word Embedding Association Test (WEAT) and Single-Category WEAT.
Description
Tabulate data (cosine similarity and standardized effect size) and conduct the permutation test of significance for the Word Embedding Association Test (WEAT) and Single-Category Word Embedding Association Test (SC-WEAT).
For WEAT, a two-sample permutation test is conducted (i.e., based on rearrangements of the data).
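The two-sample logic can be sketched in base R as follows. This is only an illustration and not the internal code of test_WEAT(): the toy 2-D vectors, the helper names (cos_sim, assoc, weat_stat), and the choice to enumerate every equal-size split are assumptions made for the example. The effect size follows the formula in Caliskan et al. (2017), which divides the difference in mean associations by the SD across all target words.

```r
# Illustrative sketch only; this is not the internal code of test_WEAT().

# Cosine similarity between two numeric vectors
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Toy 2-D "embeddings" (hypothetical values), one row per word
T1 <- rbind(c(1.0, 0.1), c(0.9, 0.3), c(0.8, 0.2), c(1.0, 0.2))
T2 <- rbind(c(0.1, 1.0), c(0.3, 0.9), c(0.2, 0.8), c(0.2, 1.0))
A1 <- rbind(c(1.0, 0.0), c(0.9, 0.1))
A2 <- rbind(c(0.0, 1.0), c(0.1, 0.9))

# Association of one target word w:
# mean cosine similarity to A1 minus mean cosine similarity to A2
assoc <- function(w, A1, A2) {
  mean(apply(A1, 1, cos_sim, y = w)) - mean(apply(A2, 1, cos_sim, y = w))
}
s1 <- apply(T1, 1, assoc, A1 = A1, A2 = A2)
s2 <- apply(T2, 1, assoc, A1 = A1, A2 = A2)
s_all <- c(s1, s2)
n1 <- length(s1)

# Standardized effect size (SD taken over all target words)
d <- (mean(s1) - mean(s2)) / sd(s_all)

# Test statistic for a given assignment of words to the first target set
weat_stat <- function(idx) sum(s_all[idx]) - sum(s_all[-idx])
stat_obs <- weat_stat(seq_len(n1))

# Exact test: enumerate all choose(8, 4) = 70 equal-size splits
stat_perm <- combn(length(s_all), n1, FUN = weat_stat)
p_one <- mean(stat_perm >= stat_obs)            # one-sided
p_two <- mean(abs(stat_perm) >= abs(stat_obs))  # two-sided
```

In this toy data every T1 word leans toward A1 and every T2 word toward A2, so the observed split is the most extreme of the 70 and p_one equals 1/70. With larger word sets, full enumeration becomes impractical and a random sample of rearrangements would be drawn instead.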
For SC-WEAT, a one-sample permutation test is conducted (i.e., based on rearrangements of the +/- signs of the data).
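A one-sample sign-flipping test can be sketched in base R as follows. This is a minimal illustration and not the internal code of test_WEAT(): the values in diffs are hypothetical, and using the sum of the differences as the test statistic is an assumption (it orders the sign patterns the same way as the mean).

```r
# Illustrative sketch only; this is not the internal code of test_WEAT().

# Hypothetical differential associations, one per target word:
# mean cosine similarity to A1 minus mean cosine similarity to A2
diffs <- c(0.12, 0.05, -0.03, 0.08, 0.10, -0.01)
n <- length(diffs)

# Observed statistic
stat_obs <- sum(diffs)

# Exact test: enumerate all 2^n patterns of +/- signs
signs <- as.matrix(expand.grid(rep(list(c(1, -1)), n)))
stat_perm <- apply(signs, 1, function(s) sum(s * diffs))

p_one <- mean(stat_perm >= stat_obs)            # one-sided
p_two <- mean(abs(stat_perm) >= abs(stat_obs))  # two-sided
```

Under the null hypothesis each difference is equally likely to be positive or negative, which is why flipping signs generates the reference distribution. With many target words, 2^n patterns cannot be enumerated and random sign flips would be sampled instead.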
Usage
test_WEAT(
  data,
  T1,
  T2,
  A1,
  A2,
  use.pattern = FALSE,
  labels = list(),
  p.perm = TRUE,
  p.nsim = 10000,
  p.side = 2,
  seed = NULL,
  pooled.sd = "Caliskan"
)
Arguments
data |
A |
T1 , T2 |
Target words (a vector of words or a pattern of regular expression).
If only |
A1 , A2 |
Attribute words (a vector of words or a pattern of regular expression). Both must be specified. |
use.pattern |
Defaults to FALSE. |
labels |
Labels for target and attribute concepts (a named |
p.perm |
Permutation test to get exact or approximate p value of the overall effect.
Defaults to TRUE. |
p.nsim |
Number of samples for resampling in the permutation test. Defaults to 10000. If |
p.side |
One-sided (1) or two-sided (2) p value. Defaults to 2. Caliskan et al. (2017) reported one-sided p values for WEAT. Here, I suggest reporting the two-sided p value as a more conservative estimate. Users take full responsibility for the choice.
|
seed |
Random seed for reproducible results of the permutation test. Defaults to NULL. |
pooled.sd |
Method used to calculate the pooled SD for effect size estimate in WEAT.
|
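To illustrate why the choice of SD matters for the effect size, the sketch below contrasts two denominators on hypothetical association scores. The first is the SD across all target words, as in the effect size formula of Caliskan et al. (2017). The second is the conventional pooled within-group SD used for Cohen's d. Whether pooled.sd offers exactly this second option is an assumption; the available values are not listed on this page.

```r
# Illustrative sketch only; hypothetical association scores per target word.
s1 <- c(0.5, 0.6, 0.4, 0.7)    # words in the first target set
s2 <- c(-0.1, 0.0, 0.1, -0.2)  # words in the second target set
n1 <- length(s1)
n2 <- length(s2)

# Raw effect: difference in mean associations
raw_eff <- mean(s1) - mean(s2)

# (a) SD across all target words (Caliskan et al., 2017)
sd_all <- sd(c(s1, s2))

# (b) Conventional pooled within-group SD (assumed alternative)
sd_pooled <- sqrt(((n1 - 1) * var(s1) + (n2 - 1) * var(s2)) / (n1 + n2 - 2))

d_all <- raw_eff / sd_all
d_pooled <- raw_eff / sd_pooled
```

Because the SD across all words also absorbs the between-group difference, d_all is smaller than d_pooled whenever the two target sets are well separated, as they are here.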
Value
A list object of new class weat:
words.valid - Valid (actually matched) words
words.not.found - Words not found
data.raw - A data.table of cosine similarities between all word pairs
data.mean - A data.table of mean cosine similarities across all attribute words
data.diff - A data.table of differential mean cosine similarities between the two attribute concepts
eff.label - Description for the difference between the two attribute concepts
eff.type - Effect type: WEAT or SC-WEAT
eff - Raw effect, standardized effect size, and p value (if p.perm=TRUE)
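The relation between data.raw, data.mean, and data.diff can be sketched with a toy table in base R. The column names (target, attr_set, attr_word, cos_sim) and all values are hypothetical; the actual tables returned by test_WEAT() are data.table objects whose columns may be named and arranged differently.

```r
# Illustrative sketch only; column names and values are hypothetical.

# "Raw" level: one cosine similarity per target-attribute word pair
raw <- data.frame(
  target    = rep(c("king", "queen"), each = 4),
  attr_set  = rep(rep(c("A1", "A2"), each = 2), times = 2),
  attr_word = rep(c("man", "he", "woman", "she"), times = 2),
  cos_sim   = c(0.60, 0.50, 0.30, 0.20, 0.35, 0.25, 0.65, 0.55)
)

# "Mean" level: average over the attribute words within each concept
means <- tapply(raw$cos_sim, list(raw$target, raw$attr_set), mean)

# "Diff" level: difference between the two attribute concepts per target word
diffs <- means[, "A1"] - means[, "A2"]
```

Here means is a 2 x 2 matrix (target words in rows, attribute concepts in columns), and diffs holds one differential association per target word, which is the quantity the permutation tests operate on.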
Download
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
References
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.
Examples
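library(PsychWordVec)  # assumed to be attached; provides test_WEAT(), demodata, and cc()

## cc() splits a comma-separated string into a character vector,
## which is more convenient than c()!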
weat = test_WEAT(
  demodata,
  labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
  T1=cc("king, King"),
  T2=cc("queen, Queen"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  seed=1)
weat
## SC-WEAT: only T1 is given (no T2)
sc_weat = test_WEAT(
  demodata,
  labels=list(T1="Occupation", A1="Male", A2="Female"),
  T1=cc("
    architect, boss, leader, engineer, CEO, officer, manager,
    lawyer, scientist, doctor, psychologist, investigator,
    consultant, programmer, teacher, clerk, counselor,
    salesperson, therapist, psychotherapist, nurse"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  seed=1)
sc_weat
## Not run:
## the same as the first example, but using regular expression
weat = test_WEAT(
  demodata,
  labels=list(T1="King", T2="Queen", A1="Male", A2="Female"),
  use.pattern=TRUE,  # use regular expression below
  T1="^[kK]ing$",
  T2="^[qQ]ueen$",
  A1="^male$|^man$|^boy$|^brother$|^he$|^him$|^his$|^son$",
  A2="^female$|^woman$|^girl$|^sister$|^she$|^her$|^hers$|^daughter$",
  seed=1)
weat
## replicating Caliskan et al.'s (2017) results
## WEAT7 (Table 1): d = 1.06, p = .018
## (requiring installation of the `sweater` package)
Caliskan.WEAT7 = test_WEAT(
  as_wordvec(sweater::glove_math),
  labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
  T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
  T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  p.side=1, seed=1234)
Caliskan.WEAT7
# d = 1.055, p = .0173 (= 173 counts / 10000 permutation samples)
## replicating Caliskan et al.'s (2017) supplemental results
## WEAT7 (Table S1): d = 0.97, p = .027
Caliskan.WEAT7.supp = test_WEAT(
  demodata,
  labels=list(T1="Math", T2="Arts", A1="Male", A2="Female"),
  T1=cc("math, algebra, geometry, calculus, equations, computation, numbers, addition"),
  T2=cc("poetry, art, dance, literature, novel, symphony, drama, sculpture"),
  A1=cc("male, man, boy, brother, he, him, his, son"),
  A2=cc("female, woman, girl, sister, she, her, hers, daughter"),
  p.side=1, seed=1234)
Caliskan.WEAT7.supp
# d = 0.966, p = .0221 (= 221 counts / 10000 permutation samples)
## End(Not run)
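The regular-expression example above anchors every alternative with ^ and $. The base R sketch below shows why this matters when a pattern is matched against a vocabulary. How test_WEAT() applies the pattern internally is not documented on this page, so the use of grep() here, and the toy vocabularies, are assumptions for illustration.

```r
# Illustrative sketch only; toy vocabularies, matched with base R grep().
vocab <- c("king", "King", "kingdom", "viking", "queen", "Queen", "queens")

# Anchored pattern: matches whole words only
anchored <- grep("^[kK]ing$", vocab, value = TRUE)

# Unanchored pattern: also picks up words that merely contain "king"
unanchored <- grep("[kK]ing", vocab, value = TRUE)

# Alternation with each branch anchored, as in the A1/A2 patterns
pronouns <- c("she", "her", "hers", "here", "sheep", "other")
a2_matched <- grep("^she$|^her$|^hers$", pronouns, value = TRUE)
```

Here anchored contains only "king" and "King", whereas unanchored also contains "kingdom" and "viking"; a2_matched keeps "she", "her", and "hers" and drops the other three.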