tab_similarity {PsychWordVec}    R Documentation

Tabulate cosine similarity/distance of word pairs.

Description

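Tabulate the cosine similarity (or, optionally, the cosine distance) of word pairs: either all pairs among a set of words, or only the pairs formed across two sets of words.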

Usage

tab_similarity(
  data,
  words = NULL,
  pattern = NULL,
  words1 = NULL,
  words2 = NULL,
  unique = FALSE,
  distance = FALSE
)

Arguments

data

A wordvec (data.table) or embed (matrix), see data_wordvec_load.

words

[Option 1] Character string(s).

pattern

[Option 2] Regular expression (see str_subset). If neither words nor pattern is specified (i.e., both are NULL), all words in the data will be extracted.

words1, words2

[Option 3] Two sets of words. Only the n1 * n2 word pairs formed by combining each word in words1 with each word in words2 are tabulated. See examples.

unique

Return unique word pairs (TRUE) or all pairs with duplicates (FALSE; default).

distance

Compute cosine distance instead? Defaults to FALSE (cosine similarity).
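
The three ways of selecting words differ only in which pairs end up in the table. The base-R sketch below enumerates pairs for a toy vocabulary. It is an illustration, not the package's implementation: the vocabulary `vocab`, the variable names, and the use of `grep()` in place of str_subset are all assumptions. It also assumes that unique = FALSE lists every ordered pair (self-pairs included) and that unique = TRUE keeps each unordered pair once, which the text above does not spell out.

```r
# Toy vocabulary standing in for the words stored in `data` (hypothetical).
vocab <- c("king", "queen", "man", "woman", "King", "Queen")

# Option 1: explicit words.
words <- c("king", "queen", "man")

# Option 2: regular-expression match (base-R analogue of str_subset).
matched <- grep("^[Kk]ing$|^[Qq]ueen$", vocab, value = TRUE)

# Assumed reading of unique = FALSE: every ordered pair of the selected words.
all_pairs <- expand.grid(word1 = words, word2 = words,
                         stringsAsFactors = FALSE)

# Assumed reading of unique = TRUE: each unordered pair exactly once.
uniq_pairs <- as.data.frame(t(combn(words, 2)), stringsAsFactors = FALSE)
names(uniq_pairs) <- c("word1", "word2")

# Option 3: only cross-set pairs, giving n1 * n2 rows.
words1 <- c("king", "queen", "King", "Queen")
words2 <- c("man", "woman")
cross_pairs <- expand.grid(word1 = words1, word2 = words2,
                           stringsAsFactors = FALSE)

# Number of pairs under each reading.
c(all = nrow(all_pairs), unique = nrow(uniq_pairs), cross = nrow(cross_pairs))
```

Under these assumptions, three selected words give 9 ordered pairs or 3 unique pairs, and the two sets of 4 and 2 words give 8 cross-set pairs.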

Value

A data.table of words, word pairs, and their cosine similarity (cos_sim) or cosine distance (cos_dist).
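
To make the two output columns concrete, here is a minimal base-R sketch that builds a similar table by hand. The embeddings are made-up numbers, `cos_sim()` is a hypothetical helper, a plain data.frame stands in for the data.table, and cosine distance is taken as 1 minus cosine similarity, a common convention that is assumed here rather than confirmed by this page.

```r
# Toy 3-dimensional embeddings (made-up numbers, not real word vectors).
embed <- rbind(
  king  = c(1, 2, 0),
  queen = c(2, 4, 0),
  man   = c(0, 0, 3)
)

# Cosine similarity: dot product divided by the product of the vector norms.
cos_sim <- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# One row per unordered word pair.
pairs <- t(combn(rownames(embed), 2))
tab <- data.frame(word1 = pairs[, 1], word2 = pairs[, 2],
                  stringsAsFactors = FALSE)
tab$cos_sim <- mapply(
  function(a, b) cos_sim(embed[a, ], embed[b, ]),
  tab$word1, tab$word2,
  USE.NAMES = FALSE
)

# Cosine distance, assumed here to be 1 - cosine similarity.
tab$cos_dist <- 1 - tab$cos_sim
print(tab)
```

In this toy data, king and queen point in the same direction (similarity close to 1, distance close to 0), while man is orthogonal to both (similarity 0, distance 1).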

Download

Download pre-trained word vector data (.RData): https://psychbruce.github.io/WordVector_RData.pdf

See Also

cosine_similarity

pair_similarity

plot_similarity

most_similar

test_WEAT

test_RND

Examples

tab_similarity(demodata, cc("king, queen, man, woman"))
tab_similarity(demodata, cc("king, queen, man, woman"),
               unique=TRUE)

tab_similarity(demodata, cc("Beijing, China, Tokyo, Japan"))
tab_similarity(demodata, cc("Beijing, China, Tokyo, Japan"),
               unique=TRUE)

## only n1 * n2 word pairs across two sets of words
tab_similarity(demodata,
               words1=cc("king, queen, King, Queen"),
               words2=cc("man, woman"))
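
The examples above all use the default cosine similarity. As a hedged sketch of the distance argument, the call below requests cosine distance for unique pairs. It assumes PsychWordVec is installed and that demodata and cc() are available exactly as in the examples above; the `requireNamespace()` guard and the variable `tab` are additions for illustration, and the call is skipped when the package is missing.

```r
tab <- NULL
if (requireNamespace("PsychWordVec", quietly = TRUE)) {
  library(PsychWordVec)
  # Same four words as above, but unique pairs and cosine distance.
  tab <- tab_similarity(demodata, cc("king, queen, man, woman"),
                        unique = TRUE, distance = TRUE)
  print(tab)
}
```

Per the Value section, the result should then report cos_dist rather than cos_sim.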


[Package PsychWordVec version 2023.9 Index]