R: Tabulate cosine similarity/distance of word pairs.

tab_similarity {PsychWordVec}

R Documentation

Tabulate cosine similarity/distance of word pairs.

Description

Tabulate cosine similarity/distance of word pairs.

Usage

tab_similarity(
  data,
  words = NULL,
  pattern = NULL,
  words1 = NULL,
  words2 = NULL,
  unique = FALSE,
  distance = FALSE
)

Arguments

`data`	A `wordvec` (data.table) or `embed` (matrix); see `data_wordvec_load`.
`words`	[Option 1] Character string(s).
`pattern`	[Option 2] Regular expression (see `str_subset`). If neither `words` nor `pattern` is specified (i.e., both are `NULL`), then all words in the data will be extracted.
`words1`, `words2`	[Option 3] Two sets of words; only the n1 * n2 word pairs formed across the two sets are tabulated. See examples.
`unique`	Return unique word pairs (`TRUE`) or all pairs with duplicates (`FALSE`; default).
`distance`	Compute cosine distance instead? Defaults to `FALSE` (cosine similarity).
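
To make the pair counts concrete, here is a minimal base-R sketch of how word pairs could be enumerated. It is an illustration only, not the package's internal code: it assumes that "unique" pairs means unordered pairs without self-pairs, and the variable names `unique_pairs` and `cross_pairs` are hypothetical.

```r
# Hypothetical illustration in base R; not how tab_similarity() is implemented.
words  <- c("king", "queen", "man", "woman")
words1 <- c("king", "queen", "King", "Queen")
words2 <- c("man", "woman")

# Assumption: unique pairs are unordered and exclude self-pairs,
# which gives choose(4, 2) = 6 pairs for four words.
unique_pairs <- t(combn(words, 2))
colnames(unique_pairs) <- c("word1", "word2")

# Option 3: only pairs across the two sets, n1 * n2 = 4 * 2 = 8 pairs.
cross_pairs <- expand.grid(word1 = words1, word2 = words2,
                           stringsAsFactors = FALSE)

nrow(unique_pairs)  # 6
nrow(cross_pairs)   # 8
```

The cross-set form is useful when only comparisons between two groups matter (e.g., target words versus attribute words), since it skips all within-set pairs.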

Value

A data.table of words, word pairs, and their cosine similarity (cos_sim) or cosine distance (cos_dist).
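
For reference, the two quantities can be sketched in base R as follows. This assumes the usual convention that cosine distance is 1 minus cosine similarity; the helper names `cos_sim` and `cos_dist` and the toy vectors are hypothetical and are not part of the package.

```r
# Base-R sketch; helper names and toy vectors are hypothetical.
cos_sim <- function(v1, v2) {
  # dot product divided by the product of the two vector norms
  sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
}

# Assumption: cosine distance = 1 - cosine similarity.
cos_dist <- function(v1, v2) 1 - cos_sim(v1, v2)

a <- c(1, 0, 1)
b <- c(1, 1, 0)

cos_sim(a, b)   # 0.5
cos_dist(a, b)  # 0.5
cos_sim(a, a)   # 1
```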

Download

Download pre-trained word vector data (.RData): https://psychbruce.github.io/WordVector_RData.pdf

Examples

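library(PsychWordVec)  # assumes the package is installed

## all pairs, then unique pairs, within one set of words
tab_similarity(demodata, cc("king, queen, man, woman"))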
tab_similarity(demodata, cc("king, queen, man, woman"),
               unique=TRUE)

tab_similarity(demodata, cc("Beijing, China, Tokyo, Japan"))
tab_similarity(demodata, cc("Beijing, China, Tokyo, Japan"),
               unique=TRUE)

## only n1 * n2 word pairs across two sets of words
tab_similarity(demodata,
               words1=cc("king, queen, King, Queen"),
               words2=cc("man, woman"))

[Package PsychWordVec version 2023.9 Index]
