R: Word vectors data class: 'wordvec' and 'embed'.

as_embed {PsychWordVec}

R Documentation

Word vectors data class: `wordvec` and `embed`.

Description

PsychWordVec uses two classes of word vector data: wordvec (a data.table with two variables, word and vec) and embed (a matrix with dimensions as columns and words as row names). Because it relies on matrix operations, embed is much faster than wordvec. Users are advised to reshape their data to embed before using the other functions.
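
To make the two layouts concrete, here is a minimal base-R sketch that builds both by hand and reshapes one into the other. It does not call PsychWordVec: a plain data.frame with a list column stands in for the data.table, and the names `wv_toy` and `em_toy` are hypothetical.

```r
# Toy stand-in for a wordvec: one row per word, vectors stored in a list column.
# (PsychWordVec uses a data.table; a data.frame is used here to stay in base R.)
wv_toy <- data.frame(word = c("for", "in", "on"), stringsAsFactors = FALSE)
wv_toy$vec <- list(c(0.1, 0.2, 0.3), c(0.4, 0.5, 0.6), c(0.7, 0.8, 0.9))

# Toy stand-in for an embed: stack the vectors into a matrix,
# with words as row names and dimensions as columns.
em_toy <- do.call(rbind, wv_toy$vec)
dimnames(em_toy) <- list(wv_toy$word, paste0("dim", seq_len(ncol(em_toy))))

dim(em_toy)       # 3 words x 3 dimensions
em_toy["in", ]    # look up one word vector by row name
```

With the matrix layout, a whole set of vectors can be processed in a single vectorized call instead of looping over a list column.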

Usage

as_embed(x, normalize = FALSE)

as_wordvec(x, normalize = FALSE)

## S3 method for class 'embed'
x[i, j]

pattern(pattern)

Arguments

`x`	Object to be reshaped. See examples.
`normalize`	Normalize all word vectors to unit length? Defaults to `FALSE`. See `normalize`.
`i`, `j`	Row (`i`) and column (`j`) filters to be used in `embed[i, j]`.
`pattern`	Regular expression to be used in `embed[pattern("...")]`.
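
Normalizing to unit length is commonly done by dividing each word vector by its Euclidean (L2) norm. The base-R sketch below illustrates that idea on a toy matrix. It is an assumption about what the `normalize` option amounts to, not the package's own code, and `m` and `m_unit` are hypothetical names.

```r
# Toy embedding matrix: words as row names, dimensions as columns.
m <- rbind(king = c(3, 4), queen = c(6, 8), apple = c(1, 0))

# Divide every row by its Euclidean norm; the vector of norms recycles down
# the columns, so row i is divided by norm i.
m_unit <- m / sqrt(rowSums(m^2))

sqrt(rowSums(m_unit^2))   # every row now has length 1
```

After this step, the dot product of two rows equals their cosine similarity, which is one reason to normalize before comparing words.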

Value

A wordvec (data.table) or embed (matrix).

Functions

as_embed(): From wordvec (data.table) to embed (matrix).
as_wordvec(): From embed (matrix) to wordvec (data.table).

Download

Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf

Examples

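# take the first 10 words of the demo data (a wordvec)
dt = head(demodata, 10)
str(dt)

# wordvec -> embed, normalizing to unit length
embed = as_embed(dt, normalize=TRUE)
embed
str(embed)

# embed -> wordvec
wordvec = as_wordvec(embed, normalize=TRUE)
wordvec
str(wordvec)

# a plain data.frame can be reshaped too
df = data.frame(token=LETTERS, D1=1:26/10000, D2=26:1/10000)
as_embed(df)
as_wordvec(df)

# duplicate words in a wordvec
dd = rbind(dt[1:5], dt[1:5])
dd  # duplicate words
unique(dd)

# duplicate words in an embed
dm = as_embed(dd)
dm  # duplicate words
unique(dm)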

# more examples for extracting a subset using `x[i, j]`
# (3x faster than `wordvec`)
embed = as_embed(demodata)
embed[1]
embed[1:5]
embed["for"]
embed[pattern("^for.{0,2}$")]
embed[cc("for, in, on, xxx")]
embed[cc("for, in, on, xxx"), 5:10]
embed[1:5, 5:10]
embed[, 5:10]
embed[3, 4]
embed["that", 4]
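
For intuition, the row-name and pattern filters above can be approximated on an ordinary matrix with base R. This is a sketch of the idea only, not how the `embed` subsetting method is implemented. The toy matrix `m` is hypothetical, and silently skipping words that are absent is a choice made in this sketch rather than documented package behavior.

```r
# Toy matrix with words as row names and three dimensions as columns.
words <- c("for", "form", "forest", "in", "on")
m <- matrix(seq_len(15) / 10, nrow = 5,
            dimnames = list(words, paste0("dim", 1:3)))

# Regular-expression filter on row names: "for" plus at most two more characters.
by_pattern <- m[grepl("^for.{0,2}$", rownames(m)), , drop = FALSE]
rownames(by_pattern)

# Filter by a set of words and a column range, skipping words not in the matrix.
wanted <- c("for", "in", "on", "xxx")
by_words <- m[intersect(wanted, rownames(m)), 2:3, drop = FALSE]
by_words
```

Using `drop = FALSE` keeps the result a matrix even when only one row or column matches.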

[Package PsychWordVec version 2023.9 Index]
