as_embed {PsychWordVec}    R Documentation

Word vector data classes: wordvec and embed.

Description

PsychWordVec uses two classes of word vector data: wordvec (a data.table with two variables, word and vec) and embed (a matrix with dimensions as columns and words as row names). Because embed is a matrix, functions can work on it with matrix operations, which makes it much faster than wordvec. Users are advised to convert their data to embed before using the other functions.
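The two layouts can be pictured with a small base R sketch. It does not use PsychWordVec or data.table: a plain data.frame with a list column stands in for wordvec, a plain matrix stands in for embed, and all object names, words, values, and column names (dim1, dim2, ...) are made up for illustration.

```r
# Base R stand-ins for the two layouts (hypothetical objects, not package classes).
words <- c("cat", "dog", "fish")
vecs  <- list(c(1, 0, 2), c(0, 3, 4), c(5, 5, 0))

# "wordvec"-like layout: one row per word, the vector stored in a list column.
wv <- data.frame(word = words, stringsAsFactors = FALSE)
wv$vec <- vecs

# "embed"-like layout: numeric matrix, words as row names, dimensions as columns.
em <- do.call(rbind, wv$vec)
dimnames(em) <- list(wv$word, paste0("dim", seq_len(ncol(em))))

# Going back: split the matrix into one plain vector per row.
wv2 <- data.frame(word = rownames(em), stringsAsFactors = FALSE)
wv2$vec <- lapply(seq_len(nrow(em)), function(i) unname(em[i, ]))

# One matrix product yields every pairwise dot product at once,
# the kind of operation the matrix layout makes cheap.
dots <- em %*% t(em)
```

In the list-column layout the same computation would need a loop over pairs of rows, which is one plausible reason the matrix form is recommended.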

Usage

as_embed(x, normalize = FALSE)

as_wordvec(x, normalize = FALSE)

## S3 method for class 'embed'
x[i, j]

pattern(pattern)

Arguments

x

Object to be reshaped. See examples.

normalize

Normalize all word vectors to unit length? Defaults to FALSE. See normalize.

i, j

Row (i) and column (j) filters to be used in embed[i, j].

pattern

Regular expression to be used in embed[pattern("...")].
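As a rough illustration of what the normalize, i, j, and pattern arguments describe, the following base R sketch scales each row of a toy matrix to unit length and then filters rows by a regular expression and columns by position. It is an assumption that the package behaves equivalently; the matrix values and column names are invented, and grepl() is used here in place of pattern().

```r
# Toy matrix in the embed layout (values are made up for illustration).
em <- matrix(
  c(3, 4,
    6, 8,
    0, 2,
    1, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("for", "form", "fort", "on"), c("dim1", "dim2"))
)

# Unit-length normalization: divide each row by its Euclidean (L2) norm.
# The norm vector is recycled down the columns, so row k is divided by norm k.
unit <- em / sqrt(rowSums(em^2))

# Row filter by regular expression on the row names (the role of i),
# then a column filter by position (the role of j).
hits <- grepl("^for.{0,2}$", rownames(unit))
sub  <- unit[hits, 1, drop = FALSE]
```

Using drop = FALSE keeps the result a matrix with its row names even when a single column is selected.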

Value

A wordvec (data.table) or embed (matrix).

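Functions

as_embed(): From wordvec (data.table) to embed (matrix).

as_wordvec(): From embed (matrix) to wordvec (data.table).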

Download

Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf

See Also

load_wordvec / load_embed

normalize

data_transform

data_wordvec_subset

Examples

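library(PsychWordVec)  # provides demodata, as_embed(), as_wordvec()

dt = head(demodata, 10)  # first 10 words of the demo data
str(dt)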

embed = as_embed(dt, normalize=TRUE)
embed
str(embed)

wordvec = as_wordvec(embed, normalize=TRUE)
wordvec
str(wordvec)

# a plain data.frame (one word column, then numeric dimensions) as input
df = data.frame(token=LETTERS, D1=1:26/10000, D2=26:1/10000)
as_embed(df)
as_wordvec(df)

dd = rbind(dt[1:5], dt[1:5])
dd  # duplicate words
unique(dd)

dm = as_embed(dd)
dm  # duplicate words
unique(dm)

# more examples for extracting a subset using `x[i, j]`
# (3x faster than `wordvec`)
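embed = as_embed(demodata)
embed[1]  # first word (row)
embed[1:5]  # first five words
embed["for"]  # select by word
embed[pattern("^for.{0,2}$")]  # select words by regular expression
embed[cc("for, in, on, xxx")]  # cc() splits the string into a character vector
embed[cc("for, in, on, xxx"), 5:10]  # same words, dimensions 5 to 10
embed[1:5, 5:10]  # rows 1 to 5, dimensions 5 to 10
embed[, 5:10]  # all words, dimensions 5 to 10
embed[3, 4]  # one word, one dimension
embed["that", 4]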


[Package PsychWordVec version 2023.9 Index]