| as_embed {PsychWordVec} | R Documentation |
Word vectors data class: wordvec and embed.
Description
PsychWordVec uses two types of word vectors data:
wordvec (data.table, with two variables word and vec)
and embed (matrix, with dimensions as columns and words as row names).
Because embed supports matrix operations, it is much faster to process than wordvec.
Users are advised to reshape data to embed before using the other functions.
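The difference between the two layouts can be sketched in base R without loading the package. This is an illustration only: the toy words and values are made up, a plain data.frame with a list column stands in for the data.table-based wordvec, and the code does not reproduce how as_embed() or as_wordvec() work internally.

```r
# Toy data: three words with 4-dimensional vectors (values are made up).
words <- c("for", "in", "on")
vecs <- list(c(1, 0, 2, 1), c(0, 1, 1, 3), c(2, 2, 0, 1))

# "wordvec"-like layout: one row per word, vectors kept in a list column.
# (Assumption: a base data.frame stands in for the data.table.)
wv <- data.frame(word = words, stringsAsFactors = FALSE)
wv$vec <- vecs

# "embed"-like layout: numeric matrix, words as row names,
# dimensions as columns.
em <- do.call(rbind, wv$vec)
rownames(em) <- wv$word
colnames(em) <- paste0("dim", seq_len(ncol(em)))

# With a matrix, all pairwise dot products come from one matrix product.
dots_matrix <- em %*% t(em)

# With a list column, the same result needs a loop over word pairs.
dots_loop <- sapply(wv$vec, function(a) {
  sapply(wv$vec, function(b) sum(a * b))
})
dimnames(dots_loop) <- list(wv$word, wv$word)

# Reshape back: split the matrix into one vector per row.
wv2 <- data.frame(word = rownames(em), stringsAsFactors = FALSE)
wv2$vec <- unname(split(em, row(em)))
```

Both approaches give the same numbers; the matrix form hands the work to a single vectorized operation, which is the kind of speed advantage the description refers to.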
Usage
as_embed(x, normalize = FALSE)
as_wordvec(x, normalize = FALSE)
## S3 method for class 'embed'
x[i, j]
pattern(pattern)
Arguments
x |
Object to be reshaped. See examples. |
normalize |
Normalize all word vectors to unit length?
Defaults to FALSE. |
i, j |
Row ( |
pattern |
Regular expression to be used in |
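As a rough illustration of what the normalize argument refers to, the base-R sketch below scales each row of a made-up matrix to unit length. It assumes that "unit length" means the Euclidean (L2) norm, and it is not the package's internal code.

```r
# Made-up embedding matrix: words as row names, dimensions as columns.
em <- rbind(good = c(3, 4), bad = c(1, 0), ugly = c(0, 2))

# Euclidean (L2) norm of each row.
norms <- sqrt(rowSums(em^2))

# Divide every row by its norm. The vector is recycled down the columns,
# so element em[i, j] is divided by norms[i].
em_unit <- em / norms

# After normalization every row has length 1.
row_lengths <- sqrt(rowSums(em_unit^2))
```

Once rows have unit length, the dot product of two rows equals their cosine similarity, which is one reason to normalize before comparing words.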
Value
A wordvec (data.table) or embed (matrix).
Functions
- as_embed(): From wordvec (data.table) to embed (matrix).
- as_wordvec(): From embed (matrix) to wordvec (data.table).
Download
Download pre-trained word vectors data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
Examples
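# first 10 words of the demo data
dt = head(demodata, 10)
str(dt)
# reshape to embed (matrix), normalizing to unit length
embed = as_embed(dt, normalize=TRUE)
embed
str(embed)
# reshape back to wordvec (data.table)
wordvec = as_wordvec(embed, normalize=TRUE)
wordvec
str(wordvec)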
df = data.frame(token=LETTERS, D1=1:26/10000, D2=26:1/10000)
as_embed(df)
as_wordvec(df)
dd = rbind(dt[1:5], dt[1:5])
dd # duplicate words
unique(dd)
dm = as_embed(dd)
dm # duplicate words
unique(dm)
# more examples for extracting a subset using `x[i, j]`
# (3x faster than `wordvec`)
embed = as_embed(demodata)
embed[1]
embed[1:5]
embed["for"]
embed[pattern("^for.{0,2}$")]
embed[cc("for, in, on, xxx")]
embed[cc("for, in, on, xxx"), 5:10]
embed[1:5, 5:10]
embed[, 5:10]
embed[3, 4]
embed["that", 4]
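For readers who want to see what such row and column filters amount to, the following base-R sketch applies the same kinds of selection to a plain matrix. The matrix and its words are made up, and dropping words that are missing from the row names is an assumption of this sketch rather than documented behavior of the embed method.

```r
# Made-up matrix standing in for an embed object.
em <- matrix(1:12 / 10, nrow = 4,
             dimnames = list(c("for", "form", "forest", "in"),
                             paste0("dim", 1:3)))

# Regular-expression filter on row names:
# "for" followed by at most two more characters.
hits <- grepl("^for.{0,2}$", rownames(em))
by_pattern <- em[hits, , drop = FALSE]

# Name-based filter that skips words missing from the row names.
wanted <- c("for", "in", "on", "xxx")
found <- intersect(wanted, rownames(em))
by_name <- em[found, , drop = FALSE]

# Combined row and column selection.
by_both <- em[found, 2:3, drop = FALSE]
```

The drop = FALSE argument keeps the result a matrix even when a single row or column is selected; base R would otherwise simplify it to a vector.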