as_embed {PsychWordVec} | R Documentation
Word vectors data class: wordvec and embed.
Description
PsychWordVec uses two types of word vectors data: wordvec (data.table, with two variables word and vec) and embed (matrix, with dimensions as columns and words as row names).
Note that matrix operations make embed much faster than wordvec.
Users are advised to reshape data to embed before using the other functions.
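To make the two layouts concrete, here is a minimal base R sketch. It does not call PsychWordVec: the toy words and values are made up, a plain data.frame with a list column stands in for the data.table, and the column names `dim1`, `dim2` are only an illustrative choice.

```r
# Hypothetical toy data (not from the package): three words, 2-D vectors.
words <- c("cat", "dog", "car")
vecs  <- list(c(0.9, 0.1), c(0.8, 0.2), c(0.1, 0.9))

# wordvec-like layout: one row per word, with a list column `vec`
# (a plain data.frame stands in for the data.table here).
wv <- data.frame(word = words, stringsAsFactors = FALSE)
wv$vec <- vecs

# embed-like layout: numeric matrix, words as row names, dimensions as columns.
em <- do.call(rbind, wv$vec)
rownames(em) <- wv$word
colnames(em) <- paste0("dim", seq_len(ncol(em)))

# With a matrix, all pairwise dot products are a single vectorized operation.
dots <- em %*% t(em)
print(dots)

# Reshape back to the long layout: one numeric vector per word.
wv2 <- data.frame(word = rownames(em), stringsAsFactors = FALSE)
wv2$vec <- lapply(seq_len(nrow(em)), function(k) unname(em[k, ]))
```

The matrix form lets one call such as `em %*% t(em)` replace a loop over list elements, which is the kind of operation the speed note above refers to.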
Usage
as_embed(x, normalize = FALSE)
as_wordvec(x, normalize = FALSE)
## S3 method for class 'embed'
x[i, j]
pattern(pattern)
Arguments
x |
Object to be reshaped. See examples. |
normalize |
Normalize all word vectors to unit length?
Defaults to FALSE. |
i , j |
Row ( |
pattern |
Regular expression to be used in |
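As an illustration of what normalizing to unit length means, the following base R sketch rescales each row of a matrix by its Euclidean (L2) norm. It assumes that "unit length" refers to the L2 norm; the matrix and its values are hypothetical, and this is not the package's own code.

```r
# Hypothetical embedding matrix (words as row names); values are made up.
em <- rbind(cat = c(3, 4), dog = c(1, 0), car = c(0, 2))

# L2 norm of each row.
norms <- sqrt(rowSums(em^2))

# Divide each row by its own norm. The vector is recycled down the columns,
# so element [k, j] is divided by norms[k].
em_unit <- em / norms

# After normalization, every row has length 1.
print(sqrt(rowSums(em_unit^2)))

# The dot product of two unit-length rows equals their cosine similarity.
cos_cat_dog <- sum(em_unit["cat", ] * em_unit["dog", ])
print(cos_cat_dog)   # 0.6
```

Once rows are unit length, cosine similarity reduces to a dot product, so similarity computations need no further rescaling.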
Value
A wordvec (data.table) or embed (matrix).
Functions
- as_embed(): From wordvec (data.table) to embed (matrix).
- as_wordvec(): From embed (matrix) to wordvec (data.table).
Download
Download pre-trained word vectors data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
Examples
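# first 10 words of the demo data (wordvec format)
dt = head(demodata, 10)
str(dt)
# wordvec (data.table) -> embed (matrix), normalized to unit length
embed = as_embed(dt, normalize=TRUE)
embed
str(embed)
# embed (matrix) -> wordvec (data.table)
wordvec = as_wordvec(embed, normalize=TRUE)
wordvec
str(wordvec)
# a plain data.frame (a word column plus numeric dimension columns) as input
df = data.frame(token=LETTERS, D1=1:26/10000, D2=26:1/10000)
as_embed(df)
as_wordvec(df)
# handling duplicate words in wordvec data
dd = rbind(dt[1:5], dt[1:5])
dd # duplicate words
unique(dd)
# handling duplicate words in embed data
dm = as_embed(dd)
dm # duplicate words
unique(dm)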
# more examples for extracting a subset using `x[i, j]`
# (3x faster than `wordvec`)
embed = as_embed(demodata)
embed[1]
embed[1:5]
embed["for"]
embed[pattern("^for.{0,2}$")]
embed[cc("for, in, on, xxx")]
embed[cc("for, in, on, xxx"), 5:10]
embed[1:5, 5:10]
embed[, 5:10]
embed[3, 4]
embed["that", 4]
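For readers who want to see what word-based and pattern-based row selection amounts to on an ordinary matrix, here is a base R sketch. It is not the package's `[` method or its `pattern()` helper: the matrix is hypothetical, and skipping absent words (such as "xxx") is a choice made in this sketch, not documented behavior of `embed`.

```r
# Hypothetical embedding matrix; words as row names, values made up.
words <- c("for", "form", "forty", "in", "on")
em <- matrix(seq_len(15) / 10, nrow = 5,
             dimnames = list(words, paste0("dim", 1:3)))

# Select rows whose word matches a regular expression.
hits <- grepl("^for.{0,2}$", rownames(em))
sub_regex <- em[hits, , drop = FALSE]
print(rownames(sub_regex))   # "for" "form" "forty"

# Select by a character vector, keeping only words that are present,
# and restrict the result to columns 2 and 3.
wanted <- c("for", "in", "on", "xxx")
sub_words <- em[intersect(wanted, rownames(em)), 2:3, drop = FALSE]
print(sub_words)
```

Using `drop = FALSE` keeps the result a matrix even when a single row or column is selected, which avoids surprises when the subset is passed on to matrix operations.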