data_wordvec_subset {PsychWordVec} | R Documentation |
Extract a subset of word vectors data (with S3 methods).
Description
Extract a subset of word vectors data (with S3 methods).
You may specify either a wordvec or embed object (loaded by data_wordvec_load()) or an .RData file (transformed by data_transform()).
Usage
data_wordvec_subset(
x,
words = NULL,
pattern = NULL,
as = c("wordvec", "embed"),
file.save,
compress = "bzip2",
compress.level = 9,
verbose = TRUE
)
## S3 method for class 'wordvec'
subset(x, ...)
## S3 method for class 'embed'
subset(x, ...)
Arguments
x |
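Can be a wordvec or embed object (loaded by data_wordvec_load()) or an .RData file (transformed by data_transform()). |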
words |
[Option 1] Character string(s). |
pattern |
[Option 2] Regular expression (see |
as |
Reshape to wordvec or embed. |
file.save |
File name of to-be-saved R data (must be .RData). |
compress |
Compression method for the saved file. Defaults to "bzip2". |
compress.level |
Compression level. Defaults to 9. |
verbose |
Print information to the console? Defaults to TRUE. |
... |
Parameters passed to data_wordvec_subset. |
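As a rough illustration of what the file.save, compress, and compress.level arguments control, the sketch below writes a toy object to an .RData file with base R's save(). It assumes that compress and compress.level correspond to the compress and compression_level arguments of save(), which this page does not state; the object toy and the temporary file path are hypothetical.

```r
# Hypothetical toy object standing in for a subset of word vectors.
toy <- data.frame(word = c("China", "Japan"), stringsAsFactors = FALSE)

# Write to a temporary .RData file using base R's save().
# Assumption: the package arguments map onto these two save() arguments.
path <- tempfile(fileext = ".RData")
save(toy, file = path, compress = "bzip2", compression_level = 9)

# Load the file into a fresh environment and check the round trip.
env <- new.env()
loaded <- load(path, envir = env)  # names of the restored objects
identical(env$toy, toy)

# Remove the temporary file.
unlink(path)
```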
Value
A subset of wordvec or embed of valid (available) words.
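The difference between selecting by words and by pattern, and what "valid (available) words" means for the returned subset, can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions, not the package's implementation: the matrix emb, its values, and the word "Atlantis" are made up for the example.

```r
# Toy embedding matrix: rows are words, columns are dimensions (made-up values).
emb <- matrix(
  c(0.1, 0.2,
    0.3, 0.4,
    0.5, 0.6,
    0.7, 0.8),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("China", "Chinese", "Japan", "Korea"), c("dim1", "dim2"))
)

# Option 1: exact words; words absent from the vocabulary are dropped.
words <- c("China", "Japan", "Atlantis")
valid <- intersect(words, rownames(emb))
by_words <- emb[valid, , drop = FALSE]

# Option 2: a regular expression matched against the vocabulary.
hits <- grepl("^Chi", rownames(emb))
by_pattern <- emb[hits, , drop = FALSE]

rownames(by_words)    # "China" "Japan"
rownames(by_pattern)  # "China" "Chinese"
```

Here drop = FALSE keeps the result a matrix even when only one word matches.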
Download
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
See Also
Examples
## directly use `embed[i, j]` (3x faster than `wordvec`):
d = as_embed(demodata)
d[1:5]
d["people"]
d[c("China", "Japan", "Korea")]
## specify `x` as a `wordvec` or `embed` object:
subset(demodata, c("China", "Japan", "Korea"))
subset(d, pattern="^Chi")
## specify `x` and `pattern`, and save with `file.save`:
subset(demodata, pattern="Chin[ae]|Japan|Korea",
       file.save="subset.RData")
## load the subset:
d.subset = load_wordvec("subset.RData")
d.subset
## specify `x` as an .RData file and save with `file.save`:
data_wordvec_subset("subset.RData",
                    words=c("China", "Chinese"),
                    file.save="new.subset.RData")
d.new.subset = load_embed("new.subset.RData")
d.new.subset
unlink("subset.RData") # delete file for code check
unlink("new.subset.RData") # delete file for code check