data_transform {PsychWordVec}    R Documentation
Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.
Description
Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.
Speed: In total (preprocessing + compression + saving), it processes about 30,000 words per minute with the slowest settings (compress="xz", compress.level=9) on a modern computer (HP ProBook 450, Windows 11, Intel i7-1165G7 CPU, 32 GB RAM).
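The compression trade-off behind the speed figure can be explored with base R alone. This is a minimal sketch that assumes the ".RData" file is written with base R's save(), whose compress argument accepts "gzip", "bzip2", and "xz" and whose compression_level argument sets the level. compare_compression() is a hypothetical helper, not part of PsychWordVec, and the random matrix only stands in for real word vectors, so sizes and timings will differ from real data and from machine to machine.

```r
# Hypothetical helper (not part of PsychWordVec). Assumption: the output
# file is written with base R's save(), as sketched here.
compare_compression <- function(obj, methods = c("gzip", "bzip2", "xz"),
                                level = 9) {
  results <- data.frame(method = methods, bytes = NA_real_,
                        seconds = NA_real_, roundtrip_ok = NA,
                        stringsAsFactors = FALSE)
  for (i in seq_along(methods)) {
    path <- tempfile(fileext = ".RData")
    # Time only the compressed save, which is the slow step at high levels
    t0 <- proc.time()
    save(obj, file = path, compress = methods[i], compression_level = level)
    results$seconds[i] <- (proc.time() - t0)[["elapsed"]]
    results$bytes[i] <- file.size(path)
    # Reload into a fresh environment to confirm the object survived intact
    env <- new.env()
    load(path, envir = env)
    results$roundtrip_ok[i] <- identical(env$obj, obj)
  }
  results
}

# Stand-in data: 2000 "words" x 50 dimensions, rounded to 3 decimals
set.seed(1)
embed <- matrix(round(runif(2000 * 50, -1, 1), 3), nrow = 2000,
                dimnames = list(paste0("word", 1:2000), paste0("dim", 1:50)))
comparison <- compare_compression(embed)
print(comparison)
```

Reloading each file before reporting its size keeps the comparison honest: a smaller file only counts if it restores the identical object.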
Usage
data_transform(
file.load,
file.save,
as = c("wordvec", "embed"),
sep = " ",
header = "auto",
encoding = "auto",
compress = "bzip2",
compress.level = 9,
verbose = TRUE
)
Arguments
file.load
File name of the raw text (must be plain text). Data must be in this format (values separated by sep):
cat 0.001 0.002 0.003 0.004 0.005 ... 0.300
dog 0.301 0.302 0.303 0.304 0.305 ... 0.600
file.save
File name of the R data file to be saved (must be .RData).
as
Transform the text to which R object? Either "wordvec" (data.table) or "embed" (matrix). Defaults to "wordvec".
sep
Column separator. Defaults to " " (a single space).
header
Is the 1st row a header (e.g., meta-information such as "2000000 300")? Defaults to "auto".
encoding
File encoding. Defaults to "auto".
compress
Compression method for the saved file. Defaults to "bzip2". Options include "gzip", "bzip2", and "xz"; "xz" is the slowest setting but gives the smallest file.
compress.level
Compression level; the maximum, 9, gives the smallest file but is the slowest. Defaults to 9.
verbose
Print information to the console? Defaults to TRUE.
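To show how file.load, sep, and header relate, here is a minimal base-R sketch that parses lines in the format shown under file.load into an embed-style matrix (words as row names, one column per dimension). parse_word_vectors() is hypothetical, and its header = "auto" rule (a first line consisting of exactly two integers, such as "2000000 300", is treated as meta-information) is an assumed heuristic, not necessarily what data_transform() does. With a real file, the lines would come from readLines(file.load).

```r
# Hypothetical parser (not part of PsychWordVec) for the plain-text format:
# a word followed by its values, separated by `sep`.
parse_word_vectors <- function(lines, sep = " ", header = "auto") {
  first <- strsplit(lines[1], sep, fixed = TRUE)[[1]]
  if (identical(header, "auto")) {
    # Assumed heuristic: exactly two integers, e.g. "2000000 300",
    # means the 1st row is meta-information rather than a word vector
    header <- length(first) == 2 && all(grepl("^[0-9]+$", first))
  }
  if (isTRUE(header)) lines <- lines[-1]
  # strsplit() drops a trailing empty field, so trailing separators are harmless
  fields <- strsplit(lines, sep, fixed = TRUE)
  words <- vapply(fields, `[`, character(1), 1)
  values <- lapply(fields, function(f) as.numeric(f[-1]))
  embed <- do.call(rbind, values)
  dimnames(embed) <- list(words, paste0("dim", seq_len(ncol(embed))))
  embed
}

txt <- c("2 3",
         "cat 0.001 0.002 0.003",
         "dog 0.301 0.302 0.303")
embed <- parse_word_vectors(txt)  # header detected automatically
embed["dog", "dim2"]              # [1] 0.302
```

Splitting with fixed = TRUE treats sep as a literal string, so a separator such as "\t" or "," needs no regular-expression escaping.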
Value
A wordvec (data.table) or embed (matrix).
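The two return types hold the same numbers in different shapes. Assuming that wordvec has a character column word plus a list column vec (one numeric vector per word), and that embed is a matrix with words as row names and dimensions as columns, a base-R sketch of converting between them looks like this. A data.frame stands in for the data.table, and both helpers are hypothetical, not PsychWordVec functions.

```r
# Hypothetical converters; the wordvec layout (word + list column vec)
# is an assumption, and a base data.frame stands in for data.table.
embed_to_wordvec <- function(embed) {
  wordvec <- data.frame(word = rownames(embed), stringsAsFactors = FALSE)
  # One unnamed numeric vector per row (word) of the matrix
  wordvec$vec <- lapply(seq_len(nrow(embed)), function(i) unname(embed[i, ]))
  wordvec
}

wordvec_to_embed <- function(wordvec) {
  # Stack the per-word vectors back into a words x dimensions matrix
  embed <- do.call(rbind, wordvec$vec)
  dimnames(embed) <- list(wordvec$word, paste0("dim", seq_len(ncol(embed))))
  embed
}

embed <- matrix(c(0.001, 0.002, 0.301, 0.302), nrow = 2, byrow = TRUE,
                dimnames = list(c("cat", "dog"), c("dim1", "dim2")))
wordvec <- embed_to_wordvec(embed)
wordvec$vec[[2]]                             # [1] 0.301 0.302
identical(wordvec_to_embed(wordvec), embed)  # [1] TRUE
```

The matrix form suits vectorized similarity computations across many words at once, while the one-row-per-word table is convenient for filtering or joining by word.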
Download
Download pre-trained word vector data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
Examples
## Not run:
# please first manually download plain text data of word vectors
# e.g., from: https://fasttext.cc/docs/en/crawl-vectors.html
# the text file must be on your disk
# the following code cannot run unless you have the file
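library(PsychWordVec)  # provides data_transform()
library(bruceR)        # provides set.wd()
set.wd()  # set the working directory to the folder of the current script
data_transform(file.load="cc.zh.300.vec",        # plain text file
               file.save="cc.zh.300.vec.RData",  # RData file
               header=TRUE,    # 1st row is meta-information (word count, dimensions)
               compress="xz")  # smallest file size (slowest setting)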
## End(Not run)