read.dsm.matrix {wordspace} | R Documentation |
Load DSM Matrix from File (wordspace)
Description
This function loads a DSM matrix from a disk file in the specified format (see section ‘Formats’ for details).
Usage
read.dsm.matrix(file, format = c("word2vec"),
encoding = "UTF-8", batchsize = 1e6, verbose=FALSE)
Arguments
file |
either a character string naming a file or a |
format |
input file format (see section ‘Formats’). The input file format cannot be guessed automatically. |
encoding |
character encoding of the input file (ignored if |
batchsize |
for certain input formats, the matrix is read in batches of |
verbose |
if |
Details
In order to read text formats from a compressed file, pass a gzfile, bzfile or xzfile connection with appropriate encoding in the argument file. Make sure not to open the connection before passing it to read.dsm.matrix.
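The following is a minimal sketch of reading from a compressed file, not a definitive recipe. The file contents (labels cat and dog with three values each) and the temporary file name are made up for illustration. It assumes the wordspace package is installed and that the result is a matrix whose row names are the labels from the file.

```r
library(wordspace)

# Illustrative data only: a tiny word2vec-style file with 2 rows and 3 columns
lines <- c("2 3",
           "cat 0.1 0.2 0.3",
           "dog 0.4 0.5 0.6")

# Write the lines to a gzip-compressed temporary file
gz.file <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz.file, "w")
writeLines(lines, con)
close(con)

# Pass an unopened gzfile connection; read.dsm.matrix opens it itself
M <- read.dsm.matrix(gzfile(gz.file, encoding = "UTF-8"), format = "word2vec")
dim(M)
rownames(M)
```

The connection is created inline and never opened by the caller, in line with the advice above.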
Formats
Currently, the only supported file format is word2vec.
word2vec
This widely used text format for word embeddings is only suitable for a dense matrix. Row labels must be unique and may not contain whitespace. Values are usually rounded to a few decimal digits in order to keep file size manageable.
The first line of the file lists the matrix dimensions (rows, columns) separated by a single blank. It is followed by one text line for each matrix row, starting with the row label. The label and the cells are separated by single blanks, so row labels cannot contain whitespace.
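The layout described above can be sketched by building such a file by hand from a small dense matrix and reading it back. The matrix values, the row labels and the temporary file are hypothetical. The sketch assumes the wordspace package is installed and that the result is a dense matrix with the row labels as row names.

```r
library(wordspace)

# Hypothetical 3 x 2 dense matrix with unique, whitespace-free row labels
M <- matrix(c(0.25, -1.5,
              2,     0.125,
              -0.75, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("apple", "banana", "cherry"), NULL))

# Header line: number of rows and columns, separated by a single blank
header <- paste(nrow(M), ncol(M))

# One line per row: label followed by the cell values, separated by single blanks
rows <- vapply(seq_len(nrow(M)), function (i) {
  paste(c(rownames(M)[i], M[i, ]), collapse = " ")
}, character(1))

w2v.file <- tempfile(fileext = ".txt")
writeLines(c(header, rows), w2v.file)

# Read the file back and compare labels and values with the original matrix
M2 <- read.dsm.matrix(w2v.file, format = "word2vec")
identical(as.character(rownames(M2)), rownames(M))
all.equal(as.vector(M2), as.vector(M))
```

In practice write.dsm.matrix would normally be used to create such files; writing the lines manually here only serves to make the format explicit.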
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
See Also
write.dsm.matrix, read.dsm.triplet, read.dsm.ucs
Examples
fn <- system.file("extdata", "word2vec_hiero.txt", package="wordspace")
read.dsm.matrix(fn, format="word2vec")