select.lspace {lingmatch} | R Documentation |
Select Latent Semantic Spaces
Description
Retrieve information and links to latent semantic spaces (sets of word vectors/embeddings) available at osf.io/489he, and optionally download their term mappings (osf.io/xr7jv).
Usage
select.lspace(query = NULL, dir = getOption("lingmatch.lspace.dir"),
  terms = NULL, get.map = FALSE, check.md5 = TRUE, mode = "wb")
Arguments
query |
A character used to select spaces, based on names or other features.
If length is over 1, |
dir |
Path to a directory containing |
terms |
A character vector of terms to search for in the downloaded term map, to calculate
coverage of spaces, or select by coverage if |
get.map |
Logical; if |
check.md5 |
Logical; if |
mode |
Passed to |
Value
A list with varying entries:
- info: The version of osf.io/9yzca stored internally; a data.frame with spaces as row names, and information about each space in columns:
  - terms: number of terms in the space
  - corpus: corpus(es) on which the space was trained
  - model: model from which the space was trained
  - dimensions: number of dimensions in the model (columns of the space)
  - model_info: some parameter details about the model
  - original_max: maximum value used to normalize the space; the original space would be (vectors * original_max) / 100
  - osf_dat: OSF id for the .dat files; the URL would be https://osf.io/osf_dat
  - osf_terms: OSF id for the _terms.txt files; the URL would be https://osf.io/osf_terms
  - wiki: link to the wiki for the space
  - downloaded: path to the .dat file if downloaded, and '' otherwise.
- selected: A subset of info selected by query.
- term_map: If get.map is TRUE or lma_term_map.rda is found in dir, a copy of osf.io/xr7jv, which has space names as column names, terms as row names, and indices as values, with 0 indicating the term is not present in the associated space.
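As a rough illustration of two of the entries described above, the sketch below uses made-up toy data rather than real downloaded spaces. It rescales a normalized matrix with the stated (vectors * original_max) / 100 relationship, and computes a per-space coverage from a term map in which 0 marks an absent term. The object names (vectors, toy_map, restored) and all values are hypothetical, and the coverage calculation is only one plausible reading of the term map, not necessarily how select.lspace computes its coverage column.

```r
# Toy stand-in for a normalized space: 3 terms by 2 dimensions (values are made up)
vectors <- matrix(
  c(100, -50, 25, 0, 10, -100),
  nrow = 3,
  dimnames = list(c("alpha", "beta", "gamma"), NULL)
)
original_max <- 2.5 # hypothetical value taken from the original_max column

# Undo the normalization as described: (vectors * original_max) / 100
restored <- (vectors * original_max) / 100

# Toy stand-in for term_map: terms as rows, spaces as columns;
# values are indices into each space, and 0 means the term is absent
toy_map <- matrix(
  c(1, 2, 0, 3,
    0, 1, 0, 2),
  ncol = 2,
  dimnames = list(
    c("alpha", "beta", "gamma", "delta"),
    c("space_a", "space_b")
  )
)

# Proportion of the requested terms that have a non-zero index in each space
terms <- c("alpha", "gamma", "delta")
coverage <- colMeans(toy_map[terms, , drop = FALSE] != 0)
```

With a real term map, the same non-zero check could be applied to the term_map entry of the returned list, assuming it has been downloaded.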
See Also
Other Latent Semantic Space functions: download.lspace(), lma_lspace(), standardize.lspace()
Examples
# just retrieve information about available spaces
spaces <- select.lspace()
spaces$info[1:10, c("terms", "dimensions", "original_max")]
# retrieve all spaces that used word2vec
w2v_spaces <- select.lspace("word2vec")$selected
w2v_spaces[, c("terms", "dimensions", "original_max")]
## Not run:
# select spaces by terms
select.lspace(terms = c(
  "part-time", "i/o", "'cause", "brexit", "debuffs"
))$selected[, c("terms", "coverage")]
## End(Not run)