| select.lspace {lingmatch} | R Documentation |
Select Latent Semantic Spaces
Description
Retrieve information and links to latent semantic spaces (sets of word vectors/embeddings) available at osf.io/489he, and optionally download their term mappings (osf.io/xr7jv).
Usage
select.lspace(query = NULL, dir = getOption("lingmatch.lspace.dir"),
terms = NULL, get.map = FALSE, check.md5 = TRUE, mode = "wb")
Arguments
query |
A character used to select spaces, based on names or other features.
If length is over 1, |
dir |
Path to a directory containing |
terms |
A character vector of terms to search for in the downloaded term map, to calculate
coverage of spaces, or select by coverage if |
get.map |
Logical; if |
check.md5 |
Logical; if |
mode |
Passed to |
Value
A list with varying entries:
- info: The version of osf.io/9yzca stored internally; a data.frame with spaces as row names, and information about each space in columns:
  - terms: number of terms in the space
  - corpus: corpus(es) on which the space was trained
  - model: model from which the space was trained
  - dimensions: number of dimensions in the model (columns of the space)
  - model_info: some parameter details about the model
  - original_max: maximum value used to normalize the space; the original space would be (vectors * original_max) / 100
  - osf_dat: OSF id for the .dat files; the URL would be https://osf.io/osf_dat
  - osf_terms: OSF id for the _terms.txt files; the URL would be https://osf.io/osf_terms
  - wiki: link to the wiki for the space
  - downloaded: path to the .dat file if downloaded, and '' otherwise.
- selected: A subset of info selected by query.
- term_map: If get.map is TRUE or lma_term_map.rda is found in dir, a copy of osf.io/xr7jv, which has space names as column names, terms as row names, and indices as values, with 0 indicating the term is not present in the associated space.
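The original_max and osf_dat columns described above each imply a small calculation. The following is a minimal base R sketch of both. It does not call lingmatch; the matrix values, the original_max value, and the OSF id "abcde" are made-up placeholders for illustration only.

```r
# Toy stand-in for a normalized space: 3 terms by 2 dimensions (made-up values).
vectors <- matrix(
  c(100, -50, 25, 0, 75, -100),
  nrow = 3, dimnames = list(c("a", "b", "c"), NULL)
)

# Hypothetical original_max, as it might be read from the info table.
original_max <- 2.5

# Reverse the normalization as described for the original_max column.
original <- (vectors * original_max) / 100

# Build a download URL from a placeholder OSF id (not a real id).
osf_dat <- "abcde"
dat_url <- paste0("https://osf.io/", osf_dat)
```

With real data, original_max and osf_dat would presumably be taken from the row of info for the space of interest.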
See Also
Other Latent Semantic Space functions:
download.lspace(),
lma_lspace(),
standardize.lspace()
Examples
# just retrieve information about available spaces
spaces <- select.lspace()
spaces$info[1:10, c("terms", "dimensions", "original_max")]
# retrieve all spaces that used word2vec
w2v_spaces <- select.lspace("word2vec")$selected
w2v_spaces[, c("terms", "dimensions", "original_max")]
## Not run:
# select spaces by terms
select.lspace(terms = c(
"part-time", "i/o", "'cause", "brexit", "debuffs"
))$selected[, c("terms", "coverage")]
## End(Not run)
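To illustrate how a term map of the shape described under Value could yield a per-space coverage figure, here is a minimal base R sketch on a toy matrix. The space names, indices, and the definition of coverage as the proportion of requested terms present in each space are assumptions made for illustration; they are not taken from the package's implementation.

```r
# Toy term map: terms as row names, spaces as column names;
# values are indices into each space, with 0 meaning "not present".
term_map <- matrix(
  c(1L, 2L, 0L, 3L,
    0L, 1L, 2L, 0L),
  ncol = 2,
  dimnames = list(
    c("brexit", "i/o", "debuffs", "part-time"),
    c("space_a", "space_b") # hypothetical space names
  )
)

terms <- c("brexit", "i/o", "'cause")

# Keep only the requested terms that appear in the map at all.
found <- terms[terms %in% rownames(term_map)]

# Assumed definition: share of requested terms with a non-zero index per space.
coverage <- colSums(term_map[found, , drop = FALSE] != 0) / length(terms)
```

In this toy case, space_a contains two of the three requested terms and space_b contains one, so a space could then be chosen by sorting on coverage.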