glove {wordsalad} | R Documentation |
Extract word vectors from GloVe word embedding
Description
Trains a GloVe word embedding on the input text and returns the resulting word vectors. The calculations are done with the text2vec package.
Usage
glove(
text,
tokenizer = text2vec::space_tokenizer,
dim = 10L,
window = 5L,
min_count = 5L,
n_iter = 10L,
x_max = 10L,
stopwords = character(),
convergence_tol = -1,
threads = 1,
composition = c("tibble", "data.frame", "matrix"),
verbose = FALSE
)
Arguments
text |
Character, the text used to train the word embedding. |
tokenizer |
Function, function to perform tokenization. Defaults to text2vec::space_tokenizer. |
dim |
Integer, number of dimensions of the resulting word vectors. Defaults to 10. |
window |
Integer, skip length between words. Defaults to 5. |
min_count |
Integer, number of times a token should appear to be considered in the model. Defaults to 5. |
n_iter |
Integer, number of training iterations. Defaults to 10. |
x_max |
Integer, maximum number of co-occurrences to use in the weighting function. Defaults to 10. |
stopwords |
Character, a vector of stop words to exclude from training. |
convergence_tol |
Numeric, value determining the convergence criterion. Defaults to -1, which disables the convergence check. |
threads |
Integer, number of CPU threads to use. Defaults to 1. |
composition |
Character, either "tibble", "data.frame", or "matrix", the format of the resulting word vectors. Defaults to "tibble". |
verbose |
Logical, controls whether progress is reported as operations are executed. |
Value
A tibble, data.frame, or matrix containing the tokens in the first column and the word vectors in the remaining columns.
Source
https://nlp.stanford.edu/projects/glove/
References
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.
Examples
glove(fairy_tales, x_max = 5)
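A slightly fuller call might look like the following sketch; the toy corpus and tuning values here are illustrative stand-ins, not data shipped with the package:

```r
library(wordsalad)

# Toy corpus standing in for a real text collection (illustrative only)
texts <- c(
  "the king ruled the land and the queen ruled beside him",
  "the dog chased the cat across the yard"
)

# Lower min_count so tokens in this tiny corpus survive filtering,
# and request the vectors as a plain matrix instead of a tibble
vecs <- glove(
  texts,
  dim = 3L,
  min_count = 1L,
  x_max = 5,
  composition = "matrix"
)
```

With `composition = "matrix"` the tokens appear in the first column and each remaining column holds one embedding dimension, as described under Value.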