glove {wordsalad} | R Documentation |
Extract word vectors from GloVe word embedding
Description
Trains a GloVe word embedding on the input text and returns the resulting word vectors. The calculations are done with the text2vec package.
Usage
glove(
text,
tokenizer = text2vec::space_tokenizer,
dim = 10L,
window = 5L,
min_count = 5L,
n_iter = 10L,
x_max = 10L,
stopwords = character(),
convergence_tol = -1,
threads = 1,
composition = c("tibble", "data.frame", "matrix"),
verbose = FALSE
)
Arguments
text |
Character, the text used to train the word embedding. |
tokenizer |
Function, function to perform tokenization. Defaults to text2vec::space_tokenizer. |
dim |
Integer, number of dimensions of the resulting word vectors. Defaults to 10. |
window |
Integer, skip length between words. Defaults to 5. |
min_count |
Integer, number of times a token should appear to be considered in the model. Defaults to 5. |
n_iter |
Integer, number of training iterations. Defaults to 10. |
x_max |
Integer, maximum number of co-occurrences to use in the weighting function. Defaults to 10. |
stopwords |
Character, a vector of stop words to exclude from training. |
convergence_tol |
Numeric, value determining the convergence criterion. Defaults to -1, which disables the convergence check. |
threads |
Integer, number of CPU threads to use. Defaults to 1. |
composition |
Character, either "tibble", "data.frame", or "matrix", the format of the resulting word vectors. Defaults to "tibble". |
verbose |
Logical, controls whether progress is reported as operations are executed. |
Value
A tibble, data.frame, or matrix containing the tokens in the first column and the word vectors in the remaining columns.
Source
https://nlp.stanford.edu/projects/glove/
References
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.
Examples
glove(fairy_tales, x_max = 5)
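A slightly fuller call might look like the following sketch; the toy corpus and tuning values here are illustrative stand-ins, not data shipped with the package:

```r
library(wordsalad)

# Toy corpus standing in for a real text collection (illustrative only)
texts <- c(
  "the king ruled the land and the queen ruled beside him",
  "the dog chased the cat across the yard"
)

# Lower min_count so tokens in this tiny corpus survive filtering,
# and request the vectors as a plain matrix instead of a tibble
vecs <- glove(
  texts,
  dim = 3L,
  min_count = 1L,
  x_max = 5,
  composition = "matrix"
)
```

With `composition = "matrix"` the tokens appear in the first column and each remaining column holds one embedding dimension, as described under Value.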