vectorize.words {fdm2id} | R Documentation |
Word vectorization
Description
Vectorize words from a corpus of documents.
Usage
vectorize.words(
corpus = NULL,
ndim = 50,
maxwords = NULL,
mincount = 5,
minphrasecount = NULL,
window = 5,
maxcooc = 10,
maxiter = 10,
epsilon = 0.01,
lang = "en",
stopwords = lang,
...
)
Arguments
corpus |
The corpus of documents (a character vector). |
ndim |
The number of dimensions of the vector space. |
maxwords |
The maximum number of words. |
mincount |
Minimum count for a word to be considered frequent. |
minphrasecount |
Minimum count for a collocation of words (a phrase) to be considered frequent. |
window |
Window size used to build the term co-occurrence matrix. |
maxcooc |
Maximum number of co-occurrences to use in the weighting function. |
maxiter |
The maximum number of iterations used to fit the GloVe model. |
epsilon |
Defines the early stopping strategy when fitting the GloVe model. |
lang |
The language of the documents (NULL to disable stemming). |
stopwords |
Stopwords, or the language of the documents. NULL if stopwords should not be removed. |
... |
Other parameters. |
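The roles of window and maxcooc can be illustrated with a small base R sketch. It is not the fdm2id implementation: the helper names cooccurrences and glove_weight are hypothetical, the counting is unweighted and symmetric, and alpha = 0.75 is assumed from the GloVe paper rather than taken from this page.

```r
# Illustrative sketch only; not the code used by vectorize.words.

# Count symmetric co-occurrences of tokens that lie within `window` positions.
cooccurrences <- function(tokens, window = 5) {
  vocab <- unique(tokens)
  m <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    # Scan the right-hand context only, then mirror each count.
    last <- min(i + window, n)
    if (last > i) {
      for (j in (i + 1):last) {
        m[tokens[i], tokens[j]] <- m[tokens[i], tokens[j]] + 1
        m[tokens[j], tokens[i]] <- m[tokens[j], tokens[i]] + 1
      }
    }
  }
  m
}

# GloVe-style weighting: counts at or above x_max (the role of maxcooc)
# all receive weight 1; smaller counts are down-weighted.
# alpha = 0.75 is an assumption taken from the GloVe paper.
glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

tokens <- c("a", "b", "c", "a")
tcm <- cooccurrences(tokens, window = 1)
weights <- glove_weight(tcm, x_max = 10)
```

With a larger window, more distant word pairs contribute to the matrix; with a larger maxcooc, frequent pairs keep gaining weight for longer before the cap applies.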
Value
The vectorized words.
See Also
query.words, stopwords, vectorizers
Examples
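## Not run:
# Download and load the text8 corpus
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
# Vectorize the words; collocations seen at least 50 times are treated as phrases
words = vectorize.words (text, minphrasecount = 50)
# Analogy queries: origin - sub + add
query.words (words, origin = "paris", sub = "france", add = "germany")
query.words (words, origin = "berlin", sub = "germany", add = "france")
# Query a phrase detected as a collocation
query.words (words, origin = "new_zealand")
## End(Not run)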
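The analogy queries above can be read as vector arithmetic. The sketch below assumes that such a query computes origin - sub + add and ranks the remaining words by cosine similarity; this page does not confirm that behaviour. The embedding matrix is a toy invented for illustration, and cosine and analogy are hypothetical helpers, not fdm2id functions.

```r
# Toy 3-dimensional embedding, invented for illustration (not GloVe output).
emb <- rbind(
  paris   = c(1, 1, 0),
  france  = c(0, 1, 0),
  berlin  = c(1, 0, 1),
  germany = c(0, 0, 1),
  rome    = c(1, 0, 0)
)

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Rank candidate words by similarity to origin - sub + add.
# Excluding the query words themselves is an assumption of this sketch.
analogy <- function(emb, origin, sub, add) {
  target <- emb[origin, ] - emb[sub, ] + emb[add, ]
  candidates <- setdiff(rownames(emb), c(origin, sub, add))
  sims <- sapply(candidates, function(w) cosine(target, emb[w, ]))
  sort(sims, decreasing = TRUE)
}

result <- analogy(emb, origin = "paris", sub = "france", add = "germany")
print(result)
```

On real embeddings the match is never exact, so the ranking depends on the corpus and on the training parameters.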
[Package fdm2id version 0.9.9 Index]