Modern Text Mining Framework for R



Documentation for package ‘text2vec’ version 0.6.4

Help Pages

text2vec-package text2vec
as.lda_c Converts a sparse document-term matrix to 'lda_c' format
BNS BNS
char_tokenizer Simple tokenization functions for string splitting
check_analogy_accuracy Checks accuracy of word embeddings on the analogy task
coherence Coherence metrics for topic models
Collocations Collocations model
combine_vocabularies Combines multiple vocabularies into one
create_dtm Document-term matrix construction
create_dtm.itoken Document-term matrix construction
create_dtm.itoken_parallel Document-term matrix construction
create_tcm Term-co-occurrence matrix construction
create_tcm.itoken Term-co-occurrence matrix construction
create_tcm.itoken_parallel Term-co-occurrence matrix construction
create_vocabulary Creates a vocabulary of unique terms
create_vocabulary.character Creates a vocabulary of unique terms
create_vocabulary.itoken Creates a vocabulary of unique terms
create_vocabulary.itoken_parallel Creates a vocabulary of unique terms
dist2 Pairwise Distance Matrix Computation
distances Pairwise Distance Matrix Computation
GlobalVectors Re-export of rsparse::GloVe
GloVe Re-export of rsparse::GloVe
hash_vectorizer Vocabulary and hash vectorizers
idir Creates iterator over text files from the disk
ifiles Creates iterator over text files from the disk
ifiles_parallel Creates iterator over text files from the disk
itoken Iterators (and parallel iterators) over input objects
itoken.character Iterators (and parallel iterators) over input objects
itoken.iterator Iterators (and parallel iterators) over input objects
itoken.list Iterators (and parallel iterators) over input objects
itoken_parallel Iterators (and parallel iterators) over input objects
itoken_parallel.character Iterators (and parallel iterators) over input objects
itoken_parallel.iterator Iterators (and parallel iterators) over input objects
itoken_parallel.list Iterators (and parallel iterators) over input objects
jsPCA_robust (numerically robust) Dimension reduction via Jensen-Shannon Divergence & Principal Components
LatentDirichletAllocation Creates Latent Dirichlet Allocation model
LatentSemanticAnalysis Latent Semantic Analysis model
LDA Creates Latent Dirichlet Allocation model
LSA Latent Semantic Analysis model
movie_review IMDB movie reviews
normalize Matrix normalization
pdist2 Pairwise Distance Matrix Computation
perplexity Perplexity of a topic model
postag_lemma_tokenizer Simple tokenization functions for string splitting
prepare_analogy_questions Prepares list of analogy questions
print.text2vec_vocabulary Printing Vocabulary
prune_vocabulary Prune vocabulary
psim2 Pairwise Similarity Matrix Computation
RelaxedWordMoversDistance Creates Relaxed Word Mover's Distance (RWMD) model
RWMD Creates Relaxed Word Mover's Distance (RWMD) model
sim2 Pairwise Similarity Matrix Computation
similarities Pairwise Similarity Matrix Computation
space_tokenizer Simple tokenization functions for string splitting
split_into Split a vector for parallel processing
text2vec text2vec
TfIdf TfIdf
tokenizers Simple tokenization functions for string splitting
vectorizers Vocabulary and hash vectorizers
vocabulary Creates a vocabulary of unique terms
vocab_vectorizer Vocabulary and hash vectorizers
word_tokenizer Simple tokenization functions for string splitting