Modern Text Mining Framework for R

Documentation for package ‘text2vec’ version 0.6.4

Help Pages

text2vec-package	text2vec
as.lda_c	Converts a sparse document-term matrix to 'lda_c' format
BNS	Bi-Normal Separation (BNS)
char_tokenizer	Simple tokenization functions for string splitting
check_analogy_accuracy	Checks accuracy of word embeddings on the analogy task
coherence	Coherence metrics for topic models
Collocations	Collocations model
combine_vocabularies	Combines multiple vocabularies into one
create_dtm	Document-term matrix construction
create_dtm.itoken	Document-term matrix construction
create_dtm.itoken_parallel	Document-term matrix construction
create_tcm	Term-co-occurrence matrix construction
create_tcm.itoken	Term-co-occurrence matrix construction
create_tcm.itoken_parallel	Term-co-occurrence matrix construction
create_vocabulary	Creates a vocabulary of unique terms
create_vocabulary.character	Creates a vocabulary of unique terms
create_vocabulary.itoken	Creates a vocabulary of unique terms
create_vocabulary.itoken_parallel	Creates a vocabulary of unique terms
dist2	Pairwise Distance Matrix Computation
distances	Pairwise Distance Matrix Computation
GlobalVectors	Re-export of rsparse::GloVe
GloVe	Re-export of rsparse::GloVe
hash_vectorizer	Vocabulary and hash vectorizers
idir	Creates iterator over text files from the disk
ifiles	Creates iterator over text files from the disk
ifiles_parallel	Creates iterator over text files from the disk
itoken	Iterators (and parallel iterators) over input objects
itoken.character	Iterators (and parallel iterators) over input objects
itoken.iterator	Iterators (and parallel iterators) over input objects
itoken.list	Iterators (and parallel iterators) over input objects
itoken_parallel	Iterators (and parallel iterators) over input objects
itoken_parallel.character	Iterators (and parallel iterators) over input objects
itoken_parallel.iterator	Iterators (and parallel iterators) over input objects
itoken_parallel.list	Iterators (and parallel iterators) over input objects
jsPCA_robust	(numerically robust) Dimension reduction via Jensen-Shannon Divergence & Principal Components
LatentDirichletAllocation	Creates a Latent Dirichlet Allocation model
LatentSemanticAnalysis	Latent Semantic Analysis model
LDA	Creates a Latent Dirichlet Allocation model
LSA	Latent Semantic Analysis model
movie_review	IMDB movie reviews
normalize	Matrix normalization
pdist2	Pairwise Distance Matrix Computation
perplexity	Perplexity of a topic model
postag_lemma_tokenizer	Simple tokenization functions for string splitting
prepare_analogy_questions	Prepares list of analogy questions
print.text2vec_vocabulary	Printing Vocabulary
prune_vocabulary	Prune vocabulary
psim2	Pairwise Similarity Matrix Computation
RelaxedWordMoversDistance	Creates a Relaxed Word Mover's Distance (RWMD) model
RWMD	Creates a Relaxed Word Mover's Distance (RWMD) model
sim2	Pairwise Similarity Matrix Computation
similarities	Pairwise Similarity Matrix Computation
space_tokenizer	Simple tokenization functions for string splitting
split_into	Split a vector for parallel processing
text2vec	text2vec
TfIdf	TF-IDF (term frequency-inverse document frequency)
tokenizers	Simple tokenization functions for string splitting
vectorizers	Vocabulary and hash vectorizers
vocabulary	Creates a vocabulary of unique terms
vocab_vectorizer	Vocabulary and hash vectorizers
word_tokenizer	Simple tokenization functions for string splitting