| vectorizers {text2vec} | R Documentation | 
Vocabulary and hash vectorizers
Description
These functions create a vectorizer object (closure) that defines how to transform a list of tokens into a vector space, i.e. how to map words to indices. A vectorizer is intended to be used only as an argument to create_dtm, create_tcm, and create_vocabulary.
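The idea of mapping tokens to indices can be illustrated with a small base R sketch. This is a conceptual illustration only, not text2vec's internal implementation: the helper names `make_vocab_mapper`, `make_hash_mapper`, and `toy_hash` are hypothetical, and the toy polynomial hash merely stands in for whatever hash function the package actually uses.

```r
# Conceptual sketch only: NOT text2vec's internal implementation.
# All function names below are hypothetical.

# Vocabulary-based mapping: the closure captures a fixed vocabulary and
# returns the position of each token in it (NA for unseen tokens).
make_vocab_mapper <- function(vocab) {
  function(tokens) match(tokens, vocab)
}

# Hash-based mapping: the closure captures hash_size and maps each token
# to a bucket in 1..hash_size using a toy polynomial hash.
make_hash_mapper <- function(hash_size) {
  toy_hash <- function(token) {
    h <- 0
    for (code in utf8ToInt(token)) {
      h <- (h * 31 + code) %% hash_size
    }
    h + 1  # shift to R's 1-based indexing
  }
  function(tokens) vapply(tokens, toy_hash, numeric(1), USE.NAMES = FALSE)
}

tokens <- c("the", "movie", "was", "great", "the", "end")

vocab_mapper <- make_vocab_mapper(unique(tokens))
vocab_index  <- vocab_mapper(tokens)     # 1 2 3 4 1 5
unseen_index <- vocab_mapper("sequel")   # NA: token is not in the vocabulary

hash_mapper <- make_hash_mapper(2^4)
hash_index  <- hash_mapper(tokens)       # identical tokens share a bucket
```

The vocabulary mapper needs a pass over the data to build the vocabulary first, but every known token gets its own index. The hash mapper needs no such pass and uses a fixed number of indices, at the cost that distinct tokens may collide in the same bucket.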
Usage
vocab_vectorizer(vocabulary)
hash_vectorizer(hash_size = 2^18, ngram = c(1L, 1L),
  signed_hash = FALSE)
Arguments
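| vocabulary | A text2vec vocabulary object, as returned by create_vocabulary. |
| hash_size | integer. The number of hash buckets used for the feature hashing trick. Must be greater than 0, preferably a power of 2. |
| ngram | integer vector of length 2. The lower and upper bounds of the range of n-gram sizes to extract, e.g. c(1L, 2L) for unigrams and bigrams. |
| signed_hash | logical. Whether to use a signed hash function to reduce the effect of collisions when hashing. |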
Value
A vectorizer object (closure).
See Also
create_dtm, create_tcm, create_vocabulary
Examples
library(text2vec)
data("movie_review")
N = 100

# Hash vectorizer: 2^18 hash buckets, unigrams and bigrams
vectorizer = hash_vectorizer(2 ^ 18, c(1L, 2L))
it = itoken(movie_review$review[1:N], preprocess_function = tolower,
             tokenizer = word_tokenizer, n_chunks = 10)
hash_dtm = create_dtm(it, vectorizer)

# Vocabulary vectorizer: first build a unigram vocabulary from the tokens
it = itoken(movie_review$review[1:N], preprocess_function = tolower,
             tokenizer = word_tokenizer, n_chunks = 10)
v = create_vocabulary(it, c(1L, 1L))
vectorizer = vocab_vectorizer(v)

# Create a fresh iterator, then build the document-term matrix
it = itoken(movie_review$review[1:N], preprocess_function = tolower,
             tokenizer = word_tokenizer, n_chunks = 10)
dtm = create_dtm(it, vectorizer)