vocab_builder {text2map} | R Documentation
A fast unigram vocabulary builder
Description
A streamlined function that takes raw texts from a column of a data.frame and
produces a list of all the unique tokens. Texts are tokenized by splitting on a
fixed, single whitespace character, and the unique tokens are then extracted.
The result can be used as input to dtm_builder() to standardize the vocabulary
(i.e., the columns) across multiple DTMs. Before building the vocabulary, texts
should, if desired, have whitespace trimmed, punctuation removed, and terms
lowercased.
Usage
vocab_builder(data, text)
Arguments
data | Data.frame with one column of texts
text | Name of the column with documents' text
Value
Returns a list of the unique terms in a corpus.
Author(s)
Dustin Stoltz
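Examples
The Description outlines three steps: clean the texts, split them on a single space, and keep the unique tokens. The following is a minimal base-R sketch of that idea, not the package's actual implementation. The toy data frame `docs`, its column names, and the helper `sketch_vocab()` are hypothetical and exist only for illustration.

```r
# Hypothetical toy corpus; `docs`, `id`, and `text` are illustrative names.
docs <- data.frame(
  id = 1:3,
  text = c("The cat sat", "the cat ran", " A dog ran. "),
  stringsAsFactors = FALSE
)

# Pre-processing suggested in the Description:
# lowercase, remove punctuation, trim surrounding whitespace.
clean <- tolower(docs$text)
clean <- gsub("[[:punct:]]", "", clean)
clean <- trimws(clean)

# Assumed stand-in for the tokenize-then-deduplicate step:
# split each text on a fixed single space and keep the unique tokens.
sketch_vocab <- function(texts) {
  unique(unlist(strsplit(texts, " ", fixed = TRUE)))
}

vocab <- sketch_vocab(clean)
print(vocab)
# "the" "cat" "sat" "ran" "a" "dog"
```

With text2map installed, the equivalent call would follow the Usage signature, vocab_builder(data, text), applied to a data.frame whose text column has already been cleaned.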
[Package text2map version 0.2.0 Index]