txt_clean_word2vec {word2vec} | R Documentation |
Text cleaning specific for input to word2vec
Description
Standardise text by:
- Conversion of text from UTF-8 to ASCII
- Keeping only alphanumeric characters: letters and numbers
- Removing multiple spaces
- Removing leading/trailing spaces
- Performing lowercasing
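The steps listed above can be approximated in base R. The sketch below is not the package's implementation: `clean_sketch` is a hypothetical helper, and using `iconv` with `ASCII//TRANSLIT` is an assumption about how the ASCII conversion might be done (transliteration results can differ between platforms).

```r
# Hypothetical helper, not part of word2vec: a base-R approximation
# of the cleaning steps described above.
clean_sketch <- function(x) {
  # Assumption: transliterate UTF-8 to ASCII with iconv (platform dependent)
  x <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
  # Replace every character that is not a letter or digit with a space
  x <- gsub("[^[:alnum:]]", " ", x)
  # Collapse runs of white space into a single space
  x <- gsub("[[:space:]]+", " ", x)
  # Remove leading/trailing white space
  x <- trimws(x)
  # Lowercase
  tolower(x)
}

clean_sketch(c(" Just some.texts, ok?", "123.456 and\tsome MORE! "))
# "just some texts ok" "123 456 and some more"
```

Non-alphanumeric characters are replaced by a space rather than deleted in this sketch, so that words separated only by punctuation (such as `some.texts`) stay separate tokens.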
Usage
txt_clean_word2vec(x, ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
Arguments
x |
a character vector in UTF-8 encoding |
ascii |
logical indicating to convert the input from UTF-8 to ASCII. Defaults to TRUE. |
alpha |
logical indicating to keep only alphanumeric characters. Defaults to TRUE. |
tolower |
logical indicating to lowercase x. Defaults to TRUE. |
trim |
logical indicating to trim leading/trailing white space. Defaults to TRUE. |
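As an illustration of how the arguments combine, the following sketch switches off individual steps one at a time. It assumes the word2vec package is installed; the input string is made up for the example, and no output is shown because the ASCII conversion may behave differently across platforms.

```r
library(word2vec)

# Made-up input for illustration; \u00e9 is a non-ASCII letter (e with acute)
x <- c("  H\u00e9llo,   WORLD!  ")

# All steps enabled (the defaults)
txt_clean_word2vec(x)

# Keep the original casing
txt_clean_word2vec(x, tolower = FALSE)

# Do not restrict the text to alphanumeric characters
txt_clean_word2vec(x, alpha = FALSE)

# Skip the ASCII conversion and the trimming of leading/trailing white space
txt_clean_word2vec(x, ascii = FALSE, trim = FALSE)
```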
Value
a character vector of the same length as x,
standardised by converting the encoding to ASCII, lowercasing and
keeping only alphanumeric characters
Examples
library(word2vec)
x <- c(" Just some.texts, ok?", "123.456 and\tsome MORE! ")
# Clean with the defaults (ascii, alpha, tolower and trim all TRUE)
txt_clean_word2vec(x)
[Package word2vec version 0.4.0 Index]