| preprocess {ngram} | R Documentation |
Basic Text Preprocessor
Description
A simple text preprocessor for use with the ngram() function.
Usage
preprocess(
x,
case = "lower",
remove.punct = FALSE,
remove.numbers = FALSE,
fix.spacing = TRUE
)
Arguments
x |
Input text. |
case |
Option to change the case of the text. Value should be "upper", "lower", or NULL (no change). |
remove.punct |
Logical; should punctuation be removed? |
remove.numbers |
Logical; should numbers be removed? |
fix.spacing |
Logical; should multiple spaces be collapsed and trailing spaces removed? |
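To make the three logical options concrete, here is a rough base R approximation. The helper name sketch_clean and its regular expressions are assumptions made for illustration; they are not taken from the ngram source, and the package may handle edge cases (tabs, apostrophes, non-ASCII text) differently.

```r
# Hypothetical helper (not part of ngram): approximates the three logical
# options with base R regular expressions.
sketch_clean <- function(x, remove.punct = FALSE, remove.numbers = FALSE,
                         fix.spacing = TRUE) {
  if (remove.punct) {
    x <- gsub("[[:punct:]]", "", x)   # drop punctuation characters
  }
  if (remove.numbers) {
    x <- gsub("[0-9]", "", x)         # drop digits
  }
  if (fix.spacing) {
    x <- gsub(" +", " ", x)           # collapse runs of spaces into one
    x <- sub(" $", "", x)             # drop a single trailing space
  }
  x
}

x <- "Watch  out for snakes! 111 "
sketch_clean(x)
sketch_clean(x, remove.punct = TRUE, remove.numbers = TRUE)
```

In this sketch, spacing is fixed last so that gaps left behind by removed punctuation or digits are also collapsed.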
Details
The input text x must already be in the correct form for
ngram(), i.e., a single string (character vector of length 1).
The case argument can take three possible values: NULL, in which
case nothing is done, or "lower" or "upper", in which case the
input text is converted to lower or upper case, respectively.
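The single-string requirement and the three settings of case can be sketched in base R as follows. The helper sketch_case is hypothetical and only mirrors the behaviour described above; it is not the package's implementation.

```r
# Hypothetical helper (not part of ngram): mimics the three documented
# settings of the `case` argument using base R only.
sketch_case <- function(x, case = "lower") {
  # The input is expected to be a single string.
  stopifnot(is.character(x), length(x) == 1L)
  if (is.null(case)) {
    return(x)                      # NULL: leave the text unchanged
  }
  switch(case,
    lower = tolower(x),
    upper = toupper(x),
    stop("case must be \"lower\", \"upper\", or NULL")
  )
}

# A multi-element character vector can be collapsed into one string first.
pieces <- c("Watch out", "for snakes!")
x <- paste(pieces, collapse = " ")

sketch_case(x)                     # lower case
sketch_case(x, case = "upper")     # upper case
sketch_case(x, case = NULL)        # unchanged
```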
Value
preprocess() returns the processed text as a single string.
Examples
library(ngram)
x = "Watch out for snakes! 111"
preprocess(x)
preprocess(x, remove.punct=TRUE, remove.numbers=TRUE)
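Since the preprocessor is meant to be used with ngram(), a possible follow-on step is sketched below. It assumes the ngram package is installed; the choice of n = 2 and the variable names are illustrative only.

```r
library(ngram)

# Clean the text first, then build 2-grams from the cleaned string.
x <- "Watch out for snakes! 111"
clean <- preprocess(x, case = "upper", remove.punct = TRUE,
                    remove.numbers = TRUE)

ng <- ngram(clean, n = 2)   # needs at least n words in the input
get.ngrams(ng)              # character vector of the n-grams found
```

Removing punctuation and numbers before tokenizing keeps tokens such as "snakes!" and "111" from appearing as distinct words in the resulting n-grams.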
[Package ngram version 3.2.3 Index]