preprocess {sbo}    R Documentation

Preprocess text corpus

Description

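A simple text preprocessing utility: erases the parts of the input matched by a regular expression and optionally converts the result to lower case.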

Usage

preprocess(input, erase = "[^.?!:;'\\w\\s]", lower_case = TRUE)

Arguments

input

a character vector.

erase

a length one character vector. Regular expression matching the parts of text to be erased from input. The default erases every character that is not a word character (alphanumeric or underscore, as matched by "\w"), white space, an apostrophe, or one of the punctuation characters ".?!:;".

lower_case

a length one logical vector. If TRUE, converts the output to lower case.

Value

a character vector containing the processed output.

Author(s)

Valerio Gherardi

Examples

preprocess("Hi @ there! I'm using `sbo`.")
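
The effect of the erase and lower_case arguments can be approximated in base R. The sketch below is an illustration only, not the sbo implementation: the helper name preprocess_sketch is hypothetical, and it assumes PCRE matching (perl = TRUE), so results may differ from preprocess() where the package's regular expression engine behaves differently.

```r
# Base-R approximation of the documented behaviour (hypothetical helper,
# not part of sbo).
preprocess_sketch <- function(input,
                              erase = "[^.?!:;'\\w\\s]",
                              lower_case = TRUE) {
  # Erase every match of the regular expression. perl = TRUE is used so
  # that \w and \s are understood inside the bracket expression.
  out <- gsub(erase, "", input, perl = TRUE)
  # Optionally convert the result to lower case.
  if (lower_case) out <- tolower(out)
  out
}

# "@" and the backticks are erased; the surrounding spaces are kept.
preprocess_sketch("Hi @ there! I'm using `sbo`.")

# Keep the original letter case.
preprocess_sketch("Hi @ there!", lower_case = FALSE)
```

Note that erasing a character does not collapse the white space around it, so removing "@" in the sketch leaves two consecutive spaces.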

[Package sbo version 0.5.0 Index]