cleansing_corpus {labourR} | R Documentation |
Cleansing Corpus
Description
The function cleanses text by removing escape characters, non-alphanumeric characters, long words, and excess whitespace, and by converting all letters to lower case.
Usage
cleansing_corpus(
  text,
  escape_chars = TRUE,
  nonalphanum = TRUE,
  longwords = TRUE,
  whitespace = TRUE,
  tolower = TRUE
)
Arguments
text |
Character vector of free text to be cleansed. |
escape_chars |
If TRUE, removes escape characters for |
nonalphanum |
If TRUE, removes non-alphanumeric characters. |
longwords |
If TRUE, removes words with more than 35 characters. |
whitespace |
If TRUE, removes excess whitespace. |
tolower |
If TRUE, converts letters to lower case. |
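To make the effect of each flag concrete, the following is a minimal base-R sketch of the steps described by the arguments. It is not the labourR implementation: the function name `cleanse_sketch`, the regular expressions, and the order in which the steps run are all assumptions chosen for illustration, and the package may treat edge cases differently.

```r
# Hypothetical stand-in for illustration only; NOT the labourR source code.
# Argument names mirror the documented ones; patterns and step order are assumed.
cleanse_sketch <- function(text,
                           escape_chars = TRUE,
                           nonalphanum = TRUE,
                           longwords = TRUE,
                           whitespace = TRUE,
                           tolower = TRUE) {
  stopifnot(is.character(text))

  if (escape_chars) {
    # Replace control characters such as \n, \t, and \r with a space.
    text <- gsub("[[:cntrl:]]", " ", text)
  }
  if (nonalphanum) {
    # Replace anything that is not a letter, digit, or space with a space.
    text <- gsub("[^[:alnum:] ]", " ", text)
  }
  if (longwords) {
    # Drop alphanumeric runs longer than 35 characters (36 or more).
    text <- gsub("[[:alnum:]]{36,}", "", text)
  }
  if (whitespace) {
    # Collapse repeated whitespace and trim both ends.
    text <- trimws(gsub("\\s+", " ", text))
  }
  if (tolower) {
    # Call base::tolower explicitly because the argument shares its name.
    text <- base::tolower(text)
  }
  text
}

cleanse_sketch("Hello,\tWorld!")
```

Each step is gated by its own flag, so a single step can be switched off without affecting the others, which matches how the arguments are documented.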
Value
A character vector of the cleansed text.
Examples
txt <- "It has roots in a piece of classical Latin literature from 45 BC"
cleansing_corpus(txt)
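A further usage sketch, assuming the labourR package is installed: it passes a character vector rather than a single string and switches off one step through the documented `tolower` argument. The input strings in `docs` are made up for illustration, and no output is shown because it depends on the package's own rules.

```r
# Illustrative input; these strings are invented for the example.
docs <- c("Data\tScientist -- Machine Learning!", "  SENIOR   Software Engineer  ")

# The guard skips the calls when labourR is not available.
if (requireNamespace("labourR", quietly = TRUE)) {
  # All cleansing steps enabled (the defaults shown in Usage).
  print(labourR::cleansing_corpus(docs))

  # Keep the original letter case by disabling the tolower step.
  print(labourR::cleansing_corpus(docs, tolower = FALSE))
}
```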
[Package labourR version 1.0.0 Index]