| cleanText {lares} | R Documentation |
Clean text strings automatically
Description
cleanText: Clean character strings automatically. Options are available to
keep only ASCII characters, keep specific characters, convert to lower case,
or apply title format.
cleanNames: Clean the column names of a data.frame/tibble. Resulting names
are unique and consist only of the underscore character (_), numbers, and
ASCII letters. Capitalization preferences can be specified using the lower
parameter.
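The naming contract described for cleanNames (unique names built only from _, digits, and ASCII letters) can be illustrated with a minimal base R sketch. The helper sketch_clean_names() below is hypothetical and is not the lares implementation; its handling of separators and duplicates is an assumption made for illustration, so its output may differ from what cleanNames returns.

```r
# Hypothetical helper, NOT the lares implementation: a base R sketch of
# the naming contract (only "_", digits and ASCII letters; unique names).
sketch_clean_names <- function(x, num = "x", lower = TRUE) {
  x <- trimws(x)
  # Replace each run of disallowed characters with a single underscore
  x <- gsub("[^A-Za-z0-9_]+", "_", x, perl = TRUE)
  # Drop leading/trailing underscores left behind by the replacement
  x <- gsub("^_+|_+$", "", x, perl = TRUE)
  if (lower) x <- tolower(x)
  # Prefix names made only of digits (mirrors the idea of the num argument)
  only_digits <- grepl("^[0-9]+$", x, perl = TRUE)
  x[only_digits] <- paste0(num, x[only_digits])
  # Disambiguate duplicates by appending a numeric suffix
  make.unique(x, sep = "_")
}

sketch_clean_names(c("ID.", "34", "x_2", "Num 123", " white Spaces ", "id"))
```

Duplicates are resolved with base R's make.unique(), which is one possible choice; other suffix schemes would satisfy the same uniqueness requirement.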
Usage
cleanText(
text,
spaces = TRUE,
keep = "",
lower = TRUE,
ascii = TRUE,
title = FALSE
)
cleanNames(df, num = "x", keep = "_", ...)
Arguments
text |
Character vector. |
spaces |
Boolean. Keep spaces? If a character string is passed instead, spaces will be replaced with that string. |
keep |
Character. String (concatenated or as vector) with all characters that are accepted and should be kept, in addition to alphanumeric characters. |
lower |
Boolean. Transform all to lower case? |
ascii |
Boolean. Only ASCII characters? |
title |
Boolean. Transform to title format (upper case on first letters). |
df |
data.frame/tibble. |
num |
Character. Prefix added to names that are only numeric. |
... |
Additional parameters passed to |
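To make the spaces and keep semantics above more concrete, here is a rough base R sketch. The function sketch_clean_text() is a hypothetical stand-in written for illustration, not the code used by cleanText; it only models the documented idea that spaces may be logical or a replacement string and that keep lists extra characters to preserve, and it ignores the ascii and title options.

```r
# Hypothetical stand-in, NOT the lares implementation: models only the
# spaces, keep and lower arguments as described in the table above.
sketch_clean_text <- function(text, spaces = TRUE, keep = "", lower = TRUE) {
  # Allowed characters: ASCII alphanumerics, the space, and anything in keep
  allowed <- c(letters, LETTERS, 0:9, " ", unlist(strsplit(keep, "")))
  # Filter each string character by character
  out <- vapply(strsplit(text, ""), function(chars) {
    paste(chars[chars %in% allowed], collapse = "")
  }, character(1))
  # Collapse repeated spaces and trim the ends
  out <- trimws(gsub(" +", " ", out))
  if (is.character(spaces)) {
    # A character value replaces every remaining space
    out <- gsub(" ", spaces, out, fixed = TRUE)
  } else if (!isTRUE(spaces)) {
    # FALSE removes spaces altogether
    out <- gsub(" ", "", out, fixed = TRUE)
  }
  if (lower) out <- tolower(out)
  out
}

sketch_clean_text("Bernardo  Lares 123", spaces = ".")
sketch_clean_text("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
```

Filtering by membership in an allowed set, rather than building a regular expression from keep, avoids having to escape characters such as "#" or "." that are special in patterns.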
Details
Inspired by janitor::clean_names.
Value
cleanText: Character vector with transformed strings.
cleanNames: data.frame/tibble with transformed column names.
See Also
Other Data Wrangling:
balance_data(),
categ_reducer(),
date_cuts(),
date_feats(),
file_name(),
formatHTML(),
holidays(),
impute(),
left(),
normalize(),
num_abbr(),
ohe_commas(),
ohse(),
quants(),
removenacols(),
replaceall(),
replacefactor(),
textFeats(),
textTokenizer(),
vector2text(),
year_month(),
zerovar()
Other Text Mining:
ngrams(),
remove_stopwords(),
replaceall(),
sentimentBreakdown(),
textCloud(),
textFeats(),
textTokenizer(),
topics_rake()
Examples
cleanText("Bernardo Lares 123")
cleanText("Bèrnärdo LáreS 123", lower = FALSE)
cleanText("Bernardo Lare$", spaces = ".", ascii = FALSE)
cleanText("\\@®ì÷å %ñS ..-X", spaces = FALSE)
cleanText(c("maría", "€", "núñez_a."), title = TRUE)
cleanText("29_Feb-92()#", keep = c("#", "_"), spaces = FALSE)
# For a data.frame directly:
df <- dft[1:5, 1:6] # Dummy data
colnames(df) <- c("ID.", "34", "x_2", "Num 123", "Nòn-äscì", " white Spaces ")
print(df)
cleanNames(df)
cleanNames(df, lower = FALSE)