string_clean_alias {stringmagic}    R Documentation
Cleans a character vector from multiple patterns
Description
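Recursively cleans a character vector of several patterns: the patterns are applied one after the other, each one to the result of the previous substitution. Quickly handle the tedious task of data cleaning by taking advantage of the compact pattern syntax, in which flags, several patterns and their replacement fit in a single string.
You can also apply all sorts of cleaning operations by summoning string_ops() operations.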
Usage
string_clean_alias(
replacement = "",
pipe = " => ",
split = ",[ \n\t]+",
ignore.case = FALSE,
fixed = FALSE,
word = FALSE,
total = FALSE,
single = FALSE,
namespace = NULL
)
string_clean(
x,
...,
replacement = "",
pipe = " => ",
split = ",[ \n\t]+",
ignore.case = FALSE,
fixed = FALSE,
word = FALSE,
total = FALSE,
single = FALSE,
envir = parent.frame(),
namespace = NULL
)
string_replace(
x,
pattern,
replacement = "",
pipe = " => ",
ignore.case = FALSE,
fixed = FALSE,
word = FALSE,
total = FALSE,
single = FALSE,
envir = parent.frame()
)
stclean(
x,
...,
replacement = "",
pipe = " => ",
split = ",[ \n\t]+",
ignore.case = FALSE,
fixed = FALSE,
word = FALSE,
total = FALSE,
single = FALSE,
envir = parent.frame(),
namespace = NULL
)
streplace(
x,
pattern,
replacement = "",
pipe = " => ",
ignore.case = FALSE,
fixed = FALSE,
word = FALSE,
total = FALSE,
single = FALSE,
envir = parent.frame()
)
Arguments
replacement: Character scalar, default is the empty string. It represents the default value by which the patterns found in the character strings will be replaced. For example, string_clean(x, "e", replacement = "a") turns every "e" in x into "a".
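pipe: Character scalar, default is " => ". If the pipe is found in a pattern, the pattern is split at the pipe and everything after it becomes the replacement. For example, in string_clean(x, "e => a") the pattern "e" is replaced with "a", which is equivalent to string_clean(x, "e", replacement = "a").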
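split: Character scalar, default is ",[ \n\t]+", that is, a comma followed by one or more spaces, newlines or tabs. Each pattern is split with respect to this regular expression, so that several patterns can share a single string. For example, in string_clean(x, "cat, dogs => animal") both "cat" and "dogs" are replaced with "animal".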
ignore.case: Logical scalar, default is FALSE. If TRUE, the search is case insensitive, as with the flag "ignore".
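fixed: Logical scalar, default is FALSE. If TRUE, the patterns are matched as plain text instead of regular expressions, as with the flag "fixed".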
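word: Logical scalar, default is FALSE. If TRUE, word boundaries are added to the patterns and the comma becomes a word separator, as with the flag "word".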
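total: Logical scalar, default is FALSE. If TRUE, the complete string is replaced whenever the pattern is found, instead of just the matched part, as with the flag "total". Example: string_clean(c("Mazda RX4", "Fiat 128"), "Maz => Mazda", total = TRUE) leads to c("Mazda", "Fiat 128").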
single: Logical scalar, default is FALSE. If TRUE, only the first match of each string element is replaced, as with the flag "single".
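namespace: Character scalar or NULL (default). Only useful to package developers: if your package defines its own string_magic operations (see string_magic_register_fun()), pass the name of your package in this argument so that your function can access the new operations.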
x: A character vector.
...: Character scalars representing patterns. A pattern is of the form "flags/pat1, pat2 => replacement". This means that patterns 'pat1' and 'pat2' will be replaced with the string 'replacement'. By default patterns are comma separated and the replacement comes after a ' => ' (see arguments split and pipe to change this). Available regex flags are: 'word' (add word boundaries), 'ignore' (the case), 'fixed' (no regex), 'total', 'single' and 'magic'.
The flag 'magic' interpolates variables into the pattern. Patterns starting with an '@' lead to operations in string_ops(): for example, "@lower, ws.isolated" puts the strings in lower case, removes isolated characters and normalizes white spaces.
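To make the pattern syntax concrete, here is a minimal base-R sketch, assuming the default pipe and split values and ignoring flags and '@' operations. The helper clean_sketch() is hypothetical and is not how stringmagic implements string_clean(); it only shows how one string can carry several patterns with a shared replacement, the patterns being applied one after the other.

```r
# clean_sketch() is a hypothetical helper, NOT the stringmagic implementation.
# It handles "pat1, pat2 => replacement" strings (no flags, no '@' operations).
clean_sketch <- function(x, ..., replacement = "", pipe = " => ",
                         split = ",[ \n\t]+") {
  for (p in c(...)) {
    repl <- replacement
    # Everything after the (fixed) pipe becomes the replacement
    pos <- as.integer(regexpr(pipe, p, fixed = TRUE))
    if (pos > 0) {
      repl <- substring(p, pos + nchar(pipe))
      p <- substring(p, 1, pos - 1)
    }
    # The remaining text may hold several patterns sharing that replacement;
    # each one is applied to the output of the previous substitution
    for (pat in strsplit(p, split)[[1]]) {
      x <- gsub(pat, repl, x)
    }
  }
  x
}

clean_sketch(c("cat and dogs", "a dog"), "cat, dogs => animal")
clean_sketch("hello world", "o", "l => L")
```

In the second call, the "l => L" substitution operates on the output of the "o" removal, which mirrors the sequential cleaning described above.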
envir: Environment in which to evaluate the interpolations if the flag "magic" is provided. Default is parent.frame().
pattern: A character scalar containing a regular expression pattern to be replaced. You can write the replacement directly in the string after a pipe: ' => ' (see argument pipe). Available regex flags are: 'word' (add word boundaries), 'ignore' (the case), 'fixed' (no regex), 'total', 'single' and 'magic'.
The flag 'magic' interpolates variables into the pattern.
Value
By default, a character vector of the same length as the input vector x is returned.
Note, however, that since arbitrary string_ops() operations can be applied, the length and type of the final vector may differ when such operations are used.
Functions
- string_clean_alias(): Create a string_clean alias with custom defaults.
- string_replace(): Simplified version of string_clean.
- stclean(): Alias to string_clean.
- streplace(): Alias to string_replace.
Regular expression flags specific to replacement
This function benefits from two specific regex flags: "total" and "single".
"total" replaces the complete string if the pattern is found (remember that the default behavior is to replace just the pattern).
"single" performs a single substitution for each string element and stops there: only the first match of each string is replaced. Technically, base::sub() is used internally instead of base::gsub().
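In base R terms, the two flags roughly correspond to the calls below. This is a sketch for intuition only, assuming a plain regular expression pattern; it is not the package source code.

```r
# Sketch with base R only; not the stringmagic source code
x <- "abcdefge"
all_matches <- gsub("e", "", x)  # default: every "e" removed -> "abcdfg"
first_match <- sub("e", "", x)   # "single": only the first "e" -> "abcdfge"

# "total": the whole element is replaced whenever the pattern is found
models <- c("Mazda RX4", "Fiat 128")
total <- ifelse(grepl("Maz", models), "Mazda", models)
total  # "Mazda" "Fiat 128"
```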
Generic regular expression flags
All stringmagic functions support generic flags in regular-expression patterns.
The flags are useful to quickly give extra instructions, similarly to usual regular expression flags.
Here the syntax is "flag1, flag2/pattern". That is: flags are a comma separated list of flag-names separated from the pattern with a slash (/). Example: string_which(c("hello...", "world"), "fixed/.") returns 1.
Here the flag "fixed" removes the regular expression meaning of ".", which would otherwise mean "any character".
The no-flag version string_which(c("hello...", "world"), ".") returns 1:2.
Alternatively, and this is recommended, you can collate the initials of the flags instead of using a comma separated list. For example: "if/dt[" will apply the flags "ignore" and "fixed" to the pattern "dt[".
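The following sketch shows one way the "flags/pattern" prefix could be parsed, accepting both comma-separated full names and collated initials. The helper parse_flags() is hypothetical and is not stringmagic code. Only the initials used on this page are mapped (i, f, w, t); treating an unrecognized prefix as part of the pattern is a design choice of this sketch, not documented package behavior.

```r
# parse_flags() is a hypothetical helper for illustration, NOT stringmagic code.
# Initials taken from this page: i = ignore, f = fixed, w = word, t = total.
flag_names <- c("ignore", "fixed", "word", "magic", "total", "single")
initials <- c(i = "ignore", f = "fixed", w = "word", t = "total")

parse_flags <- function(p) {
  no_flags <- list(flags = character(0), pattern = p)
  slash <- as.integer(regexpr("/", p, fixed = TRUE))
  if (slash < 1) return(no_flags)
  prefix <- substring(p, 1, slash - 1)
  flags <- character(0)
  # Flags are either comma-separated full names or collated initials
  for (tok in strsplit(prefix, ",[ ]*")[[1]]) {
    if (tok %in% flag_names) {
      flags <- c(flags, tok)
    } else {
      chars <- strsplit(tok, "")[[1]]
      # Unknown prefix: assume the slash belongs to the pattern itself
      if (length(chars) == 0 || !all(chars %in% names(initials))) return(no_flags)
      flags <- c(flags, unname(initials[chars]))
    }
  }
  list(flags = flags, pattern = substring(p, slash + 1))
}

parse_flags("if/dt[")   # flags "ignore", "fixed"; pattern "dt["
parse_flags("fixed/.")  # flags "fixed"; pattern "."
parse_flags("a/b")      # no flags; pattern "a/b"
```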
The four flags always available are: "ignore", "fixed", "word" and "magic".
"ignore" instructs to ignore the case. Technically, it adds the perl-flag "(?i)" at the beginning of the pattern.
"fixed" removes the regular expression interpretation, so that the characters ".", "$", "^", "[" (among others) lose their special meaning and are treated for what they are: simple characters.
"word" adds word boundaries (
"\\b"
in regex language) to the pattern. Further, the comma (","
) becomes a word separator. Technically, "word/one, two" is treated as "\b(one|two)\b". Example:string_clean("Am I ambushed?", "wi/am")
leads to " I ambushed?" thanks to the flags "ignore" and "word"."magic" allows to interpolate variables inside the pattern before regex interpretation. For example if
letters = "aiou"
thenstring_clean("My great goose!", "magic/[{letters}] => e")
leads to"My greet geese!"
Author(s)
Laurent R. Berge
See Also
String operations: string_is(), string_get(), string_clean(), string_split2df().
Chain basic operations with string_ops(). Clean character vectors efficiently with string_clean().
Use string_vec() to create simple string vectors.
String interpolation combined with operation chaining: string_magic(). You can change string_magic default values with string_magic_alias() and add custom operations with string_magic_register_fun().
Display messages while benefiting from string_magic interpolation with cat_magic() and message_magic().
Other tools with aliases: cat_magic_alias(), string_magic(), string_magic_alias(), string_ops_alias(), string_vec_alias()
Examples
x = c("hello world ", "it's 5 am....")
# we clean the o's and the points (we use 'fixed' to trigger fixed-search)
string_clean(x, "o", "f/.")
# equivalently
string_clean(x, "fixed/o, .")
# equivalently
string_clean(x, "o, .", fixed = TRUE)
# equivalently
string_clean(x, "o", ".", fixed = TRUE)
#
# chaining operations: example using cars
#
cars = row.names(mtcars)
new = string_clean(cars,
# replace strings containing "Maz" with Mazda
"total/Maz => Mazda",
# replace the word 'Merc' with Mercedes
"wi/merc => Mercedes",
# replace strings containing both "Merc" and a digit followed by an 'S'
"t/Merc & \\dS => Mercedes S!",
# put to lower case, remove isolated characters and normalize white spaces
"@lower, ws.isolated")
cbind(cars, new)
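Following the semantics described on this page, the first example removes every "o" as a regular expression and then every literal dot. A base-R sketch of the same two steps, useful to check what those calls return:

```r
x <- c("hello world ", "it's 5 am....")
# Remove the "o"s with a regular expression, then the dots literally
res <- gsub(".", "", gsub("o", "", x), fixed = TRUE)
res  # "hell wrld " "it's 5 am"
```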