rm_non_ascii {qdapRegex} | R Documentation |
Remove/Replace/Extract Non-ASCII
Description
Remove/replace/extract non-ASCII substring from a string. This is the template used by other qdapRegex rm_XXX functions.
Usage
rm_non_ascii(
text.var,
trim = !extract,
clean = TRUE,
pattern = "@rm_non_ascii",
replacement = "",
extract = FALSE,
dictionary = getOption("regex.library"),
ascii.out = TRUE,
...
)
ex_non_ascii(
text.var,
trim = !extract,
clean = TRUE,
pattern = "@rm_non_ascii",
replacement = "",
extract = TRUE,
dictionary = getOption("regex.library"),
ascii.out = TRUE,
...
)
Arguments
text.var |
The text variable. |
trim |
logical. If TRUE, removes leading and trailing white space. |
clean |
logical. If TRUE, extra white space and escaped characters are removed. |
pattern |
A character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. |
replacement |
Replacement for matched pattern. |
extract |
logical. If TRUE, the non-ASCII substrings are extracted instead of removed. |
dictionary |
A dictionary of canned regular expressions to search within if pattern begins with "@rm_". |
ascii.out |
logical. If |
... |
ignored. |
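As an illustration of how a pattern such as "@rm_non_ascii" could be resolved against a dictionary, here is a minimal sketch in base R. It assumes a dictionary behaves like a named list of regex strings and that a leading "@" marks a dictionary lookup. Both toy_dictionary and resolve_pattern are hypothetical names introduced for this sketch and are not part of qdapRegex; the stored pattern is borrowed from the Examples section and is not claimed to be the one the package ships.

```r
# Hypothetical stand-in for a regex dictionary: a named list of patterns.
# The entry reuses the simple regex from the Examples section; it is not
# claimed to be the pattern qdapRegex actually stores.
toy_dictionary <- list(rm_non_ascii = "[^ -~]")

# Hypothetical helper: resolve "@name" against the dictionary, otherwise
# return the input unchanged so it is used as a literal regular expression.
resolve_pattern <- function(pattern, dictionary) {
  if (!startsWith(pattern, "@")) return(pattern)
  key <- substring(pattern, 2)          # drop the leading "@"
  found <- dictionary[[key]]            # NULL when the name is absent
  if (is.null(found)) stop("no dictionary entry named '", key, "'")
  found
}

resolve_pattern("@rm_non_ascii", toy_dictionary)  # looked up by name
resolve_pattern("[0-9]+", toy_dictionary)         # passed through as-is
```

Keeping the lookup separate from the matching step means a caller can pass either a dictionary key or an ordinary regular expression through the same pattern argument.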
Value
Returns a character string with all non-ASCII characters removed.
Note
macOS 14 Sonoma (and likely all later versions) has a different implementation of iconv, which may produce unexpected results.
Warning
iconv is used within rm_non_ascii. iconv's behavior across operating systems may not be consistent.
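Because of this, it can help to check what iconv does on the current platform before relying on the result. The following is a minimal sketch using base R only. The helper is_ascii is a hypothetical name introduced here, and the iconv call shown is an assumption for illustration, not necessarily the exact call rm_non_ascii makes internally.

```r
x <- c("Hello World", "Ekstr\xf8m")
Encoding(x) <- "latin1"

# Ask iconv to replace each unconvertible byte with a <xx> hex escape.
# What comes back for the second element depends on the platform's iconv.
converted <- iconv(x, from = "latin1", to = "ASCII", sub = "byte")
converted

# Hypothetical helper: TRUE for elements that are non-missing and made up
# only of bytes below 128, checked without relying on iconv or the locale.
is_ascii <- function(s) {
  vapply(s, function(e) {
    !is.na(e) && all(as.integer(charToRaw(e)) < 128L)
  }, logical(1), USE.NAMES = FALSE)
}

is_ascii(x)          # the input: second element holds a non-ASCII byte
is_ascii(converted)  # the conversion result on this platform
```

If is_ascii(converted) contains FALSE, the conversion left non-ASCII bytes or returned NA for some element, which suggests the platform's iconv is not behaving as the function expects.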
Author(s)
Stack Overflow's MrFlick, hwnd, and Tyler Rinker <tyler.rinker@gmail.com>.
See Also
Other rm_ functions: rm_abbreviation(), rm_between(), rm_bracket(), rm_caps_phrase(), rm_caps(), rm_citation_tex(), rm_citation(), rm_city_state_zip(), rm_city_state(), rm_date(), rm_default(), rm_dollar(), rm_email(), rm_emoticon(), rm_endmark(), rm_hash(), rm_nchar_words(), rm_non_words(), rm_number(), rm_percent(), rm_phone(), rm_postal_code(), rm_repeated_characters(), rm_repeated_phrases(), rm_repeated_words(), rm_tag(), rm_time(), rm_title_name(), rm_url(), rm_white(), rm_zip()
Examples
x <- c("Hello World", "Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x
rm_non_ascii(x)
rm_non_ascii(x, replacement="<<FLAG>>")
ex_non_ascii(x)
ex_non_ascii(x, ascii.out=FALSE)
## simple regex to remove non-ascii
rm_default(x, pattern="[^ -~]")
ex_default(x, pattern="[^ -~]")
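For comparison, the simple regex above can also be applied with base R alone, which avoids iconv entirely. This is a minimal sketch rather than a description of how rm_default works internally; the variable names x_utf8, stripped, and found are introduced here for illustration.

```r
x <- c("Hello World", "Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"

# Convert to UTF-8 first so the regex engine sees whole characters.
x_utf8 <- enc2utf8(x)

# Delete every character outside the range space (0x20) to tilde (0x7E).
stripped <- gsub("[^ -~]", "", x_utf8, perl = TRUE)
stripped

# Or collect those characters per element instead of deleting them.
found <- regmatches(x_utf8, gregexpr("[^ -~]", x_utf8, perl = TRUE))
lengths(found)
```

Note that the character class [^ -~] matches anything outside printable ASCII, so it also removes control characters such as tabs and newlines, not only accented letters.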