strings_to_NA {eHDPrep} | R Documentation |
Replace values in non-numeric columns with NA
Description
Replaces specified or pre-defined strings in non-numeric columns with NA.
Usage
strings_to_NA(data, ..., strings_to_replace = NULL)
Arguments
data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... |
< |
strings_to_replace |
Character vector of values to be replaced with NA. |
Details
Columns to process can be specified in custom arguments (...); if none are
specified, the replacement is applied to all non-numeric columns.
Default strings which will be replaced with NA
are as follows:
"Undetermined", "unknown", "missing", "fail", "fail / unknown",
"equivocal", "equivocal / unknown", "*".
String search is made using grepl and supports regex, so metacharacters
(. \ | ( ) [ ] { } ^ $ * + ?) should be escaped with a "\" prefix.
Matches are case sensitive by default. To ignore case, pass
ignore.case = TRUE via the ... argument.
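The escaping and case-sensitivity rules above can be tried out with grepl directly. The following is a minimal base-R sketch, not the eHDPrep implementation: the helper replace_matches and the vector values are hypothetical names introduced for illustration only. Note that inside an R string literal the backslash itself must be doubled, so a literal asterisk is written "\\*". Plain grepl matches substrings; whether strings_to_NA matches whole values or substrings is not stated on this page.

```r
# Base-R sketch only; this is NOT how eHDPrep implements strings_to_NA.
values <- c("unknown", "Unknown", "*", "Type I", "fail / unknown")

# An escaped "*" is treated as a literal asterisk rather than a quantifier.
star_hits <- grepl("\\*", values)

# Matching is case sensitive unless ignore.case = TRUE is supplied.
case_sensitive_hits <- grepl("unknown", values)
case_insensitive_hits <- grepl("unknown", values, ignore.case = TRUE)

# Hypothetical helper: set every value matching any pattern to NA.
# Extra arguments (e.g. ignore.case = TRUE) are forwarded to grepl.
replace_matches <- function(x, patterns, ...) {
  hits <- lapply(patterns, grepl, x = x, ...)
  any_hit <- Reduce(`|`, hits, rep(FALSE, length(x)))
  x[any_hit] <- NA
  x
}

replaced <- replace_matches(values, c("unknown", "\\*"))
replaced_ignoring_case <- replace_matches(values, c("unknown", "\\*"),
                                          ignore.case = TRUE)
print(replaced)
print(replaced_ignoring_case)
```

Comparing the two printed vectors shows the effect of ignore.case: "Unknown" is kept in the first result and replaced in the second.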
Value
data with specified values replaced with NA.
Examples
data(example_data)
# original unique values in diabetes column:
unique(example_data$diabetes)
# Using default values
res <- strings_to_NA(example_data)
unique(res$diabetes)
# original unique values in diabetes_type column:
unique(example_data$diabetes_type)
# Using custom values
res <- strings_to_NA(example_data, strings_to_replace = "Type I")
unique(res$diabetes_type)