strings_to_NA {eHDPrep}    R Documentation

Replace values in non-numeric columns with NA

Description

Replaces specified or pre-defined strings in non-numeric columns with NA.

Usage

strings_to_NA(data, ..., strings_to_replace = NULL)

Arguments

data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

<tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

strings_to_replace

Character vector of values to be replaced with NA. If NULL (the default), the pre-defined strings listed under Details are used.

Details

Columns to process can be selected with tidy-select expressions in ...; if none are given, all non-numeric columns are processed. The default strings replaced with NA are: "Undetermined", "unknown", "missing", "fail", "fail / unknown", "equivocal", "equivocal / unknown", "*".

Strings are matched with grepl, so they are interpreted as regular expressions. To match any of the metacharacters . \ | ( ) [ ] { } ^ $ * + ? literally, escape it with a backslash (written "\\" inside an R string). Matching is case sensitive by default; case can be ignored by supplying ignore.case = TRUE as one of the ... arguments.
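The regular-expression rules above can be seen by calling base R's grepl directly. The values below are made up for illustration, and the snippet shows grepl's own behaviour only; it is not taken from the internals of strings_to_NA.

```r
# Hypothetical example values; this shows base R grepl() behaviour only.
values <- c("Unknown", "unknown", "N/A", "N/A?", "fail / unknown")

# Unescaped "?" is a quantifier: "A?" means "an optional A",
# so this pattern matches both "N/A" and "N/A?".
unescaped <- grepl("N/A?", values)

# Escaping "?" (written "\\?" inside an R string) matches the literal "N/A?" only.
escaped <- grepl("N/A\\?", values)

# Matching is case sensitive unless ignore.case = TRUE is passed.
case_sensitive <- grepl("unknown", values)
case_insensitive <- grepl("unknown", values, ignore.case = TRUE)

# One row per pattern variant, one column per value.
print(rbind(unescaped, escaped, case_sensitive, case_insensitive))
```

Because grepl matches substrings, a pattern such as "unknown" also matches longer values that contain it, such as "fail / unknown". The anchors ^ and $ restrict a pattern to whole values. This page does not say whether strings_to_NA adds such anchors itself.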

Value

data, with matching values in the processed columns replaced with NA.
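The behaviour described under Details and Value can be approximated in a few lines of base R. This is a minimal sketch under stated assumptions, not eHDPrep's implementation: the function strings_to_na_sketch and the toy data frame are hypothetical, tidy-select column selection is omitted (all non-numeric columns are always processed), and extra arguments such as ignore.case are simply forwarded to grepl.

```r
# Hypothetical base-R sketch; not eHDPrep's implementation.
strings_to_na_sketch <- function(df, patterns, ...) {
  # Target every column that is not numeric (no tidy-select support here).
  non_numeric <- !vapply(df, is.numeric, logical(1))
  df[non_numeric] <- lapply(df[non_numeric], function(col, ...) {
    # TRUE where any pattern matches; grepl() returns FALSE for NA entries.
    hit <- Reduce(`|`, lapply(patterns, grepl, x = col, ...))
    col[hit] <- NA
    col
  }, ...)
  df
}

# Hypothetical toy data: one numeric and one character column.
toy <- data.frame(
  id = 1:4,
  status = c("positive", "unknown", "Undetermined", "negative"),
  stringsAsFactors = FALSE
)

res <- strings_to_na_sketch(toy, c("Undetermined", "unknown"))
print(res)

# Extra arguments reach grepl(), e.g. case-insensitive matching.
res_ci <- strings_to_na_sketch(toy, "UNKNOWN", ignore.case = TRUE)
print(res_ci)
```

The numeric id column is left untouched, mirroring the documented default of processing only non-numeric columns.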

Examples

data(example_data)

# original unique values in diabetes column:
unique(example_data$diabetes)
# Using default values
res <- strings_to_NA(example_data)
unique(res$diabetes)


# original unique values in diabetes_type column:
unique(example_data$diabetes_type)
# Using custom values
res <- strings_to_NA(example_data, strings_to_replace = "Type I")
unique(res$diabetes_type)


[Package eHDPrep version 1.3.3 Index]