strings_to_NA {eHDPrep} R Documentation

## Replace values in non-numeric columns with NA

### Description

Replaces specified or pre-defined strings in non-numeric columns with NA.

### Usage

strings_to_NA(data, ..., strings_to_replace = NULL)


### Arguments

data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...: <tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.

strings_to_replace: Character vector of values to be replaced with NA.

### Details

Columns to process can be specified in the custom arguments (...); if none are specified, all non-numeric columns are processed.

The default strings replaced with NA are: "Undetermined", "unknown", "missing", "fail", "fail / unknown", "equivocal", "equivocal / unknown", "*".

Strings are matched using grepl, which supports regular expressions, so metacharacters (. \ | ( ) [ ] { } ^ $ * + ?) should be escaped with a "\" prefix. Matching is case sensitive by default; case can be ignored by passing ignore.case = TRUE in the custom arguments (...).
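The effect of escaping and of ignore.case can be seen with grepl directly. The following is a minimal base R sketch that does not call strings_to_NA; the vector x and the patterns are made up for illustration, and how the package combines these matches internally is not shown here. Note that inside an R string literal the "\" prefix itself has to be written as "\\".

```r
# Hypothetical values, chosen only to illustrate regex matching with grepl
x <- c("n/a", "n.a", "unknown", "Unknown")

# Unescaped "." is a metacharacter that matches any single character
dot_unescaped <- grepl("n.a", x)
# TRUE TRUE FALSE FALSE

# Escaped "." matches a literal full stop only
dot_escaped <- grepl("n\\.a", x)
# FALSE TRUE FALSE FALSE

# Matching is case sensitive by default
case_sensitive <- grepl("unknown", x)
# FALSE FALSE TRUE FALSE

# ignore.case = TRUE also matches "Unknown"
case_insensitive <- grepl("unknown", x, ignore.case = TRUE)
# FALSE FALSE TRUE TRUE
```

An unescaped pattern can therefore match more values than intended, which is why escaping metacharacters in strings_to_replace is advisable.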

### Value

data with specified values replaced with NA.

### Examples

data(example_data)

# original unique values in diabetes column:
unique(example_data$diabetes)

# Using default values
res <- strings_to_NA(example_data)
unique(res$diabetes)

# original unique values in diabetes_type column:
unique(example_data$diabetes_type)

# Using custom values
res <- strings_to_NA(example_data, strings_to_replace = "Type I")
unique(res$diabetes_type)



[Package eHDPrep version 1.2.1 Index]