encode_binary_cats {eHDPrep} | R Documentation |
Encode categorical variables as binary factors
Description
In a data frame, converts binary categories to factors. Ordering of levels is
standardised to: negative_finding, positive_finding. This embeds a
standardised numeric relationship between the binary categories while
preserving value labels.
Usage
encode_binary_cats(data, ..., values = NULL)
Arguments
data | A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).
... | <
values | Optional named vector of user-defined values for binary values using
Details
The binary categories to convert can be specified as a named character vector
passed to values. The syntax of the named vector is:
negative_finding = positive_finding. If values is not provided, the default
list will be used: "No" = "Yes", "No/unknown" = "Yes", "no/unknown" = "Yes",
"Non-user" = "User", "Never" = "Ever", "WT" = "MT".
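The negative_finding = positive_finding syntax can be illustrated in base R. The sketch below is not the eHDPrep implementation: encode_pair_sketch() is a hypothetical helper, and it assumes a vector is converted only when its non-missing values are exactly one of the listed pairs. It shows how such a named vector can fix the level order, so that the negative finding maps to 1 and the positive finding to 2.

```r
# Hypothetical helper (not part of eHDPrep): encode one vector using a named
# character vector written as negative_finding = positive_finding.
encode_pair_sketch <- function(x, values) {
  x_chr <- as.character(x)
  observed <- unique(x_chr[!is.na(x_chr)])
  for (i in seq_along(values)) {
    # Level order: negative finding first, positive finding second
    lvls <- c(names(values)[i], unname(values[i]))
    # Assumption: convert only when the observed values are exactly this pair
    if (length(observed) == 2 && setequal(observed, lvls)) {
      return(factor(x_chr, levels = lvls))
    }
  }
  x  # no pair matched: return the input unchanged
}

# The default pairs listed above, written as a named character vector
default_values <- c("No" = "Yes", "No/unknown" = "Yes", "no/unknown" = "Yes",
                    "Non-user" = "User", "Never" = "Ever", "WT" = "MT")

hypertension <- c("Yes", "No", "No", NA, "Yes")
encoded <- encode_pair_sketch(hypertension, default_values)
levels(encoded)      # "No" "Yes"
as.numeric(encoded)  # 2 1 1 NA 2

# A third category (here "missing") prevents a match, so nothing is converted
diabetes <- c("Yes", "No", "missing")
identical(encode_pair_sketch(diabetes, default_values), diabetes)  # TRUE
```

Because the level order comes from the named vector and not from alphabetical sorting, the numeric codes are consistent across variables whatever labels they use.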
Value
The input data with the specified binary categories converted to factors.
Examples
# use built-in values. Note: rural_urban is not modified
# Note: diabetes is not modified because "missing" is interpreted as a third category.
# strings_to_NA() should be applied first
encode_binary_cats(example_data, hypertension, rural_urban)
# use custom values. Note: rural_urban is now modified as well.
encoded_data <- encode_binary_cats(example_data, hypertension, rural_urban,
                                   values = c("No" = "Yes", "rural" = "urban"))
# to demonstrate the new numeric encoding:
dplyr::mutate(encoded_data, hypertension_num = as.numeric(hypertension), .keep = "used")