compare_category {dlookr} | R Documentation |
Compare categorical variables
Description
compare_category() computes information for examining the relationship between categorical variables.
Usage
compare_category(.data, ...)
## S3 method for class 'data.frame'
compare_category(.data, ...)
Arguments
.data |
a data.frame or a |
... |
one or more unquoted expressions separated by commas. You can treat variable names as if they were positions. Positive values select variables; negative values drop variables. These arguments are automatically quoted and evaluated in a context where column names represent column positions. They support unquoting and splicing. |
Details
It is important to understand the relationship between categorical variables in EDA. compare_category() compares every pairwise combination of the categorical variables and returns a list-based object of class compare_category.
Value
A list-based object of class compare_category. Each component describes one pair of categorical variables and contains the following information:
var1 : factor. The level of the first variable to compare. 'var1' is a placeholder; the column carries the name of the first variable being compared.
var2 : factor. The level of the second variable to compare. 'var2' is a placeholder; the column carries the name of the second variable being compared.
n : integer. frequency by var1 and var2.
rate : double. relative frequency.
first_rate : double. relative frequency in first variable.
second_rate : double. relative frequency in second variable.
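The columns above can be illustrated with a small base R sketch. It is not dlookr's implementation: the toy data frame and its column names are hypothetical, and it assumes that first_rate is the share within each level of the first variable and second_rate the share within each level of the second variable.

```r
# Toy data; the data frame and its columns are hypothetical
df <- data.frame(
  smoking     = c("Yes", "Yes", "No", "No", "No", "Yes"),
  death_event = c("Yes", "No", "No", "No", "Yes", "No")
)

# n: joint frequency of each pair of levels
tab <- as.data.frame(table(df$smoking, df$death_event),
                     stringsAsFactors = FALSE)
names(tab) <- c("var1", "var2", "n")

# rate: share of all observations
tab$rate <- tab$n / sum(tab$n)

# first_rate: share within each level of the first variable (assumed reading)
tab$first_rate <- tab$n / ave(tab$n, tab$var1, FUN = sum)

# second_rate: share within each level of the second variable (assumed reading)
tab$second_rate <- tab$n / ave(tab$n, tab$var2, FUN = sum)

tab
```

Under these assumptions, rate sums to 1 over the whole table, first_rate sums to 1 within each level of var1, and second_rate sums to 1 within each level of var2.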
Attributes of return object
The attributes of the compare_category class are as follows.
variables : character. List of variables selected for comparison.
combination : matrix. It consists of pairs of variables to compare.
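As a rough illustration of what a matrix of variable pairs can look like, the sketch below enumerates all pairs of a hypothetical set of variable names with utils::combn(). Whether dlookr stores the pairs as rows or as columns is not stated here, so the layout shown is an assumption.

```r
# Hypothetical names of categorical variables
vars <- c("smoking", "sex", "death_event")

# All unordered pairs; combn() returns one pair per column
pairs <- utils::combn(vars, 2)

# Transpose so that each row holds one pair (assumed layout)
pairs_by_row <- t(pairs)
pairs_by_row
```

With k categorical variables there are k * (k - 1) / 2 such pairs, so three variables give three comparisons.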
See Also
summary.compare_category, print.compare_category, plot.compare_category.
Examples
# Generate data for the example
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "smoking"] <- NA
library(dplyr)
# Compare all categorical variables
all_var <- compare_category(heartfailure2)
# Print compare_category class objects
all_var
# Select the comparisons that involve the death_event variable
all_var %>%
"["(grep("death_event", names(all_var)))
# Compare the two categorical variables
two_var <- compare_category(heartfailure2, smoking, death_event)
# Print compare_category class objects
two_var
# Filter out the cases where smoking is NA
two_var %>%
"[["(1) %>%
filter(!is.na(smoking))
# Summarize all cases: returns an invisible copy of the object
stat <- summary(all_var)
# Print the returned summary object
stat
# component of table
stat$table
# component of chi-square test
stat$chisq
# component of chi-square test
summary(all_var, "chisq")
# component of chi-square test (first, third case)
summary(all_var, "chisq", pos = c(1, 3))
# component of relative frequency table
summary(all_var, "relative")
# component of table without missing values
summary(all_var, "table", na.rm = TRUE)
# component of table including marginal values
margin <- summary(all_var, "table", marginal = TRUE)
margin
# component of chi-square test
summary(two_var, method = "chisq")
# verbose is FALSE
summary(all_var, "chisq", verbose = FALSE)
# Using pipes & dplyr -------------------------
# If you want to use dplyr, set verbose to FALSE
summary(all_var, "chisq", verbose = FALSE) %>%
filter(p.value < 0.26)
# Extract component from list by index
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>%
"[["(1)
# Extract component from list by name
summary(all_var, "table", na.rm = TRUE, verbose = FALSE) %>%
"[["("smoking vs death_event")
# plot all pairs of variables
plot(all_var)
# plot a pair of variables
plot(two_var)
# plot all pairs of variables, prompting before each plot
plot(all_var, prompt = TRUE)
# plot a pair of variables, passing the las graphical parameter
plot(two_var, las = 1)