cat_contrast {tabbycat}    R Documentation
Calculate the frequency of discrete values in one categorical variable for each of two mutually exclusive groups within another categorical variable
Description
This function shows the distribution of values within a given categorical
variable for one group within another categorical variable, and compares it
with the distribution among all observations not in that group. Its purpose
is to let you see quickly whether the distribution within that group differs
from the distribution for the rest of the observations. The results are
sorted in descending order of frequency for the named group, i.e. the group
named in col_group.
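The comparison described above can be sketched in base R. This is an illustrative approximation under stated assumptions, not tabbycat's implementation: the helper contrast_counts() and its output column names are hypothetical, missing values of row_cat are simply dropped, observations whose col_cat is NA are treated as part of the other group, and shares are reported as proportions between 0 and 1.

```r
# Hypothetical helper: NOT tabbycat's implementation, only an illustration
# of the comparison that cat_contrast() describes.
contrast_counts <- function(data, row_cat, col_cat, col_group) {
  # TRUE for observations in the named group; NA in col_cat counts as "other"
  in_group <- data[[col_cat]] %in% col_group
  values <- as.character(data[[row_cat]])

  # One shared set of row values (NAs dropped in this sketch), so both groups
  # are counted over the same categories, including zero counts
  all_values <- sort(unique(values[!is.na(values)]))
  group_n <- as.vector(table(factor(values[in_group], levels = all_values)))
  other_n <- as.vector(table(factor(values[!in_group], levels = all_values)))

  out <- data.frame(
    value = all_values,
    group_number = group_n,
    group_prop = group_n / sum(group_n),
    other_number = other_n,
    other_prop = other_n / sum(other_n),
    stringsAsFactors = FALSE
  )
  # Sort by descending frequency within the named group; order() keeps ties
  # in their existing (alphabetical) order
  out <- out[order(-out$group_number), ]
  rownames(out) <- NULL
  out
}

# Made-up example data
vehicles <- data.frame(
  class = c("suv", "compact", "suv", "midsize", "compact", "suv", "pickup"),
  manufacturer = c("toyota", "toyota", "toyota", "ford", "audi", "ford", "ford"),
  stringsAsFactors = FALSE
)

res <- contrast_counts(vehicles, "class", "manufacturer", "toyota")
print(res)
```

Counting both groups over one shared set of values keeps the two distributions directly comparable in this sketch: a category that never occurs in the named group still appears, with a count of zero.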
Usage
cat_contrast(
data,
row_cat,
col_cat,
col_group,
na.rm.row = FALSE,
na.rm.col = FALSE,
na.rm = NULL,
only = "",
clean_names = getOption("tabbycat.clean_names"),
na_label = getOption("tabbycat.na_label"),
other_label = getOption("tabbycat.other_label")
)
Arguments
data
A dataframe containing the two variables of interest.
row_cat
The column name of a categorical variable whose distribution
should be calculated for each exclusive group in col_cat.
col_cat
The column name of a categorical variable that will be split into two exclusive groups, one containing observations with a particular value of that variable, and another containing all other observations.
col_group
The name of the group within col_cat. Observations with this value form one group, and all other observations form the other.
na.rm.row
A boolean indicating whether to exclude NAs from the row results. The default is FALSE.
na.rm.col
A boolean indicating whether to exclude NAs from the column results. The default is FALSE.
na.rm
A boolean indicating whether to exclude NAs from both row and
column results. This argument is provided as a convenience. It allows you
to set na.rm.row and na.rm.col with a single argument.
only
A string indicating that only one set of frequency columns
should be returned in the results. If …
clean_names
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with the tabbycat.clean_names option.
na_label
A string indicating the label to use for the columns that contain data for missing values. The default value is "na", but use this argument to set a different value if the default value collides with data in your dataset.
other_label
A string indicating the label to use for the columns that contain data for observations not in the named group. The default value is "other", but use this argument to set a different value if the default value collides with data in your dataset.
Value
A tibble showing the distribution of row_cat within each of the two exclusive groups in col_cat.
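A minimal usage sketch, assuming the tabbycat package is installed. The vehicles data frame is made up for illustration, and the calls follow the Usage signature, passing the column names and the group name as strings. The exact output column names depend on the package and on clean_names, na_label and other_label, so none are assumed here.

```r
library(tabbycat)

# Made-up example data: vehicle class and manufacturer, with one missing class
vehicles <- data.frame(
  class = c("suv", "compact", "suv", "midsize", "compact", "suv", "pickup", NA),
  manufacturer = c("toyota", "toyota", "toyota", "ford", "audi", "ford", "ford", "toyota"),
  stringsAsFactors = FALSE
)

# Distribution of class for "toyota" versus all other manufacturers,
# sorted by descending frequency within the "toyota" group
toyota_vs_other <- cat_contrast(vehicles, "class", "manufacturer", "toyota")
print(toyota_vs_other)

# Same comparison, excluding NAs from both the row and column results
toyota_vs_other_no_na <- cat_contrast(
  vehicles, "class", "manufacturer", "toyota",
  na.rm = TRUE
)
print(toyota_vs_other_no_na)
```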