
di_iterate_on_long {DisImpact}

R Documentation

Iteratively calculate disproportionate impact using multiple methods for a long and summarized data set

Description

Calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for a "long" and summarized data set with many success variables and disaggregation variables. In this layout, the success counts are stored in a single column, and the disaggregation scenarios and their groups are each stored in a single column as well.
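As an illustration of the "long" layout, the sketch below builds a small, made-up data set in base R. The column names `Outcome`, `Cohort`, `Disagg`, `Group`, `Num`, and `Denom` are hypothetical; they stand in for the columns one would pass as `summarize_by_vars`, `cohort_var_col`, `disagg_var_col`, `group_var_col`, `num_var`, and `denom_var`. The point is that several disaggregation scenarios are stacked in the same columns rather than spread across separate variables.

```r
# Hypothetical long, summarized data: one row per outcome x cohort x scenario x group.
# All counts are made up for illustration.
long_df <- data.frame(
  Outcome = 'Transfer',
  Cohort  = '2020',
  Disagg  = c('Ethnicity', 'Ethnicity', 'Ethnicity', 'Foster Youth', 'Foster Youth'),
  Group   = c('Asian', 'Latinx', 'White', 'Yes', 'No'),
  Num     = c(30, 45, 40, 5, 110),
  Denom   = c(60, 100, 80, 20, 220),
  stringsAsFactors = FALSE
)

# Each value of Disagg is a separate disaggregation scenario. In this toy data,
# every scenario covers the same students, so each scenario's denominators
# sum to the same cohort size.
scenarios <- split(long_df, long_df$Disagg)
cohort_sizes <- sapply(scenarios, function(d) sum(d$Denom))
print(cohort_sizes)

# Success rate for each group within its scenario.
long_df$rate <- long_df$Num / long_df$Denom
print(long_df[, c('Disagg', 'Group', 'rate')])
```

Here both the 'Ethnicity' and 'Foster Youth' scenarios describe the same 240 students, split into groups in two different ways.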

Usage

di_iterate_on_long(
  data,
  num_var,
  denom_var,
  disagg_var_col,
  group_var_col,
  disagg_var_col_2 = NULL,
  group_var_col_2 = NULL,
  cohort_var_col = NULL,
  summarize_by_vars = NULL,
  custom_reference_group_flag_var = NULL,
  ...
)

Arguments

`data`	A data frame for which to iterate DI calculations for a set of variables.
`num_var`	A variable name (character value) from `data` where the variable stores success counts (the numerator in success rates). Success rates are calculated by aggregating `num_var` and `denom_var` for each unique combination of values in `disagg_var_col`, `group_var_col`, `disagg_var_col_2`, `group_var_col_2`, `cohort_var_col`, and `summarize_by_vars`. If each combination already corresponds to a single row, no rows are collapsed.
`denom_var`	A variable name (character value) from `data` where the variable stores the group size (the denominator in success rates).
`disagg_var_col`	A variable name (character value) from `data` where the variable stores the different disaggregation scenarios. The disaggregation variable could include such values as 'Ethnicity', 'Age Group', and 'Foster Youth', corresponding to three disaggregation scenarios.
`group_var_col`	A variable name (character value) from `data` where the variable stores the group name for each group within a level of disaggregation specified in `disagg_var_col`. For example, the group names could include 'Asian', 'White', 'Black', 'Latinx', 'Native American', and 'Other' for a disaggregation on ethnicity; 'Under 18', '18-21', '22-25', and '26+' for an age group disaggregation; and 'Yes' and 'No' for a foster youth status disaggregation.
`disagg_var_col_2`	(Optional) A variable name (character value) from `data` where the variable stores an optional second disaggregation variable, which allows for the intersectionality of variables listed in `disagg_var_col` and `disagg_var_col_2`. The second disaggregation variable could describe something not in `disagg_var_col`, such as 'Gender', which would require all groups described in `group_var_col` to be broken out by gender.
`group_var_col_2`	(Optional) A variable name (character value) from `data` where the variable stores the group name for each group within a second level of disaggregation specified in `disagg_var_col_2`. For example, the group names could include 'Male', 'Female', 'Non-binary', and 'Unknown' if 'Gender' is a value in the variable `disagg_var_col_2`.
`cohort_var_col`	(Optional) A variable name (character value) from `data` where the variable stores the cohort label for the data described in each row.
`summarize_by_vars`	(Optional) A character vector of variable names in `data` by which `num_var` and `denom_var` are aggregated to calculate success rates for the disproportionate impact (DI) analysis set up by `disagg_var_col`, `group_var_col`, `disagg_var_col_2`, and `group_var_col_2`. For example, `summarize_by_vars=c('Outcome')` could specify a single variable/column that describes the outcome or metric in `num_var`, where the outcome values might include 'Completion of Transfer-Level Math', 'Completion of Transfer-Level English', 'Transfer', and 'Associate Degree'.
`custom_reference_group_flag_var`	(Optional) A variable name (character value) from `data` where the variable flags the row or group that should be used as the reference group (`1` if the row is a reference group, `0` otherwise) for comparison in the percentage point gap method and the 80% index method. When this argument is used, the `ppg_reference_groups` and `di_80_index_reference_groups` arguments should not be specified.
`...`	(Optional) Other arguments accepted by `di_iterate`, such as `ppg_reference_groups`, `min_moe`, `use_prop_in_moe`, `prop_sub_0`, `prop_sub_1`, `di_prop_index_cutoff`, `di_80_index_cutoff`, `di_80_index_reference_groups`, and `check_valid_reference`.
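The aggregation described for `num_var` and `denom_var`, and the 0/1 column expected by `custom_reference_group_flag_var`, can be sketched in base R as follows. This only mimics the documented behavior on hypothetical data (the column names `Disagg`, `Group`, `Cohort`, `Num`, `Denom`, and `ref_flag` and all counts are made up); it is not the package's internal code.

```r
# Hypothetical long data with two rows per group (e.g., two campuses) that
# share the same disaggregation, group, and cohort values.
df <- data.frame(
  Disagg = 'Ethnicity',
  Group  = c('Asian', 'Asian', 'Latinx', 'Latinx', 'White', 'White'),
  Cohort = '2020',
  Num    = c(10, 20, 25, 20, 15, 25),
  Denom  = c(20, 40, 50, 50, 30, 50),
  stringsAsFactors = FALSE
)

# Rows sharing a combination of the grouping columns are collapsed by summing
# the numerator and denominator before rates are computed. This mirrors the
# documented aggregation, not the package's own implementation.
agg <- aggregate(cbind(Num, Denom) ~ Disagg + Group + Cohort, data = df, FUN = sum)
agg$rate <- agg$Num / agg$Denom

# A 0/1 flag column of the kind custom_reference_group_flag_var expects;
# here 'White' is (arbitrarily) chosen as the reference group.
agg$ref_flag <- as.integer(agg$Group == 'White')
print(agg)
```

With these counts the six input rows collapse to three, with rates of 0.50, 0.45, and 0.50 for Asian, Latinx, and White.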

Details

Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for every disaggregation scenario in `disagg_var_col` (optionally intersected with the scenarios in `disagg_var_col_2`), for each cohort in `cohort_var_col` and each combination of values in `summarize_by_vars`.
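For a single scenario, the three methods can be approximated in base R as below. This is a sketch on made-up counts, assuming the defaults as we understand them: the PPG reference is the overall rate, the margin of error is 1.96 * sqrt(0.25 / n) with a floor of 0.03 (cf. `min_moe` and `use_prop_in_moe`), the proportionality index and 80% index cutoffs are both 0.8, and the 80% index reference is the highest-performing group. Consult `di_ppg`, `di_prop_index`, and `di_80_index` for the exact rules.

```r
# Hypothetical single scenario: ethnicity groups for one outcome and cohort.
d <- data.frame(
  group = c('Asian', 'Latinx', 'White'),
  num   = c(45, 40, 50),
  denom = c(60, 100, 80),
  stringsAsFactors = FALSE
)
d$rate <- d$num / d$denom

# Percentage point gap: compare each rate with the overall rate and flag DI
# when rate + MOE still falls at or below the reference. Assumed MOE:
# 1.96 * sqrt(0.5 * 0.5 / n), floored at 0.03.
ppg_ref <- sum(d$num) / sum(d$denom)
moe <- pmax(0.03, 1.96 * sqrt(0.25 / d$denom))
d$di_indicator_ppg <- as.integer(d$rate + moe <= ppg_ref)

# Proportionality index: share of successes divided by share of the cohort.
d$prop_index <- (d$num / sum(d$num)) / (d$denom / sum(d$denom))
d$di_indicator_prop_index <- as.integer(d$prop_index < 0.8)

# 80% index: each rate relative to the highest group rate.
d$index_80 <- d$rate / max(d$rate)
d$di_indicator_80_index <- as.integer(d$index_80 < 0.8)

print(d)
```

Here only the Latinx group is flagged by all three methods; White's 80% index (0.625 / 0.75, about 0.83) stays above the 0.8 cutoff.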

Value

A summarized data set (data frame) consisting of:

variables specified by `summarize_by_vars`, `disagg_var_col`, `group_var_col`, `disagg_var_col_2`, and `group_var_col_2`,
`di_indicator_ppg` (1 if there is disproportionate impact per the percentage point gap method, 0 otherwise),
`di_indicator_prop_index` (1 if there is disproportionate impact per the proportionality index, 0 otherwise),
`di_indicator_80_index` (1 if there is disproportionate impact per the 80% index, 0 otherwise), and
other relevant fields returned from `di_ppg`, `di_prop_index`, and `di_80_index`.

Examples

library(DisImpact)
library(dplyr)
data(ssm_cohort)
di_iterate_on_long(data=ssm_cohort %>% filter(missingFlag==0) # remove rows with missing data
  , num_var='value', denom_var='denom'
  , disagg_var_col='disagg1', group_var_col='subgroup1'
  , cohort_var_col='academicYear', summarize_by_vars=c('categoryLabel')
  , ppg_reference_groups='all but current' # PPG-1: compare each group with all other groups combined
  , di_80_index_reference_groups='all but current') # 80% index against all other groups combined

[Package DisImpact version 0.0.21 Index]