R: Iteratively calculate disproportionate impact using multiple...

di_iterate {DisImpact}

R Documentation

Iteratively calculate disproportionate impact using multiple methods for many variables.

Description

Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for many success variables, disaggregation variables, and scenarios.

Usage

di_iterate(
  data,
  success_vars,
  group_vars,
  cohort_vars = NULL,
  scenario_repeat_by_vars = NULL,
  exclude_scenario_df = NULL,
  weight_var = NULL,
  include_non_disagg_results = TRUE,
  ppg_reference_groups = "overall",
  min_moe = 0.03,
  use_prop_in_moe = FALSE,
  prop_sub_0 = 0.5,
  prop_sub_1 = 0.5,
  di_prop_index_cutoff = 0.8,
  di_80_index_cutoff = 0.8,
  di_80_index_reference_groups = "hpg",
  check_valid_reference = TRUE,
  parallel = FALSE,
  parallel_n_cores = parallel::detectCores(),
  parallel_split_to_disk = FALSE
)

Arguments

`data`	A data frame for which to iterate DI calculations for a set of variables.
`success_vars`	A character vector of success variable names to iterate across.
`group_vars`	A character vector of group (disaggregation) variable names to iterate across.
`cohort_vars`	(Optional) A character vector of the same length as `success_vars` to indicate the cohort variable to be used for each variable specified in `success_vars`. A vector of length 1 could be specified, in which case the same cohort variable is used for each success variable. If not specified, then a single cohort is assumed for all success variables.
`scenario_repeat_by_vars`	(Optional) A character vector of variables; DI calculations are repeated across all combinations of these variables. For example, the following variables could be specified:
Ed Goal: Degree/Transfer, Short-term Career, Non-credit
First time college student: Yes, No
Full-time status: Yes, No
Each combination of these variables (e.g., full-time, first-time college students with an ed goal of degree/transfer as one combination) constitutes an iteration / sample for which disproportionate impact is calculated for the outcomes listed in `success_vars` and the disaggregation variables listed in `group_vars`. The overall rate of success for full-time, first-time college students with an ed goal of degree/transfer would include only these students and no others. Each variable specified is also collapsed to an '- All' group so that the combinations also reflect all students of a particular category. The total number of combinations for the previous example would be (+1 representing the all category): (3 + 1) x (2 + 1) x (2 + 1) = 36.
`exclude_scenario_df`	(Optional) A data frame with variables that match `scenario_repeat_by_vars` for specifying the combinations to exclude from DI calculations. Following the example specified above, one could choose to exclude part-time non-credit students from consideration.
`weight_var`	(Optional) A character variable specifying the weight variable if the input data set is summarized (i.e., the success variables specified in `success_vars` contain counts of successes). Weight here corresponds to the denominator when calculating the success rate. Defaults to `NULL` for an input data set where each row describes one individual.
`include_non_disagg_results`	A logical variable specifying whether or not the non-disaggregated results should be returned; defaults to `TRUE`. When `TRUE`, a new variable `- None` is added to the data set with a single data value `'- All'`, and this variable is added to `group_vars` as a disaggregation/group variable. The user would want these results returned to review non-disaggregated results.
`ppg_reference_groups`	Either `'overall'`, `'hpg'`, `'all but current'`, or a character vector of the same length as `group_vars` that indicates the reference group value for each group variable in `group_vars` when determining disproportionate impact using the percentage point gap method.
`min_moe`	The minimum margin of error to be used in the PPG calculation, passed to di_ppg.
`use_prop_in_moe`	Whether the estimated proportions should be used in the margin of error calculation by the PPG, passed to di_ppg.
`prop_sub_0`	passed to di_ppg; defaults to 0.50.
`prop_sub_1`	passed to di_ppg; defaults to 0.50.
`di_prop_index_cutoff`	Threshold used for determining disproportionate impact using the proportionality index; passed to di_prop_index; defaults to 0.80.
`di_80_index_cutoff`	Threshold used for determining disproportionate impact using the 80% index; passed to di_80_index; defaults to 0.80.
`di_80_index_reference_groups`	Either `'overall'`, `'hpg'`, `'all but current'`, or a character vector of the same length as `group_vars` that indicates the reference group value for each group variable in `group_vars` when determining disproportionate impact using the 80% index.
`check_valid_reference`	Check whether `ppg_reference_groups` and `di_80_index_reference_groups` contain valid values; defaults to `TRUE`.
`parallel`	If `TRUE`, then perform calculations in parallel based on the scenarios specified by `scenario_repeat_by_vars`. Defaults to `FALSE`. Parallel execution is based on the `parallel` package included in base R, using parLapply on Windows and mclapply on POSIX-based systems (Linux/Mac).
`parallel_n_cores`	The number of CPU cores to use if `parallel=TRUE`. Defaults to the maximum number of CPU cores on the system.
`parallel_split_to_disk`	If `TRUE` and `parallel=TRUE`, then create intermediate data sets for each scenario generated by `scenario_repeat_by_vars`, write them to disk, and import the required data set when necessary for each scenario executing in parallel. This feature is useful when the data set specified by `data` is very large and parallel execution is desired for speed in order to reduce the likelihood of consuming all the system's memory and crashing. Note that there is an overhead I/O cost on speed when this feature is used. Defaults to `FALSE`.
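
The count of 36 scenarios in the `scenario_repeat_by_vars` example can be checked with a few lines of base R. This is only a sketch of the counting logic, not the package's internal code; the variable names `ed_goal`, `first_time`, and `full_time` are made up for illustration.

```r
# Levels from the scenario_repeat_by_vars example, each extended with the
# collapsed '- All' level (hypothetical variable names).
ed_goal    <- c("Degree/Transfer", "Short-term Career", "Non-credit", "- All")
first_time <- c("Yes", "No", "- All")
full_time  <- c("Yes", "No", "- All")

# One row per scenario: every combination of the three variables.
scenarios <- expand.grid(
  ed_goal = ed_goal,
  first_time = first_time,
  full_time = full_time,
  stringsAsFactors = FALSE
)

nrow(scenarios)  # (3 + 1) * (2 + 1) * (2 + 1) = 36
```

Rows listed in `exclude_scenario_df` would reduce this count accordingly.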

Details

Iteratively calculate disproportionate impact via the percentage point gap (PPG), proportionality index, and 80% index methods for all combinations of success_vars, group_vars, and cohort_vars, for each combination of subgroups specified by scenario_repeat_by_vars.
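
To make the three methods concrete, the following base R sketch applies simplified versions of them to a made-up summarized data set. It assumes the commonly stated definitions (PPG against the overall rate with a margin of error of 1.96 * sqrt(0.25 / n) floored at `min_moe`, proportionality index as share of successes over share of cohort, and 80% index against the highest performing group); di_ppg, di_prop_index, and di_80_index may differ in detail, so treat this as an illustration rather than the package's implementation.

```r
# Toy summarized data: one row per group (values are made up).
toy <- data.frame(
  group   = c("A", "B", "C"),
  n       = c(100, 200, 50),
  success = c(60, 80, 30)
)
toy$rate <- toy$success / toy$n
overall_rate <- sum(toy$success) / sum(toy$n)

# Percentage point gap vs. the overall rate: flag a group when its rate
# plus its margin of error is still at or below the reference rate.
# Assumption: MOE = 1.96 * sqrt(0.25 / n), floored at min_moe.
min_moe <- 0.03
toy$moe <- pmax(1.96 * sqrt(0.25 / toy$n), min_moe)
toy$di_ppg <- as.integer(toy$rate + toy$moe <= overall_rate)

# Proportionality index: share of successes divided by share of the cohort.
toy$prop_index <- (toy$success / sum(toy$success)) / (toy$n / sum(toy$n))
toy$di_prop_index <- as.integer(toy$prop_index < 0.8)

# 80% index: group rate relative to the highest performing group ('hpg').
toy$index_80 <- toy$rate / max(toy$rate)
toy$di_80_index <- as.integer(toy$index_80 < 0.8)

toy[, c("group", "di_ppg", "di_prop_index", "di_80_index")]
```

In this toy data, group B is flagged by the PPG and 80% index sketches but not by the proportionality index (about 0.82), which shows why the methods are reported side by side.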

Value

A summarized data set (data frame) consisting of:

success_variable (elements of success_vars),
disaggregation (elements of group_vars),
cohort (values corresponding to the variables specified in cohort_vars),
di_indicator_ppg (1 if there is disproportionate impact per the percentage point gap method, 0 otherwise),
di_indicator_prop_index (1 if there is disproportionate impact per the proportionality index, 0 otherwise),
di_indicator_80_index (1 if there is disproportionate impact per the 80% index, 0 otherwise), and
other relevant fields returned from di_ppg, di_prop_index, and di_80_index.
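
Because the result is an ordinary data frame, the indicator columns listed above can be used to narrow it down. The sketch below assumes the DisImpact package is installed and relies only on the column names documented in this section; the object names `res` and `flagged_ppg` are arbitrary.

```r
library(DisImpact)
data(student_equity)

res <- di_iterate(
  data = student_equity,
  success_vars = "Transfer",
  group_vars = c("Ethnicity", "Gender"),
  cohort_vars = "Cohort"
)

# Keep only rows flagged by the percentage point gap method;
# which() drops any rows where the indicator is NA.
flagged_ppg <- res[which(res$di_indicator_ppg == 1),
                   c("success_variable", "disaggregation", "cohort")]
head(flagged_ppg)

# Count flagged rows per method for each disaggregation variable.
aggregate(
  cbind(di_indicator_ppg, di_indicator_prop_index, di_indicator_80_index) ~
    disaggregation,
  data = res,
  FUN = sum
)
```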

Examples

library(dplyr)
data(student_equity)
# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer')
  , group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort')
  , ppg_reference_groups='overall')
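
A further example, sketched under the assumption that DisImpact is installed, illustrates `weight_var` with summarized input. The student-level data is collapsed to counts using base R, and the count column `N` (a name chosen here for illustration) is passed as the weight. The objects `se`, `summarized`, and `res_weighted` are likewise arbitrary names.

```r
library(DisImpact)
data(student_equity)

# Work on a copy and add a constant column to count students.
se <- student_equity
se$N <- 1

# Collapse to one row per Ethnicity x Gender x Cohort combination:
# Transfer becomes a count of successes, N the number of students.
summarized <- aggregate(
  cbind(Transfer, N) ~ Ethnicity + Gender + Cohort,
  data = se,
  FUN = sum
)

# Summarized data: N is the denominator for each success rate.
res_weighted <- di_iterate(
  data = summarized,
  success_vars = "Transfer",
  group_vars = c("Ethnicity", "Gender"),
  cohort_vars = "Cohort",
  weight_var = "N"
)
head(res_weighted)
```

Since the summarized rows carry the same successes and denominators as the student-level data, the results would be expected to agree with the unweighted call above, though that is not verified here.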

[Package DisImpact version 0.0.21 Index]