twoCatCI {SynthTools}    R Documentation

Confidence intervals and standard errors for the cross-tabulation of two categorical variables derived from multiply imputed datasets.

Description

This function calculates confidence intervals and standard errors for the cell proportions in the cross-tabulation of two categorical variables, based on multiply imputed datasets. It also returns a YES/NO indicator of whether each observed value falls within its confidence interval. The confidence intervals and standard errors are calculated from formulas adapted for fully and partially synthetic datasets. See the reference for more information.

Usage

twoCatCI(obs_data, imp_data_list, type, vars, sig = 4, alpha = 0.05)

Arguments

obs_data

The original (observed) dataset against which the synthetic datasets will be compared. Must be of type "data.frame".

imp_data_list

A list composed of m synthetic data sets.

type

Specifies which type of datasets are in imp_data_list. Options are "fully" and "partially".

vars

A character vector naming the two categorical variables being checked. Both variables should be of type "factor".

sig

The number of significant digits in the output data frames. Defaults to 4.

alpha

Test size (significance level) used for the confidence intervals. Defaults to 0.05.
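
The inputs described above can be prepared roughly as follows. This is a minimal sketch on made-up toy data: the objects obs and syn_list and the helper as_common_factor are hypothetical and not part of SynthTools. It coerces the two variables to factors with shared levels, so that every table has the same cells, and computes the cross-tabulated proportions with base R.

```r
# Toy observed data (hypothetical, for illustration only)
obs <- data.frame(
  sex  = c("F", "M", "F", "M", "F", "M"),
  race = c("A", "A", "B", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Two toy "synthetic" copies; real ones would come from a synthesizer
syn_list <- list(
  transform(obs, race = c("A", "B", "B", "B", "A", "A")),
  transform(obs, race = c("B", "A", "B", "A", "A", "B"))
)

vars <- c("sex", "race")

# Hypothetical helper: coerce the chosen columns to factors using the
# levels found in the reference data, so all tables share the same cells
as_common_factor <- function(df, ref, vars) {
  for (v in vars) {
    df[[v]] <- factor(df[[v]], levels = sort(unique(ref[[v]])))
  }
  df
}

obs_f <- as_common_factor(obs, obs, vars)
syn_f <- lapply(syn_list, as_common_factor, ref = obs, vars = vars)

# Cross-tabulated cell proportions (each table sums to 1)
obs_prop  <- prop.table(table(obs_f[[vars[1]]], obs_f[[vars[2]]]))
syn_props <- lapply(syn_f, function(d) {
  prop.table(table(d[[vars[1]]], d[[vars[2]]]))
})
```

With the columns stored as factors, obs_f and syn_f have the shape that obs_data and imp_data_list are described as expecting.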

Details

This function was developed to make research on synthetic data utility easier by providing another way of measuring utility.
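
For orientation, the combining rules commonly given for synthetic data can be sketched for a single cell proportion as below. This is an illustrative sketch, not the package's source code. The function name combine_prop, the use of p(1 - p)/n as the within-dataset variance, the assumption that every synthetic dataset has n records, and the handling of a non-positive variance estimate are all assumptions made here.

```r
# p:    cell proportion from each of the m synthetic datasets
# n:    number of records per synthetic dataset (assumed equal)
# type: "partially" or "fully" synthetic data
combine_prop <- function(p, n, type = c("partially", "fully"), alpha = 0.05) {
  type  <- match.arg(type)
  m     <- length(p)
  q_bar <- mean(p)                # combined point estimate
  b     <- var(p)                 # between-dataset variance
  u_bar <- mean(p * (1 - p) / n)  # mean within-dataset variance (assumed binomial)

  # Total variance of q_bar under each synthesis type
  total <- if (type == "partially") u_bar + b / m else (1 + 1 / m) * b - u_bar

  # The fully synthetic estimate can be negative; report NA in that case
  if (total <= 0) {
    return(c(estimate = q_bar, se = NA_real_, lower = NA_real_, upper = NA_real_))
  }

  # Degrees of freedom for the t reference distribution
  df <- if (type == "partially") {
    (m - 1) * (1 + u_bar / (b / m))^2
  } else {
    (m - 1) * (1 - u_bar / ((1 + 1 / m) * b))^2
  }

  se   <- sqrt(total)
  half <- qt(1 - alpha / 2, df) * se
  c(estimate = q_bar, se = se, lower = q_bar - half, upper = q_bar + half)
}

# One cell's proportion across m = 5 hypothetical synthetic datasets
p_cell <- c(0.30, 0.34, 0.32, 0.28, 0.36)
combine_prop(p_cell, n = 1000, type = "partially")
```

Applying such a function to every cell of the cross-tabulation would give tables of estimates, standard errors, and interval limits analogous to those described under Value.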

Value

This function returns a list of five data frames:

Observed

Cross-tabulated proportions of the observed values

Lower

Lower limit of the confidence interval

Upper

Upper limit of the confidence interval

SEs

Standard Errors

CI_Indicator

"YES"/"NO" indicating whether or not the observed value is within the confidence interval

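The relationship between the Observed, Lower, Upper and CI_Indicator components can be sketched with small made-up matrices. The numbers below are hypothetical, and treating the interval endpoints as inclusive is an assumption, not something the documentation states.

```r
# Hypothetical 2 x 2 tables of observed proportions and interval limits
cells <- list(sex = c("F", "M"), race = c("A", "B"))
observed <- matrix(c(0.30, 0.20, 0.25, 0.25), nrow = 2, dimnames = cells)
lower    <- matrix(c(0.27, 0.21, 0.22, 0.20), nrow = 2, dimnames = cells)
upper    <- matrix(c(0.33, 0.26, 0.29, 0.24), nrow = 2, dimnames = cells)

# "YES" where the observed proportion lies inside [lower, upper]
indicator <- ifelse(observed >= lower & observed <= upper, "YES", "NO")

# Share of cells whose observed value is covered, as a one-number summary
coverage <- mean(indicator == "YES")
```

A summary such as coverage is one simple way to condense the indicator table into a single utility measure.
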
References

Reiter JP, Raghunathan TE (2007). “The Multiple Adaptations of Multiple Imputation.” Journal of the American Statistical Association.

Examples

library(SynthTools)

# PPA is the observed data set. PPAm5 is a list of 5 partially synthetic data sets derived from PPA.
# "sex" and "race" are categorical variables present in the synthesized data sets.
# 3 significant digits are desired in the output data frames.

twoCatCI(PPA, PPAm5, "partially", c("sex", "race"), sig = 3)

[Package SynthTools version 1.0.1 Index]