R: Confidence intervals and standard errors for the...

twoCatCI {SynthTools}

R Documentation

Confidence intervals and standard errors for the cross-tabulation of two categorical variables, derived from multiply imputed datasets.

Description

This function calculates confidence intervals and standard errors for the proportions in the cross-tabulation of two categorical variables, using multiply imputed datasets. It also gives a YES/NO indicator of whether each observed value lies within its confidence interval. The confidence intervals and standard errors are calculated from formulas adapted for fully and partially synthetic datasets. See the reference for more information.

Usage

twoCatCI(obs_data, imp_data_list, type, vars, sig = 4, alpha = 0.05)

Arguments

`obs_data`	The original dataset to which the datasets in `imp_data_list` will be compared, of type "data.frame".
`imp_data_list`	A list composed of `m` synthetic data sets.
`type`	Specifies which type of datasets are in `imp_data_list`. Options are "fully" and "partially".
`vars`	A vector naming the two categorical variables being checked. Each variable should be of type "factor".
`sig`	The number of significant digits in the output dataframes. Defaults to 4.
`alpha`	Test size (significance level) used for the confidence intervals. Defaults to 0.05.

Details

This function was developed to make research on synthetic data utility easier by providing an additional way of measuring utility.
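
The Description mentions formulas adapted for fully and partially synthetic datasets. As a rough illustration, the sketch below applies the standard combining rules from the synthetic data literature to a single cell proportion. It is not taken from the SynthTools source, and twoCatCI may differ in its details. The function name `combine_cell`, its arguments, and the binomial within-imputation variance q(1 - q)/n are assumptions made for this example.

```r
# Hypothetical helper, not part of SynthTools.
# q    : the m synthetic-data proportions for one cell of the cross-tabulation
# n    : number of records in each synthetic dataset (assumed equal)
# obs  : the observed proportion for the same cell
# type : "partially" or "fully", as in the `type` argument of twoCatCI
combine_cell <- function(q, n, obs, type = c("partially", "fully"), alpha = 0.05) {
  type  <- match.arg(type)
  m     <- length(q)
  q_bar <- mean(q)                 # point estimate
  b_m   <- var(q)                  # between-imputation variance
  u_bar <- mean(q * (1 - q) / n)   # average within-imputation variance (assumed binomial)

  if (type == "partially") {
    t_var <- u_bar + b_m / m
    df    <- (m - 1) * (1 + u_bar / (b_m / m))^2
  } else {
    t_var <- (1 + 1 / m) * b_m - u_bar
    df    <- (m - 1) * (1 - u_bar / ((1 + 1 / m) * b_m))^2
  }

  # The fully synthetic variance estimate can be negative; this sketch
  # simply reports NA in that case.
  se    <- if (t_var >= 0) sqrt(t_var) else NA_real_
  half  <- qt(1 - alpha / 2, df) * se
  lower <- q_bar - half
  upper <- q_bar + half
  inside <- if (is.na(se)) NA_character_ else if (obs >= lower && obs <= upper) "YES" else "NO"

  list(estimate = q_bar, se = se, lower = lower, upper = upper, inside = inside)
}

# Made-up proportions from m = 5 synthetic datasets of 1000 records each
combine_cell(c(0.30, 0.32, 0.29, 0.31, 0.33), n = 1000, obs = 0.30, type = "partially")
```

The interval uses a t reference distribution, so a small between-imputation variance relative to the within-imputation variance gives large degrees of freedom and an interval close to the normal one.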

Value

This function returns a list of five data frames:

`Observed`	Cross-tabulated proportions of the observed values
`Lower`	Lower limit of the confidence interval
`Upper`	Upper limit of the confidence interval
`SEs`	Standard Errors
`CI_Indicator`	"YES"/"NO" indicating whether or not the observed value is within the confidence interval
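
One way to work with the returned list is to pull out only the cells flagged "NO". The sketch below assumes the five data frames share the same layout, with one row per level of the first variable and one column per level of the second. The helper `cells_outside_ci` and the toy list `toy` are hypothetical, built only to mirror the element names documented above, and are not produced by twoCatCI.

```r
# Hypothetical helper, not part of SynthTools: list the cells whose observed
# proportion is flagged as outside the confidence interval.
cells_outside_ci <- function(res) {
  ind <- as.matrix(res$CI_Indicator)
  idx <- which(ind == "NO", arr.ind = TRUE)   # matrix with columns "row" and "col"
  data.frame(
    row      = rownames(ind)[idx[, "row"]],
    column   = colnames(ind)[idx[, "col"]],
    observed = as.matrix(res$Observed)[idx],
    lower    = as.matrix(res$Lower)[idx],
    upper    = as.matrix(res$Upper)[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up 2 x 2 result that mimics the documented structure
mk <- function(v) {
  as.data.frame(matrix(v, 2, 2, dimnames = list(c("F", "M"), c("A", "B"))))
}
toy <- list(
  Observed     = mk(c(0.30, 0.20, 0.25, 0.25)),
  Lower        = mk(c(0.26, 0.22, 0.21, 0.20)),
  Upper        = mk(c(0.34, 0.28, 0.29, 0.30)),
  SEs          = mk(c(0.02, 0.015, 0.02, 0.025)),
  CI_Indicator = mk(c("YES", "NO", "YES", "YES"))
)

out <- cells_outside_ci(toy)
out
```

With a real result, the same helper could be called as `cells_outside_ci(twoCatCI(...))`, provided the layout assumption above holds.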

References

Reiter JP, Raghunathan TE (2007). “The Multiple Adaptations of Multiple Imputation.” Journal of the American Statistical Association.

Examples

# PPA is the observed data set. PPAm5 is a list of 5 partially synthetic data sets derived from PPA.
# "sex" and "race" are categorical variables present in the synthesized data sets.
# 3 significant digits are desired in the output data frames.

twoCatCI(PPA, PPAm5, "partially", c("sex", "race"), sig=3)

[Package SynthTools version 1.0.1 Index]