gof_gerbil {gerbil} | R Documentation |
Goodness-of-fit testing for gerbil objects
Description
Using a gerbil object as an input, this function performs univariate and bivariate goodness-of-fit tests to compare distributions of imputed and observed values.
Usage
gof_gerbil(
  x,
  y = NULL,
  type = 1,
  imp = 1,
  breaks = NULL,
  method = c("chi-squared", "fisher", "G"),
  ks = FALSE,
  partial = "imputed",
  ...
)
Arguments
x |
A |
y |
A vector listing the column names of the imputed data for which tests should be run. See details. By default, |
type |
A scalar used to specify the type of tests that will be performed. Options include univariate (marginal) tests ( |
imp |
A scalar or vector indicating which of the multiply imputed datasets should be used for testing. Defaults to |
breaks |
Used to determine the cut-points for binning of continuous variables into categories. Ideally, |
method |
The type of test that is used to compare contingency tables. Options include |
ks |
If |
partial |
Indicates how partially imputed pairs are handled in bivariate testing. If |
... |
Arguments to be passed to methods. |
Details
Goodness of fit is determined using contingency tables of counts across categories of the corresponding variable(s).
For univariate testing (type = 1), a one-way table is calculated for observed cases and compared to an analogous table for imputed cases, whereas for bivariate testing (type = 2), two-way tables are calculated.
Continuous variables are binned according to cut-points defined using the parameter breaks.
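The univariate comparison described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's internal code: the vectors observed and imputed, and the quantile-based cut-points, are assumptions made for the example.

```r
# Simulated stand-ins for the observed and imputed values of one
# continuous variable (illustrative only; not gerbil output)
set.seed(1)
observed <- rnorm(300, mean = 0)
imputed  <- rnorm(100, mean = 0.3)

# Use one shared set of cut-points so both groups are binned identically
breaks <- quantile(c(observed, imputed), probs = seq(0, 1, by = 0.2))
obs_bins <- cut(observed, breaks = breaks, include.lowest = TRUE)
imp_bins <- cut(imputed,  breaks = breaks, include.lowest = TRUE)

# Stack the two one-way tables: rows are groups, columns are bins
tab <- rbind(observed = table(obs_bins), imputed = table(imp_bins))

# Chi-squared test of whether the bin proportions differ between groups
res <- chisq.test(tab)
res$p.value
```

Sharing the cut-points between the two groups is what makes the two one-way tables comparable; binning each group separately would give categories that do not line up.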
Tests are performed using one of three methods (determined by the parameter method): 1) chi-squared (the default); 2) Fisher's exact; and 3) a G-test.
G-testing is implemented via the function GTest() from the DescTools package.
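To give a feel for how the three methods relate, the sketch below applies each of them to the same small contingency table. The counts are made up for illustration, and the calls are made directly rather than through gof_gerbil(); the G-test is run only if DescTools is installed.

```r
# Hypothetical counts of a three-level variable among observed and imputed cases
tab <- matrix(c(30, 25, 45,
                12, 18, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("observed", "imputed"),
                              category = c("a", "b", "c")))

# 1) Pearson chi-squared test
p_chisq <- chisq.test(tab)$p.value

# 2) Fisher's exact test (can be slow for large tables)
p_fisher <- fisher.test(tab)$p.value

# 3) G-test (likelihood-ratio test), available when DescTools is installed
if (requireNamespace("DescTools", quietly = TRUE)) {
  p_g <- DescTools::GTest(tab)$p.value
}

c(chisq = p_chisq, fisher = p_fisher)
```

With reasonably large expected counts the three tests usually give similar p-values; Fisher's exact test is often preferred when some cells are sparse.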
Note that for univariate testing of continuous variables, a Kolmogorov-Smirnov test may be performed instead by setting ks = TRUE.
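The Kolmogorov-Smirnov alternative avoids binning altogether by comparing the two empirical distribution functions. A minimal sketch using base R's ks.test() on simulated data follows; the vectors observed and imputed are assumptions for the example, and this is not necessarily how gof_gerbil() calls the test internally.

```r
# Simulated stand-ins for observed and imputed values of a continuous variable
set.seed(2)
observed <- rnorm(300)
imputed  <- rnorm(100, mean = 0.3)

# Two-sample Kolmogorov-Smirnov test; no cut-points are needed
ks_res <- ks.test(observed, imputed)
ks_res$statistic  # maximum distance between the two empirical CDFs
ks_res$p.value
```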
The only required input is the parameter x, which must be a gerbil object.
Note that univariate differences between observed and imputed data may be explained by the missingness mechanism and are not necessarily indicative of poor imputations. Note also that most imputation methods, including gerbil, mice, and related methods, are not designed to capture complete bivariate distributions. As such, the bivariate tests are likely to return small p-values.
Value
gof_gerbil() returns an object of the class gof_gerbil that has the following slots:
- Stats
A vector (when type = 1) or matrix (when type = 2) giving the value of the test statistic (or coefficient) for the corresponding variable (or variable pair).
- p.values
A vector (when type = 1) or matrix (when type = 2) giving the p-value for the test applied to the corresponding variable (or variable pair).
- Test
A vector (when type = 1) or matrix (when type = 2) indicating the type of test applied to the corresponding variable (or variable pair).
- Breaks
A list giving the cut-points used for binning each continuous or semi-continuous variable.
Examples
#Load the India Human Development Survey-II dataset
data(ihd_mcar)
imps.gerbil <- gerbil(ihd_mcar, m = 1, mcmciter = 200, ords = "education_level",
semi = "farm_labour_days", bincat = c("sex", "marital_status", "job_field", "own_livestock"))
#Run univariate tests
tests.gerbil.uni <- gof_gerbil(imps.gerbil, imp = 1, type = 1)
#Print a summary
tests.gerbil.uni
#Run bivariate tests
tests.gerbil.bi <- gof_gerbil(imps.gerbil, imp = 1, type = 2)
#Print a summary
tests.gerbil.bi