dichoDif {difR} | R Documentation |

## Comparison of DIF detection methods

### Description

This function compares the specified DIF detection methods with respect to the detected items.

### Usage

```
dichoDif(Data, group, focal.name, method, anchor = NULL, props = NULL,
         thrTID = 1.5, alpha = 0.05, MHstat = "MHChisq", correct = TRUE,
         exact = FALSE, stdWeight = "focal", thrSTD = 0.1, BDstat = "BD",
         member.type = "group", match = "score", type = "both", criterion = "LRT",
         model = "2PL", c = NULL, engine = "ltm", discr = 1, irtParam = NULL,
         same.scale = TRUE, signed = FALSE, purify = FALSE, purType = "IPP1",
         nrIter = 10, extreme = "constraint", const.range = c(0.001, 0.999),
         nrAdd = 1, p.adjust.method = NULL, save.output = FALSE,
         output = c("out", "default"))
## S3 method for class 'dichoDif'
print(x, ...)
```

### Arguments

`Data` |
numeric: either the data matrix only, or the data matrix plus the vector of group membership. See |

`group` |
numeric or character: either the vector of group membership or the column indicator (within |

`focal.name` |
numeric or character indicating the level of |

`method` |
character: the name of the selected method. Possible values are |

`anchor` |
either |

`props` |
either |

`thrTID` |
numeric: the threshold for detecting DIF items with TID method (default is 1.5). |

`alpha` |
numeric: significance level (default is 0.05). |

`MHstat` |
character: specifies the DIF statistic to be used for DIF identification. Possible values are |

`correct` |
logical: should the Mantel-Haenszel continuity correction be used? (default is TRUE). |

`exact` |
logical: should an exact test be computed? (default is |

`stdWeight` |
character: the type of weights used for the standardized P-DIF statistic. Possible values are |

`thrSTD` |
numeric: the threshold (cut-score) for standardized P-DIF statistic (default is 0.10). |

`BDstat` |
character specifying the DIF statistic to be used. Possible values are |

`member.type` |
character: either |

`match` |
specifies the type of matching criterion. Can be either |

`type` |
a character string specifying which DIF effects must be tested. Possible values are |

`criterion` |
a character string specifying which DIF statistic is computed. Possible values are |

`model` |
character: the IRT model to be fitted (either |

`c` |
optional numeric value or vector giving the values of the constrained pseudo-guessing parameters. See |

`engine` |
character: the engine for estimating the 1PL model, either |

`discr` |
either |

`irtParam` |
matrix with |

`same.scale` |
logical: are the item parameters of the |

`signed` |
logical: should the Raju's statistics be computed using the signed ( |

`purify` |
logical: should the method be used iteratively to purify the set of anchor items? (default is FALSE). |

`purType` |
character: the type of purification process to be run. Possible values are |

`nrIter` |
numeric: the maximal number of iterations in the item purification process (default is 10). |

`extreme` |
character: the method used to modify the extreme proportions. Possible values are |

`const.range` |
numeric: a vector of two constraining proportions. Default values are 0.001 and 0.999. Ignored if |

`nrAdd` |
integer: the number of successes and the number of failures to add to the data in order to adjust the proportions. Default value is 1. Ignored if |

`p.adjust.method` |
either |

`save.output` |
logical: should the output be saved into a text file? (Default is |

`output` |
character: a vector of two components. The first component is the name of the output file, the second component is either the file path or |

`x` |
result from a |

`...` |
other generic parameters for the |

### Details

`dichoDif` is a generic function which calls one or several DIF detection methods and summarizes their output. The possible methods are: `"TID"` for the Transformed Item Difficulties (TID) method (Angoff and Ford, 1973), `"MH"` for Mantel-Haenszel (Holland and Thayer, 1988), `"Std"` for standardization (Dorans and Kulick, 1986), `"BD"` for the Breslow-Day method (Penfield, 2003), `"Logistic"` for logistic regression (Swaminathan and Rogers, 1990), `"SIBTEST"` for the SIBTEST (Shealy and Stout, 1993) and Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996) methods, `"Lord"` for Lord's chi-square test (Lord, 1980), `"Raju"` for Raju's area method (Raju, 1990), and `"LRT"` for the likelihood-ratio test method (Thissen, Steinberg and Wainer, 1988).

If `method` has a single component, the output of `dichoDif` is exactly the one provided by the method itself. Otherwise, the main output is a matrix with one row per item and one column per method. For each specified method and related arguments, items detected as DIF and non-DIF are respectively encoded as `"DIF"` and `"NoDIF"`. When printing the output, an additional column is added, counting the number of times each item was detected as functioning differently (note: this is just an informative summary, since the methods are obviously not independent for the detection of DIF items).
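The multi-method summary can be pictured with a small base-R sketch. The matrix below is made up for illustration (item names, method columns and flags are hypothetical and were not produced by `dichoDif`); it only shows how a count column of this kind could be derived from the `"DIF"`/`"NoDIF"` encoding.

```r
# Hypothetical flags for three items and three methods (illustrative values only)
flags <- matrix(
  c("DIF",   "NoDIF", "DIF",
    "NoDIF", "NoDIF", "NoDIF",
    "DIF",   "DIF",   "DIF"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Item1", "Item2", "Item3"), c("TID", "MH", "Std"))
)

# Count, per item, how many methods flagged it as DIF
n_dif <- rowSums(flags == "DIF")

# Append the count as an extra column, similar in spirit to the printed output
summary_table <- data.frame(flags, nDIF = n_dif, check.names = FALSE)
print(summary_table)
```

As the text notes, such a count is descriptive only: an item flagged by three correlated methods is not three times as likely to exhibit DIF.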

`Data` is a matrix whose rows correspond to the subjects and columns to the items. In addition, `Data` can hold the vector of group membership. If so, `group` indicates the column of `Data` which corresponds to the group membership, either by specifying its name or by giving the column number. Otherwise, `group` must be a vector of the same length as `nrow(Data)`.
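The two ways of supplying group membership can be sketched in base R as follows. The toy data frame, its column names and its values are assumptions made for illustration; the code only shows how a column reference and a separate vector point to the same information, not how `dichoDif` resolves them internally.

```r
# Toy responses: 6 subjects, 3 items, plus a group column (all values made up)
Data <- data.frame(
  I1 = c(1, 0, 1, 1, 0, 1),
  I2 = c(0, 0, 1, 1, 1, 0),
  I3 = c(1, 1, 0, 1, 0, 0),
  Gender = c(0, 0, 0, 1, 1, 1)
)

# Form 1: group identified by a column name or a column number within Data
group_by_name  <- Data[, "Gender"]
group_by_index <- Data[, 4]

# Form 2: items and group supplied separately; the lengths must agree
items <- Data[, 1:3]
group <- Data$Gender
stopifnot(length(group) == nrow(items))
```

With a real data set, the two forms would correspond to calls of the shape `dichoDif(Data, group = "Gender", ...)` and `dichoDif(items, group = group, ...)`.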

Missing values are allowed for item responses (not for group membership) but must be coded as `NA` values. Depending on the method, they are discarded from the computation of the sum-scores, the fitting of the logistic models or the fitting of the IRT models.
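A minimal sketch of what discarding `NA` responses from a sum-score can look like, assuming missing responses are simply skipped when summing. The response matrix is made up, and the exact handling inside each difR method is not reproduced here.

```r
# Toy response matrix with missing answers coded as NA (values made up)
items <- matrix(c(1,  0, NA,
                  1,  1,  1,
                  NA, 0,  0),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("I1", "I2", "I3")))

# Sum-score over the observed responses only
score <- rowSums(items, na.rm = TRUE)

# Number of items actually answered by each subject
n_answered <- rowSums(!is.na(items))

print(cbind(score, n_answered))
```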

The vector of group membership must hold only two different values, either as numeric or character. The focal group is defined by the value of the argument `focal.name`.

For `"MH"`, `"Std"`, `"Logistic"` and `"BD"` methods, the matching criterion can be either the test score or any other continuous or discrete variable to be passed in the `Logistik` function. This is specified by the `match` argument. By default, it takes the value `"score"` and the test score (i.e. raw score) is computed. The second option is to assign to `match` a vector of continuous or discrete numeric values, which acts as the matching criterion. Note that for consistency this vector should not belong to the `Data` matrix.

For Lord and Raju methods, one can specify either the IRT model to be fitted (by means of the `model`, `c`, `engine` and `discr` arguments), or the item parameter estimates with the arguments `irtParam` and `same.scale`. See `difLord` and `difRaju` for further details.

The threshold for detecting DIF items depends on the method. For standardization it has to be fully specified (with the `thrSTD` argument), as well as for the TID method (through the `thrTID` argument). For the other methods it depends on the significance level set by `alpha`.
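The difference between threshold-based and significance-based flagging can be sketched as follows. All statistics are made up, and the decision rules (absolute value above the threshold, P-value below `alpha`) are stated here as assumptions about how such cut-offs are typically applied, not as a transcript of the difR code.

```r
# Made-up statistics for four items (illustrative values only)
std_pdif <- c(I1 = 0.03, I2 = -0.12, I3 = 0.09, I4 = 0.15)  # standardized P-DIF
tid_dist <- c(I1 = 0.4,  I2 = -1.8,  I3 = 1.2,  I4 = 0.9)   # TID distances
p_values <- c(I1 = 0.40, I2 = 0.01,  I3 = 0.06, I4 = 0.03)  # from a test-based method

thrSTD <- 0.1   # cut-score for standardization
thrTID <- 1.5   # cut-score for the TID method
alpha  <- 0.05  # significance level for test-based methods

# Encode a logical decision with the labels used in the output matrix
flag <- function(x) ifelse(x, "DIF", "NoDIF")

flags <- cbind(Std  = flag(abs(std_pdif) > thrSTD),
               TID  = flag(abs(tid_dist) > thrTID),
               Test = flag(p_values < alpha))
print(flags)
```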

For the Mantel-Haenszel method, the DIF statistic can be either the Mantel-Haenszel chi-square statistic or the log odds-ratio statistic. The method is specified by the argument `MHstat`, and the default value is `"MHChisq"` for the chi-square statistic. Moreover, the option `correct` specifies whether the continuity correction has to be applied to the Mantel-Haenszel statistic. See `difMH` for further details.

By default, the asymptotic Mantel-Haenszel statistic is computed. However, the exact statistics and related P-values can be obtained by setting the logical argument `exact` to `TRUE`. See Agresti (1990, 1992) for further details about exact inference.
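The roles of a continuity correction and of exact inference can be illustrated with `mantelhaen.test()` from base R's **stats** package, which has arguments of the same names. This is a stand-in for illustration on simulated data, not the implementation used by `difMH`; the variable names and simulation settings are assumptions.

```r
set.seed(2)
n <- 200
group   <- rep(c("ref", "focal"), each = n / 2)
stratum <- sample(1:4, n, replace = TRUE)      # stand-in for matched score levels
item    <- rbinom(n, 1, plogis(-1 + 0.5 * stratum))

# 2 x 2 x K table: item response by group within each matching stratum
tab <- table(item, group, stratum)

asym_corrected   <- mantelhaen.test(tab, correct = TRUE)   # asymptotic, corrected
asym_uncorrected <- mantelhaen.test(tab, correct = FALSE)  # asymptotic, uncorrected
exact_test       <- mantelhaen.test(tab, exact = TRUE)     # exact conditional test

print(c(corrected   = asym_corrected$p.value,
        uncorrected = asym_uncorrected$p.value,
        exact       = exact_test$p.value))
```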

The weights for computing the standardized P-DIF statistics are defined through the argument `stdWeight`, with possible values `"focal"` (default value), `"reference"` and `"total"`. See `stdPDIF` for further details.

For the Breslow-Day method, two test statistics are available: the usual Breslow-Day statistic for testing homogeneous association (Aguerri, Galibert, Attorresi and Maranon, 2009) and the trend test statistic for assessing some monotonic trend in the odds ratios (Penfield, 2003). The DIF statistic is supplied by the `BDstat` argument, with values `"BD"` (default) for the usual statistic and `"trend"` for the trend test statistic.

For logistic regression, the argument `type` allows testing either both uniform and nonuniform effects simultaneously (`type="both"`), only the uniform DIF effect (`type="udif"`) or only the nonuniform DIF effect (`type="nudif"`). The `criterion` argument specifies the DIF statistic to be computed, either the likelihood ratio test statistic (by setting `criterion="LRT"`) or the Wald test (by setting `criterion="Wald"`). Moreover, the group membership can be either a vector of two distinct values, one for the reference group and one for the focal group, or a continuous or discrete variable that acts as the "group" membership variable. In the former case, the `member.type` argument is set to `"group"` and `focal.name` defines which value in the `group` variable stands for the focal group. In the latter case, `member.type` is set to `"cont"`, `focal.name` is ignored and each value of `group` represents one "group" of data (that is, the DIF effects are investigated among participants relying on different values of some discrete or continuous trait). See `Logistik` for further details.
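The three values of `type` can be read as three nested-model comparisons. The sketch below uses base R's `glm()` on simulated data with a likelihood-ratio criterion; it follows the general logistic regression approach of Swaminathan and Rogers (1990) and is not the code of `Logistik`. The simulated effect sizes, the helper `lrt()` and the use of a standard normal matching variable are assumptions made for illustration.

```r
set.seed(1)
n <- 400
group <- rep(c(0, 1), each = n / 2)   # 0 = reference, 1 = focal
score <- rnorm(n)                     # stand-in for the matching variable

# Simulate one item with a uniform DIF effect against the focal group
item <- rbinom(n, 1, plogis(0.8 * score - 0.6 * group))

m0 <- glm(item ~ score,         family = binomial)  # no DIF
m1 <- glm(item ~ score + group, family = binomial)  # uniform DIF
m2 <- glm(item ~ score * group, family = binomial)  # uniform + nonuniform DIF

# Likelihood-ratio comparison of two nested models (hypothetical helper)
lrt <- function(small, big) {
  stat <- deviance(small) - deviance(big)
  df   <- df.residual(small) - df.residual(big)
  c(stat = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

res <- rbind(both  = lrt(m0, m2),   # group and interaction tested together
             udif  = lrt(m0, m1),   # group effect only
             nudif = lrt(m1, m2))   # interaction effect only
print(round(res, 4))
```

Under this reading, the statistic for `"both"` is the sum of the `"udif"` and `"nudif"` statistics, with two degrees of freedom instead of one.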

The SIBTEST method (Shealy and Stout, 1993) and its modified version, the Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996), are returned by the `difSIBTEST` function. The SIBTEST method is returned when the `type` argument is set to `"udif"`, while Crossing-SIBTEST is selected with the `"nudif"` value of the `type` argument. Note that `type` takes the default value `"both"`, which is not allowed within the `difSIBTEST` function; however, within `dichoDif`, keeping the default value selects Crossing-SIBTEST.

The `difSIBTEST` function is a wrapper to the `SIBTEST` function from the **mirt** package (Chalmers, 2012) to fit within the `difR` framework (Magis et al., 2010). Therefore, if you are using this function for publication purposes please cite Chalmers (2018; 2012) and Magis et al. (2010).

For Raju's method, the type of area (signed or unsigned) is fixed by the logical `signed` argument, with default value `FALSE` (i.e. unsigned areas). See `RajuZ` for further details.

Item purification can be requested by setting the `purify` option to `TRUE`. Recall that the item purification process is slightly different for IRT and for non-IRT based methods. See the corresponding methods for further information.

Adjustment for multiple comparisons is possible with the argument `p.adjust.method`. See the corresponding methods for further information.
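As a rough illustration of what a multiple-comparison adjustment does to the flagging decision, the sketch below applies `p.adjust()` from base R's **stats** package to made-up P-values. Whether and how each difR method relies on this function is described in the corresponding help pages; the values and the choice of the `"BH"` method here are assumptions.

```r
# Made-up P-values for four items (illustrative values only)
p_raw <- c(I1 = 0.001, I2 = 0.02, I3 = 0.04, I4 = 0.30)
alpha <- 0.05

# Benjamini-Hochberg adjustment, one of the methods offered by p.adjust()
p_adj <- p.adjust(p_raw, method = "BH")

# Compare the flagging decision before and after adjustment
print(rbind(raw = p_raw < alpha, adjusted = p_adj < alpha))
```

Here three items fall below `alpha` before adjustment and two after it.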

A pre-specified set of anchor items can be provided through the `anchor` argument. For non-IRT methods, anchor items are used to compute the test score (as matching criterion). For IRT methods, anchor items are used to rescale the item parameters on a common metric. See the corresponding methods for further information. Note that the `anchor` argument does not work with the `"LRT"` method.

The output of the `dichoDif` function can be stored in a text file by setting `save.output` and `output` appropriately. See the help file of the `selectDif` function (or any other DIF method) for further information.

### Value

Either the output of one of the DIF detection methods, or a list of class "dichoDif" with the following components:

`DIF` |
a character matrix with one row per item and whose columns refer to the different specified detection methods. See |

`props` |
the value of the |

`thrTID` |
the value of the |

`correct` |
the value of |

`exact` |
the value of |

`alpha` |
the significance level |

`MHstat` |
the value of the |

`stdWeight` |
the value of the |

`thrSTD` |
the value of |

`BDstat` |
the value of the |

`member.type` |
the value of the |

`match` |
the value of the |

`type` |
the value of the |

`criterion` |
the value of the |

`model` |
the value of |

`c` |
the value of |

`engine` |
the value of the |

`discr` |
the value of the |

`irtParam` |
the value of |

`same.scale` |
the value of |

`p.adjust.method` |
the value of the |

`purification` |
the value of |

`nrPur` |
an integer vector (of length equal to the number of methods) with the number of iterations in the purification process.
Returned only if |

`convergence` |
a logical vector (of length equal to the number of methods) indicating whether the iterative purification process converged. Returned only if |

`anchor.names` |
the value of the |

`save.output` |
the value of the |

`output` |
the value of the |

### Author(s)

Sebastien Beland

Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)

Universite du Quebec a Montreal

sebastien.beland.1@hotmail.com, http://www.cdame.uqam.ca/

David Magis

Department of Psychology, University of Liege

Research Group of Quantitative Psychology and Individual Differences, KU Leuven

David.Magis@uliege.be, http://ppw.kuleuven.be/okp/home/

Gilles Raiche

Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)

Universite du Quebec a Montreal

raiche.gilles@uqam.ca, http://www.cdame.uqam.ca/

### References

Agresti, A. (1990). *Categorical data analysis*. New York: Wiley.

Agresti, A. (1992). A survey of exact inference for contingency tables. *Statistical Science, 7*, 131-177. doi: 10.1214/ss/1177011454

Aguerri, M.E., Galibert, M.S., Attorresi, H.F. and Maranon, P.P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. *Quality and Quantity, 43*, 35-44. doi: 10.1007/s11135-007-9130-2

Angoff, W. H., and Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. *Journal of Educational Measurement, 10*(2), 95-106. doi: 10.1111/j.1745-3984.1973.tb00787.x

Chalmers, R. P. (2012). mirt: A Multidimensional item response
theory package for the R environment. *Journal of Statistical Software, 48(6)*, 1-29. doi: 10.18637/jss.v048.i06

Chalmers, R. P. (2018). Improving the Crossing-SIBTEST statistic for detecting non-uniform DIF. *Psychometrika, 83*(2), 376–386. doi: 10.1007/s11336-017-9583-8

Dorans, N. J. and Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the
Scholastic Aptitude Test. *Journal of Educational Measurement, 23*, 355-368. doi: 10.1111/j.1745-3984.1986.tb00255.x

Holland, P. W. and Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), *Test validity*. Hillsdale, NJ: Lawrence Erlbaum Associates.

Li, H.-H., and Stout, W. (1996). A new procedure for detection of crossing DIF. *Psychometrika, 61*, 647–677. doi: 10.1007/BF02294041

Lord, F. (1980). *Applications of item response theory to practical testing problems*. Hillsdale, NJ: Lawrence Erlbaum Associates.

Magis, D., Beland, S., Tuerlinckx, F. and De Boeck, P. (2010). A general framework and an R package for the detection
of dichotomous differential item functioning. *Behavior Research Methods, 42*, 847-862. doi: 10.3758/BRM.42.3.847

Penfield, R.D. (2003). Application of the Breslow-Day test of trend in odds ratio heterogeneity to the detection of nonuniform DIF. *Alberta Journal of
Educational Research, 49*, 231-243.

Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. *Applied Psychological Measurement, 14*, 197-207. doi: 10.1177/014662169001400208

Shealy, R. and Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. *Psychometrika, 58*, 159-194. doi: 10.1007/BF02294572

Swaminathan, H. and Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. *Journal of Educational
Measurement, 27*, 361-370. doi: 10.1111/j.1745-3984.1990.tb00754.x

Thissen, D., Steinberg, L. and Wainer, H. (1988). Use of item response theory in the study of group difference in trace lines. In H. Wainer and H. Braun (Eds.),
*Test validity*. Hillsdale, NJ: Lawrence Erlbaum Associates.

### See Also

`difTID`, `difMH`, `difStd`, `difBD`, `difLogistic`, `difSIBTEST`, `difLord`, `difRaju`, `difLRT`

### Examples

```
## Not run:
# Loading the package and the verbal data
library(difR)
data(verbal)

# Excluding the "Anger" variable, so that "Gender" becomes column 25
verbal <- verbal[colnames(verbal) != "Anger"]

# Comparing TID, Mantel-Haenszel, standardization, logistic regression and SIBTEST
#  - TID threshold 1.0
#  - standardization threshold 0.08
#  - no continuity correction
#  - with item purification
#  - both types of DIF effect for logistic regression
#  - CSIBTEST method (default type = "both")
dichoDif(verbal, group = 25, focal.name = 1,
         method = c("TID", "MH", "Std", "Logistic", "SIBTEST"),
         correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE)

# Same analysis without SIBTEST, using items 1 to 5 as anchor and saving
# the output into the 'dicho' file
dichoDif(verbal, group = 25, focal.name = 1,
         method = c("TID", "MH", "Std", "Logistic"),
         correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE,
         anchor = 1:5, save.output = TRUE, output = c("dicho", "default"))

# Comparing Lord and Raju results with the 2PL model and
# with item purification
dichoDif(verbal, group = 25, focal.name = 1, method = c("Lord", "Raju"),
         model = "2PL", purify = TRUE)
## End(Not run)
```

*difR* version 5.1