dichoDif {difR} | R Documentation |
Comparison of DIF detection methods
Description
This function compares the specified DIF detection methods with respect to the detected items.
Usage
dichoDif(Data, group, focal.name, method, anchor = NULL, props = NULL,
thrTID = 1.5, alpha = 0.05, MHstat = "MHChisq", correct = TRUE,
exact = FALSE, stdWeight = "focal", thrSTD = 0.1, BDstat = "BD",
member.type = "group", match = "score", type = "both", criterion = "LRT",
model = "2PL", c = NULL, engine = "ltm", discr = 1, irtParam = NULL,
same.scale = TRUE, signed = FALSE, purify = FALSE, purType = "IPP1",
nrIter = 10, extreme = "constraint", const.range = c(0.001, 0.999),
nrAdd = 1, p.adjust.method = NULL, save.output = FALSE,
output = c("out", "default"))
## S3 method for class 'dichoDif'
print(x, ...)
Arguments
Data |
numeric: either the data matrix only, or the data matrix plus the vector of group membership. See Details. |
group |
numeric or character: either the vector of group membership or the column indicator (within |
focal.name |
numeric or character indicating the level of |
method |
character: the name of the selected method. Possible values are |
anchor |
either |
props |
either |
thrTID |
numeric: the threshold for detecting DIF items with TID method (default is 1.5). |
alpha |
numeric: significance level (default is 0.05). |
MHstat |
character: specifies the DIF statistic to be used for DIF identification. Possible values are |
correct |
logical: should the Mantel-Haenszel continuity correction be used? (default is TRUE). |
exact |
logical: should an exact test be computed? (default is |
stdWeight |
character: the type of weights used for the standardized P-DIF statistic. Possible values are |
thrSTD |
numeric: the threshold (cut-score) for standardized P-DIF statistic (default is 0.10). |
BDstat |
character specifying the DIF statistic to be used. Possible values are |
member.type |
character: either |
match |
specifies the type of matching criterion. Can be either |
type |
a character string specifying which DIF effects must be tested. Possible values are |
criterion |
a character string specifying which DIF statistic is computed. Possible values are |
model |
character: the IRT model to be fitted (either |
c |
optional numeric value or vector giving the values of the constrained pseudo-guessing parameters. See Details. |
engine |
character: the engine for estimating the 1PL model, either |
discr |
either |
irtParam |
matrix with 2J rows (where J is the number of items) and at most 9 columns containing item parameters estimates. See Details. |
same.scale |
logical: are the item parameters of the |
signed |
logical: should the Raju's statistics be computed using the signed ( |
purify |
logical: should the method be used iteratively to purify the set of anchor items? (default is FALSE). |
purType |
character: the type of purification process to be run. Possible values are |
nrIter |
numeric: the maximal number of iterations in the item purification process (default is 10). |
extreme |
character: the method used to modify the extreme proportions. Possible values are |
const.range |
numeric: a vector of two constraining proportions. Default values are 0.001 and 0.999. Ignored if |
nrAdd |
integer: the number of successes and the number of failures to add to the data in order to adjust the proportions. Default value is 1. Ignored if |
p.adjust.method |
either |
save.output |
logical: should the output be saved into a text file? (Default is |
output |
character: a vector of two components. The first component is the name of the output file, the second component is either the file path or |
x |
result from a |
... |
other generic parameters for the |
Details
dichoDif
is a generic function which calls one or several DIF detection methods and summarizes their output. The possible methods are:
"TID" for the Transformed Item Difficulties (TID) method (Angoff and Ford, 1973),
"MH" for Mantel-Haenszel (Holland and Thayer, 1988),
"Std" for standardization (Dorans and Kulick, 1986),
"BD" for the Breslow-Day method (Penfield, 2003),
"Logistic" for logistic regression (Swaminathan and Rogers, 1990),
"SIBTEST" for the SIBTEST (Shealy and Stout, 1993) and Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996) methods,
"Lord" for Lord's chi-square test (Lord, 1980),
"Raju" for Raju's area method (Raju, 1990), and
"LRT" for the likelihood-ratio test method (Thissen, Steinberg and Wainer, 1988).
If method
has a single component, the output of dichoDif
is exactly the one provided by the method itself. Otherwise, the main output is a matrix with one row per item and one column per method. For each specified method and related arguments, items detected as DIF and non-DIF are respectively encoded as "DIF"
and "NoDIF"
. When printing the output an additional column is added, counting the number of times each item was detected as functioning
differently (Note: this is just an informative summary, since the methods are obviously not independent for the detection of DIF items).
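As a rough illustration of the summary described above, the following base R sketch builds a small character matrix of the same shape and appends a count column. The item names, column labels and flags are invented for illustration; they are not output produced by dichoDif, and the printed layout of the real function may differ.

```r
# Hypothetical detection flags for 4 items and 3 methods (made-up values)
flags <- matrix(
  c("DIF",   "DIF",   "NoDIF",
    "NoDIF", "NoDIF", "NoDIF",
    "DIF",   "NoDIF", "NoDIF",
    "DIF",   "DIF",   "DIF"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Item", 1:4), c("MH", "Std", "Logistic"))
)

# Number of methods that flagged each item
n_dif <- rowSums(flags == "DIF")

# Append the count as an extra column, similar in spirit to the printed summary
summary_table <- cbind(flags, "#DIF" = n_dif)
print(summary_table, quote = FALSE)
```

Because the methods share the same data, an item flagged by several of them is not thereby confirmed several times over; the count is only a descriptive aid.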
The Data
is a matrix whose rows correspond to the subjects and columns to the items. In addition, Data
can hold the vector of group membership. If so, group
indicates the column of Data
which corresponds to the group membership, either by specifying its name or by giving the column number. Otherwise, group
must be a vector of same length as nrow(Data)
.
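The two ways of supplying group membership can be sketched in base R as follows. The helper split_group and the toy data are hypothetical and only mimic the behaviour described above; they are not part of difR.

```r
# Toy data: 3 dichotomous items plus a group column (all values invented)
Data <- data.frame(
  I1 = c(1, 0, 1, 1),
  I2 = c(0, 0, 1, 1),
  I3 = c(1, 1, 0, 1),
  Gender = c(0, 1, 1, 0)
)

# Hypothetical helper mirroring the two ways of supplying 'group'
split_group <- function(Data, group) {
  if (length(group) == 1) {
    # 'group' is a column name or a column number of Data
    idx <- if (is.character(group)) match(group, colnames(Data)) else group
    list(items = Data[, -idx, drop = FALSE], group = Data[, idx])
  } else {
    # 'group' is a separate vector with one entry per respondent
    stopifnot(length(group) == nrow(Data))
    list(items = Data, group = group)
  }
}

by_name   <- split_group(Data, "Gender")
by_number <- split_group(Data, 4)
by_vector <- split_group(Data[, 1:3], c(0, 1, 1, 0))
```

All three calls yield the same item responses and the same membership vector; only the way the membership is passed differs.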
Missing values are allowed for item responses (not for group membership) but must be coded as NA
values. They are discarded from either the computation of the sum-scores, the fitting of the logistic models or the IRT models (according to the method).
The vector of group membership must hold only two different values, either as numeric or character. The focal group is defined by the value of the argument focal.name
.
For "MH"
, "Std"
, "Logistic"
and "BD"
methods, the matching criterion can be either the test score or any other continuous or discrete variable to be passed in the Logistik
function. This is specified by the match
argument. By default, it takes the value "score"
and the test score (i.e. raw score) is computed. The second option is to assign to match
a vector of continuous or discrete numeric values, which acts as the matching criterion. Note that for consistency this vector should not belong to the Data
matrix.
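A minimal base R sketch of the two options for the matching criterion is given below. The helper matching_variable is hypothetical, the data are invented, and skipping NA responses when summing is an assumption of this sketch rather than a statement about the internal code of difR.

```r
# Toy responses with one missing value (invented)
items <- matrix(c(1, 0,  1,
                  1, NA, 1,
                  0, 0,  1,
                  1, 1,  1), nrow = 4, byrow = TRUE)

# Hypothetical helper: choose the matching variable
matching_variable <- function(items, match = "score") {
  if (identical(match, "score")) {
    # Raw score; NA responses are skipped in the sum (assumption of this sketch)
    rowSums(items, na.rm = TRUE)
  } else {
    # External matching variable: one numeric value per respondent
    stopifnot(is.numeric(match), length(match) == nrow(items))
    match
  }
}

raw_score <- matching_variable(items)
external  <- matching_variable(items, match = c(-0.3, 1.2, 0.1, 0.8))
```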
For Lord and Raju methods, one can specify either the IRT model to be fitted (by means of model
, c
, engine
and discr
arguments), or the item parameter estimates with arguments irtParam
and same.scale
. See difLord
and difRaju
for further details.
The threshold for detecting DIF items depends on the method. For standardization it has to be fully specified (with the thrSTD
argument), as well as for the TID method (through the thrTID
argument). For the other methods it depends on the significance level set by alpha
.
For Mantel-Haenszel method, the DIF statistic can be either the Mantel-Haenszel chi-square statistic or the log odds-ratio statistic. The method is specified by the argument MHstat
, and the default value is "MHChisq"
for the chi-square statistic. Moreover, the option correct
specifies whether the continuity correction has to be applied to Mantel-Haenszel statistic. See difMH
for further details.
By default, the asymptotic Mantel-Haenszel statistic is computed. However, the exact statistics and related P-values can be obtained by setting the logical argument exact
to TRUE
. See Agresti (1990, 1992) for further details about exact inference.
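To give a feel for the continuity correction and the exact test, the sketch below uses mantelhaen.test from the base stats package on simulated data. This is not the difR implementation: the data, the variable names and the effect sizes are invented, and the correct and exact arguments shown here belong to mantelhaen.test, which merely plays an analogous role.

```r
set.seed(123)

# Simulated data (all values invented): 400 respondents, 5 score strata
n     <- 400
group <- rep(c("reference", "focal"), each = n / 2)
score <- sample(0:4, n, replace = TRUE)
# The item response depends on the score and, slightly, on group membership
resp  <- rbinom(n, 1, plogis(-1 + 0.6 * score - 0.7 * (group == "focal")))

# 2 x 2 x K table: response by group within each score stratum
tab <- table(factor(resp, levels = 0:1), factor(group), factor(score))

mh_corrected   <- mantelhaen.test(tab, correct = TRUE)   # asymptotic, corrected
mh_uncorrected <- mantelhaen.test(tab, correct = FALSE)  # asymptotic, uncorrected
mh_exact       <- mantelhaen.test(tab, exact = TRUE)     # exact conditional test

c(corrected   = mh_corrected$p.value,
  uncorrected = mh_uncorrected$p.value,
  exact       = mh_exact$p.value)
```

The continuity correction can only shrink the chi-square statistic, so the corrected test is never less conservative than the uncorrected one.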
The weights for computing the standardized P-DIF statistics are defined through the argument stdWeight
, with possible values "focal"
(default value), "reference"
and "total"
. See stdPDIF
for further details.
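The role of the weights can be illustrated with a small base R sketch of the standardized P-DIF statistic, computed as the weighted average over score levels of the difference between the focal and reference proportions of success. The helper std_pdif and the data are hypothetical; skipping score levels in which one group is empty is a simplification of this sketch.

```r
# Hypothetical helper: standardized P-DIF for a single item
std_pdif <- function(resp, group, score, focal, weight = "focal") {
  num <- 0
  den <- 0
  for (s in sort(unique(score))) {
    in_f <- score == s & group == focal
    in_r <- score == s & group != focal
    # Skip score levels where one group is empty (simplification of this sketch)
    if (!any(in_f) || !any(in_r)) next
    w <- switch(weight,
                focal     = sum(in_f),
                reference = sum(in_r),
                total     = sum(in_f) + sum(in_r),
                stop("unknown weight"))
    num <- num + w * (mean(resp[in_f]) - mean(resp[in_r]))
    den <- den + w
  }
  num / den
}

# Toy data (invented): two score levels, focal group coded 1
resp  <- c(1, 0, 1, 1,   1, 1, 0, 0, 0, 1)
group <- c(1, 1, 0, 0,   1, 0, 0, 0, 0, 1)
score <- c(1, 1, 1, 1,   2, 2, 2, 2, 2, 2)

pdif <- sapply(c("focal", "reference", "total"),
               function(w) std_pdif(resp, group, score, focal = 1, weight = w))
pdif
```

With these data the three weighting schemes give 0.125, about 0.333 and 0.25, so the choice of weights can change whether the absolute value of the statistic exceeds a threshold such as 0.1.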
For the Breslow-Day method, two test statistics are available: the usual Breslow-Day statistic for testing homogeneous association (Aguerri, Galibert, Attorresi and Maranon, 2009) and the trend test statistic for assessing some monotonic trend in the odds ratios (Penfield, 2003). The DIF statistic is supplied by the BDstat
argument, with values "BD"
(default) for the usual statistic and "trend"
for the trend test statistic.
For logistic regression, the argument type
makes it possible to test either both uniform and nonuniform effects simultaneously (type="both"
), only uniform DIF effect (type="udif"
) or only nonuniform DIF effect (type="nudif"
). The criterion
argument specifies the DIF statistic to be computed, either the likelihood ratio test statistic (by setting criterion="LRT"
) or the Wald test (by setting criterion="Wald"
). Moreover, the group membership can be either a vector of two distinct values, one for the reference group and one for the focal group, or a continuous or discrete variable that acts as the "group" membership variable. In the former case, the member.type
argument is set to "group"
and the focal.name
defines which value in the group
variable stands for the focal group. In the latter case, member.type
is set to "cont"
, focal.name
is ignored and each value of the group
represents one "group" of data (that is, the DIF effects are investigated among participants relying on different values of some discrete or continuous trait). See Logistik
for further details.
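The three kinds of DIF effect can be sketched as comparisons of nested logistic models fitted with glm from base R. The sketch below uses the likelihood-ratio criterion on simulated data; the data, the helper lrt and the effect sizes are invented, and the code is an illustration of the idea rather than the code used by Logistik.

```r
set.seed(42)

# Simulated data (invented): one item with a uniform DIF effect
n     <- 500
group <- rep(0:1, each = n / 2)   # 1 = focal group
score <- rnorm(n)                 # stand-in for the matching variable
resp  <- rbinom(n, 1, plogis(0.2 + 1.1 * score - 0.8 * group))

m0 <- glm(resp ~ score,         family = binomial)  # no DIF
m1 <- glm(resp ~ score + group, family = binomial)  # uniform DIF
m2 <- glm(resp ~ score * group, family = binomial)  # uniform and nonuniform DIF

# Likelihood-ratio statistic, degrees of freedom and p-value for nested models
lrt <- function(small, large) {
  stat <- deviance(small) - deviance(large)
  df   <- df.residual(small) - df.residual(large)
  c(stat = stat, df = df, p = pchisq(stat, df, lower.tail = FALSE))
}

res <- rbind(both  = lrt(m0, m2),   # tests both effects at once (2 df)
             udif  = lrt(m0, m1),   # tests the uniform effect only (1 df)
             nudif = lrt(m1, m2))   # tests the nonuniform effect only (1 df)
res
```

The statistic for "both" is the sum of the other two, which is why it has two degrees of freedom.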
The SIBTEST method (Shealy and Stout, 1993) and its modified version, the Crossing-SIBTEST (Chalmers, 2018; Li and Stout, 1996) are returned by the difSIBTEST
function. SIBTEST method is returned when type
argument is set to "udif"
, while Crossing-SIBTEST is set with "nudif"
value for the type
argument. Note that type
takes the default value "both"
, which is not allowed within the difSIBTEST
function; however, within dichoDif, keeping the default value selects Crossing-SIBTEST.
The difSIBTEST
function is a wrapper to the SIBTEST
function from the mirt package (Chalmers, 2012) to fit within the difR
framework (Magis et al., 2010). Therefore, if you are using this function for publication purposes please cite Chalmers (2018; 2012) and Magis et al. (2010).
For Raju's method, the type of area (signed or unsigned) is fixed by the logical signed
argument, with default value FALSE
(i.e. unsigned areas). See RajuZ
for further details.
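The difference between signed and unsigned areas can be illustrated numerically for the simplest case of a 1PL model, where the two item response functions do not cross. The difficulty values below are invented, and taking the reference curve minus the focal curve is a sign convention chosen for this sketch; it does not reproduce the Z statistics computed by RajuZ.

```r
# Hypothetical 1PL difficulties of one item in each group (made-up values)
b_ref   <- 0.0
b_focal <- 0.5

# Difference between the two item response functions at ability theta
irf_diff <- function(theta) plogis(theta - b_ref) - plogis(theta - b_focal)

# Areas between the curves, obtained by numerical integration
signed_area   <- integrate(irf_diff, -Inf, Inf)$value
unsigned_area <- integrate(function(t) abs(irf_diff(t)), -Inf, Inf)$value

c(signed = signed_area, unsigned = unsigned_area)
```

Under the 1PL model both areas equal the absolute difference in difficulties (here 0.5). When the curves cross, as can happen with unequal discriminations, positive and negative parts cancel in the signed area, so the unsigned area can be larger.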
Item purification can be requested by setting the purify
option to TRUE
. Recall that the item purification process differs slightly between IRT-based and non-IRT-based methods. See the corresponding methods for further information.
Adjustment for multiple comparisons is possible with the argument p.adjust.method
. See the corresponding methods for further information.
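The effect of a multiple-comparison adjustment on the flagged items can be sketched with p.adjust from the base stats package. The p-values are invented, the Benjamini-Hochberg method is just one possible choice, and the strict comparison with alpha is an assumption of this sketch.

```r
# Hypothetical per-item p-values from one DIF method (made-up values)
p_raw <- c(Item1 = 0.01, Item2 = 0.04, Item3 = 0.03, Item4 = 0.20)
alpha <- 0.05

# Benjamini-Hochberg adjustment
p_bh <- p.adjust(p_raw, method = "BH")

# Flags before and after adjustment
flag_raw <- ifelse(p_raw < alpha, "DIF", "NoDIF")
flag_bh  <- ifelse(p_bh  < alpha, "DIF", "NoDIF")

rbind(raw = flag_raw, adjusted = flag_bh)
```

Here three items are flagged on the raw p-values but only Item1 remains flagged after adjustment.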
A pre-specified set of anchor items can be provided through the anchor
argument. For non-IRT methods, anchor items are used to compute the test score (as matching criterion). For IRT methods, anchor items are used to rescale the item parameters on a common metric. See the corresponding methods for further information. Note that anchor
argument does not work with the "LRT"
method.
The output of the dichoDif
function can be stored in a text file by setting save.output
and output
appropriately. See the help file of selectDif
function (or any other DIF method) for further information.
Value
Either the output of one of the DIF detection methods, or a list of class "dichoDif" with the following components:
DIF |
a character matrix with one row per item and whose columns refer to the different specified detection methods. See Details. |
props |
the value of the |
thrTID |
the value of the |
correct |
the value of |
exact |
the value of |
alpha |
the significance level |
MHstat |
the value of the |
stdWeight |
the value of the |
thrSTD |
the value of |
BDstat |
the value of the |
member.type |
the value of the |
match |
the value of the |
type |
the value of the |
criterion |
the value of the |
model |
the value of |
c |
the value of |
engine |
the value of the |
discr |
the value of the |
irtParam |
the value of |
same.scale |
the value of |
p.adjust.method |
the value of the |
purification |
the value of |
nrPur |
an integer vector (of length equal to the number of methods) with the number of iterations in the purification process.
Returned only if |
convergence |
a logical vector (of length equal to the number of methods) indicating whether the iterative purification process converged. Returned only if |
anchor.names |
the value of the |
save.output |
the value of the |
output |
the value of the |
Author(s)
Sebastien Beland
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal
sebastien.beland.1@hotmail.com, http://www.cdame.uqam.ca/
David Magis
Department of Psychology, University of Liege
Research Group of Quantitative Psychology and Individual Differences, KU Leuven
David.Magis@uliege.be, http://ppw.kuleuven.be/okp/home/
Gilles Raiche
Collectif pour le Developpement et les Applications en Mesure et Evaluation (Cdame)
Universite du Quebec a Montreal
raiche.gilles@uqam.ca, http://www.cdame.uqam.ca/
References
Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7, 131-177. doi: 10.1214/ss/1177011454
Aguerri, M.E., Galibert, M.S., Attorresi, H.F. and Maranon, P.P. (2009). Erroneous detection of nonuniform DIF using the Breslow-Day test in a short test. Quality and Quantity, 43, 35-44. doi: 10.1007/s11135-007-9130-2
Angoff, W. H., and Ford, S. F. (1973). Item-race interaction on a test of scholastic aptitude. Journal of Educational Measurement, 10, 95-106. doi: 10.1111/j.1745-3984.1973.tb00787.x
Chalmers, R. P. (2012). mirt: A Multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1-29. doi: 10.18637/jss.v048.i06
Chalmers, R. P. (2018). Improving the Crossing-SIBTEST statistic for detecting non-uniform DIF. Psychometrika, 83(2), 376–386. doi: 10.1007/s11336-017-9583-8
Dorans, N. J. and Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368. doi: 10.1111/j.1745-3984.1986.tb00255.x
Holland, P. W. and Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer and H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.
Li, H.-H., and Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647–677. doi: 10.1007/BF02294041
Lord, F. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Magis, D., Beland, S., Tuerlinckx, F. and De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42, 847-862. doi: 10.3758/BRM.42.3.847
Penfield, R.D. (2003). Application of the Breslow-Day test of trend in odds ratio heterogeneity to the detection of nonuniform DIF. Alberta Journal of Educational Research, 49, 231-243.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207. doi: 10.1177/014662169001400208
Shealy, R. and Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194. doi: 10.1007/BF02294572
Swaminathan, H. and Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370. doi: 10.1111/j.1745-3984.1990.tb00754.x
Thissen, D., Steinberg, L. and Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer and H. I. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum Associates.
See Also
difTID
, difMH
, difStd
, difBD
, difLogistic
, difSIBTEST
, difLord
, difRaju
,
difLRT
Examples
## Not run:
# Loading of the verbal data
data(verbal)
attach(verbal)
# Excluding the "Anger" variable
verbal <- verbal[colnames(verbal)!="Anger"]
# Comparing TID, Mantel-Haenszel, standardization, logistic regression and SIBTEST
# TID threshold 1.0
# Standardization threshold 0.08
# no continuity correction,
# with item purification
# both types of DIF effect for logistic regression
# CSIBTEST method
dichoDif(verbal, group = 25, focal.name = 1, method = c("TID", "MH", "Std",
"Logistic", "SIBTEST"), correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE)
# Same analysis, but using items 1 to 5 as anchor and saving the output into
# the 'dicho' file
dichoDif(verbal, group = 25, focal.name = 1, method = c("TID", "MH", "Std",
"Logistic"), correct = FALSE, thrSTD = 0.08, thrTID = 1, purify = TRUE,
anchor = 1:5, save.output = TRUE, output = c("dicho", "default"))
# Comparing Lord and Raju results with 2PL model and
# with item purification
dichoDif(verbal, group = 25, focal.name = 1, method = c("Lord", "Raju"),
model = "2PL", purify = TRUE)
## End(Not run)