grdif {irtQ} | R Documentation |
Generalized IRT residual-based DIF detection framework for multiple groups (GRDIF)
Description
This function computes three GRDIF statistics, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, for analyzing differential item functioning (DIF) among multiple groups
(Lim, Zhu, Choe, & Han, 2023). They are specialized to capture uniform DIF, nonuniform DIF, and mixed DIF, respectively.
Usage
grdif(x, ...)
## Default S3 method:
grdif(
x,
data,
score = NULL,
group,
focal.name,
D = 1,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
## S3 method for class 'est_irt'
grdif(
x,
score = NULL,
group,
focal.name,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
## S3 method for class 'est_item'
grdif(
x,
group,
focal.name,
alpha = 0.05,
missing = NA,
purify = FALSE,
purify.by = c("grdifrs", "grdifr", "grdifs"),
max.iter = 10,
min.resp = NULL,
post.hoc = TRUE,
method = "ML",
range = c(-4, 4),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
Arguments
x |
A data frame containing item metadata (e.g., item parameters, number of categories, models, etc.),
an object of class |
... |
Additional arguments that will be forwarded to the |
data |
A matrix containing examinees' response data for items in |
score |
A vector of examinees' ability estimates. If abilities are not provided, |
group |
A numeric or character vector indicating group membership of examinees. The length of the vector should be the same as the number of rows in the response data matrix. |
focal.name |
A character or numeric vector representing levels associated with focal groups.
For instance, consider |
D |
A scaling factor in IRT models. Setting D = 1.7 makes the logistic function as close as possible to the normal ogive function. Default is 1. |
alpha |
A numeric value to specify the significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
purify |
A logical value indicating whether a purification process will be implemented or not. Default is FALSE. |
purify.by |
A character string specifying a GRDIF statistic with which the purification
is implemented. Available statistics are "grdifrs" for |
max.iter |
A positive integer value specifying the maximum number of iterations for the purification process. Default is 10. |
min.resp |
A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information. |
post.hoc |
A logical value indicating whether to conduct a post-hoc RDIF analysis for all possible combinations of paired groups for statistically flagged items. The default is TRUE. See below for more details. |
method |
A character string indicating a scoring method. Available methods are "ML" for maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for maximum a posteriori estimation, and "EAP" for expected a posteriori estimation. The default method is "ML". |
range |
A numeric vector with two components to restrict the ability scale range for the ML, WL, EAP, and MAP scoring methods. The default is c(-4, 4). |
norm.prior |
A numeric vector with two components specifying the mean and standard deviation of
the normal prior distribution. These parameters are used to obtain Gaussian quadrature points
and their corresponding weights from the normal distribution. The default is c(0,1). Ignored if |
nquad |
An integer value specifying the number of Gaussian quadrature points from the normal
prior distribution. The default is 41. Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column)
and their corresponding weights (in the second column) for the latent variable prior distribution.
The weights and quadrature points can be obtained using the |
ncore |
The number of logical CPU cores to use. The default is 1. See |
verbose |
A logical value. If TRUE, progress messages for the purification procedure are printed; if FALSE, they are suppressed. The default is TRUE. |
Details
The GRDIF framework (Lim et al., 2023) is a generalized version of the RDIF detection framework,
designed to assess DIF for multiple groups. The GRDIF framework comprises three statistics, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS},
which focus on detecting uniform, nonuniform, and mixed DIF, respectively.
Under the null hypothesis that a test contains no DIF items, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}
asymptotically follow \chi^{2} distributions with G-1, G-1, and 2(G-1) degrees of freedom, respectively,
where G is the total number of groups being compared. For more information on the GRDIF framework, see Lim et al. (2023).
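Because these null distributions depend only on G, the flagging thresholds can be reproduced in base R. The snippet below is a minimal sketch, not part of irtQ: the helper name grdif_crit() is hypothetical, and it simply assumes the asymptotic \chi^{2} reference distributions stated above, using qchisq() for critical values and pchisq() for an upper-tail p-value.

```r
# Hypothetical helper (not an irtQ function): critical values of the three
# GRDIF statistics under their asymptotic chi-square null distributions,
# where G is the total number of groups (reference + focal)
grdif_crit <- function(G, alpha = 0.05) {
  df <- c(grdifr = G - 1, grdifs = G - 1, grdifrs = 2 * (G - 1))
  crit <- qchisq(1 - alpha, df = df)
  names(crit) <- names(df)
  crit
}

# Four groups (one reference, three focal), as in the Examples section
crit <- grdif_crit(G = 4, alpha = 0.05)
print(round(crit, 3))

# Upper-tail p-value for a hypothetical observed GRDIF_RS value of 15
p_rs <- pchisq(15, df = 2 * (4 - 1), lower.tail = FALSE)
print(p_rs)
```

GRDIF_{R} and GRDIF_{S} share the same threshold because both have G-1 degrees of freedom; GRDIF_{RS} needs a larger value to be flagged.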
The grdif function calculates all three GRDIF statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}. The current
version of the grdif function supports both dichotomous and polytomous item response data. To compute these statistics, the grdif
function requires (1) item parameter estimates obtained from the aggregate data, regardless of group membership, (2) examinees' ability estimates
(e.g., MLE), and (3) examinees' item response data. Note that the ability estimates must be computed using the aggregate data-based
item parameter estimates. The item parameter estimates should be provided in the x argument, the ability estimates in the
score argument, and the response data in the data argument. When abilities are not given in the score argument
(i.e., score = NULL), the grdif function estimates examinees' abilities automatically using the scoring method
specified in the method argument (e.g., method = "ML").
The group argument accepts a vector of numeric or character values indicating the group membership of examinees.
The vector may include multiple distinct values, where one value represents the reference group and the others represent the focal groups.
The length of the vector should be the same as the number of rows in the response data, with each value indicating the group membership
of the corresponding examinee. After specifying group, provide a numeric or character vector in the focal.name argument
to define which values in group represent the focal groups. The reference group is the group value not included
in the focal.name vector.
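The rule that the reference group is whichever value of group is not listed in focal.name can be illustrated with base R. This is a minimal sketch using the same four-group layout as the Examples section; the names ref.name and G are illustrative only and are not grdif arguments.

```r
# Group membership for 2,000 examinees: 0 = reference, 1, 2, 3 = focal groups
group <- rep(c(0, 1, 2, 3), each = 500)
focal.name <- c(1, 2, 3)

# The reference group is the single group value not listed in focal.name
ref.name <- setdiff(unique(group), focal.name)

# G, the total number of groups compared, sets the degrees of freedom
G <- length(unique(group))

# Sanity check: exactly one reference group and every focal value present
stopifnot(length(ref.name) == 1, all(focal.name %in% group))
print(table(group))
```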
Similar to the original RDIF framework for two-group comparisons, the GRDIF framework can implement an iterative purification process.
When purify = TRUE, the purification process is executed based on the GRDIF statistic specified in the purify.by
argument (e.g., purify.by = "grdifrs"). During each purification iteration, examinees' latent abilities are recalculated using the purified
items and the scoring method specified in the method argument. The iterative purification process stops when no additional DIF items
are identified or when the number of iterations reaches the limit set in the max.iter argument.
For more information about the purification procedure, refer to Lim et al. (2022).
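The stopping rule described above can be sketched in a few lines of R. This is a hypothetical illustration, not the internal code of grdif: flag_items() stands in for re-scoring examinees with a given set of items and recomputing the statistic chosen in purify.by, and the toy version used here returns fixed results so that the loop can run on its own. In this simple variant, an item flagged in any iteration stays flagged.

```r
# Hypothetical sketch of an iterative purification loop (not irtQ code).
# flag_items(keep) is assumed to re-score examinees using only the items
# in `keep`, recompute the chosen GRDIF statistic, and return the indices
# of the items flagged at the chosen alpha level.
purify_loop <- function(n.item, flag_items, max.iter = 10) {
  flagged <- flag_items(seq_len(n.item))  # initial run: score with all items
  iter <- 0
  while (length(flagged) > 0 && iter < max.iter) {
    keep <- setdiff(seq_len(n.item), flagged)  # purified scoring items
    new.flag <- flag_items(keep)
    iter <- iter + 1
    # stop once no additional DIF items are identified
    if (length(setdiff(new.flag, flagged)) == 0) break
    flagged <- union(flagged, new.flag)
  }
  list(flagged = sort(flagged), n.iter = iter)
}

# Toy stand-in: items 1-2 are flagged while they contaminate the scores;
# items 3-4 become detectable only after items 1-2 are removed from scoring
toy_flag <- function(keep) {
  if (all(c(1, 2) %in% keep)) c(1, 2) else c(1, 2, 3, 4)
}

res <- purify_loop(n.item = 40, flag_items = toy_flag, max.iter = 10)
print(res)
```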
Scoring with a limited number of items can result in large standard errors, which may reduce the effectiveness of DIF detection within
the GRDIF framework. The min.resp argument can be used to avoid using scores with large standard errors when calculating
the GRDIF statistics, particularly during the purification process. For instance, if min.resp is not NULL (e.g., min.resp = 5),
item responses from examinees whose total number of item responses falls below the specified minimum are treated as missing values (i.e., NA).
Consequently, their ability estimates become missing values and are not used in computing the GRDIF statistics. If min.resp = NULL,
an examinee's score is computed as long as the examinee has at least one item response.
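The min.resp rule can be illustrated on a small response matrix. This is a minimal base-R sketch of the behavior described above, not irtQ's implementation; the toy matrix resp and the threshold of 5 are made up for illustration.

```r
# Toy response matrix: 4 examinees x 6 items, NA = no response
resp <- matrix(c( 1,  0,  1,  1, NA,  1,
                  1, NA, NA, NA, NA,  0,
                  0,  1,  1,  0,  1,  1,
                 NA, NA,  1, NA, NA, NA),
               nrow = 4, byrow = TRUE)
min.resp <- 5

# Number of observed responses per examinee
n.resp <- rowSums(!is.na(resp))

# Examinees with fewer than min.resp responses have all their responses set
# to NA, so no ability estimate (and no contribution to GRDIF) is produced.
# With min.resp = NULL, any examinee with at least one response is scored.
resp[n.resp < min.resp, ] <- NA
print(n.resp)
```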
The post.hoc argument allows you to perform a post-hoc RDIF analysis for all possible combinations of paired groups
for items flagged as statistically significant. For example, consider four groups of examinees: A, B, C, and D. If post.hoc = TRUE,
the grdif function performs a post-hoc RDIF analysis for all possible pairs of groups
(A-B, A-C, A-D, B-C, B-D, and C-D) for each flagged item. This helps to identify which specific pairs of groups
show DIF on each item, providing a more detailed understanding of the DIF patterns in the data. Note that when purification is implemented
(i.e., purify = TRUE), the post-hoc RDIF analysis is conducted for the flagged items at every iteration of
the purification process.
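The number of post-hoc pairwise comparisons grows as choose(G, 2). For the four-group example above, the pairs can be enumerated in base R with combn(); this sketch only lists the group pairs and does not run any RDIF analysis.

```r
groups <- c("A", "B", "C", "D")

# Each column of `pairs` is one two-group comparison
pairs <- combn(groups, 2)
pair.labels <- apply(pairs, 2, paste, collapse = "-")
print(pair.labels)
# [1] "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"
```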
Value
This function returns a list of four internal objects. The four objects are:
no_purify |
A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
|
purify |
A logical value indicating whether the purification process was used. |
with_purify |
A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
|
alpha |
A significance |
Methods (by class)
- default: Default method to compute the three GRDIF statistics with multiple-group data using a data frame x containing the item metadata.
- est_irt: An object created by the function est_irt.
- est_item: An object created by the function est_item.
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement. doi:10.1111/jedm.12366.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.
Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2023, April). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, IL.
See Also
rdif, est_item, info, simdat, shape_df, gen.weight, est_score
Examples
# load libraries
library("irtQ")
library("dplyr")
## Uniform DIF detection for four groups (1R/3F)
########################################################
# (1) Manipulate uniform DIF for all three focal groups
########################################################
# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# Select 36 3PLM items to serve as non-DIF items
par_nstd <-
bring.flexmirt(file=flex_sam, "par")$Group1$full_df %>%
dplyr::filter(.data$model == "3PLM") %>%
dplyr::filter(dplyr::row_number() %in% 1:36) %>%
dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)
# Generate four new items where uniform DIF will be manipulated
difpar_ref <-
shape_df(par.drm=list(a=c(0.8, 1.5, 0.8, 1.5), b=c(0.0, 0.0, -0.5, -0.5), g=.15),
item.id=paste0("dif", 1:4), cats=2, model="3PLM")
# Manipulate uniform DIF on the four new items by adjusting the b-parameters
# for the three focal groups
difpar_foc1 <-
difpar_ref %>%
dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(0.7, 0.7, 0, 0))
difpar_foc2 <-
difpar_ref %>%
dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(0, 0, 0.7, 0.7))
difpar_foc3 <-
difpar_ref %>%
dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(-0.4, -0.4, -0.5, -0.5))
# Combine the 4 DIF and 36 non-DIF items for the reference and all focal groups.
# Thus, the first four items have uniform DIF for the three focal groups
par_ref <- rbind(difpar_ref, par_nstd)
par_foc1 <- rbind(difpar_foc1, par_nstd)
par_foc2 <- rbind(difpar_foc2, par_nstd)
par_foc3 <- rbind(difpar_foc3, par_nstd)
# Generate the true thetas from the different ability distributions
set.seed(128)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc1 <- rnorm(500, -1.0, 1.0)
theta_foc2 <- rnorm(500, 1.0, 1.0)
theta_foc3 <- rnorm(500, 0.5, 1.0)
# Generate the response data
resp_ref <- irtQ::simdat(par_ref, theta=theta_ref, D=1)
resp_foc1 <- irtQ::simdat(par_foc1, theta=theta_foc1, D=1)
resp_foc2 <- irtQ::simdat(par_foc2, theta=theta_foc2, D=1)
resp_foc3 <- irtQ::simdat(par_foc3, theta=theta_foc3, D=1)
data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3)
########################################################
# (2) Estimate the item and ability parameters
# using the aggregate data
########################################################
# Estimate the item parameters
est_mod <- irtQ::est_irt(data=data, D=1, model="3PLM")
est_par <- est_mod$par.est
# Estimate the ability parameters using MLE
score <- irtQ::est_score(x=est_par, data=data, method="ML")$est.theta
########################################################
# (3) Conduct DIF analysis
########################################################
# Create a vector of group membership indicators,
# where 0 indicates the reference group and 1, 2, and 3 indicate the three focal groups
group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500))
# (a) Compute GRDIF statistics without purification
# and implement the post-hoc two-group comparison analysis for
# the flagged items
dif_nopuri <- grdif(x=est_par, data=data, score=score, group=group,
focal.name=c(1, 2, 3), D=1, alpha=0.05,
purify=FALSE, post.hoc=TRUE)
print(dif_nopuri)
# Print the post-hoc analysis results for the flagged items
print(dif_nopuri$no_purify$post.hoc)
# (b) Compute GRDIF statistics with purification
# based on GRDIF_{R} and implement the post-hoc
# two-group comparison analysis for the flagged items
dif_puri_r <- grdif(x=est_par, data=data, score=score, group=group,
focal.name=c(1, 2, 3), D=1, alpha=0.05,
purify=TRUE, purify.by = "grdifr", post.hoc=TRUE)
print(dif_puri_r)
# Print the post-hoc analysis results without purification
print(dif_puri_r$no_purify$post.hoc)
# Print the post-hoc analysis results with purification
print(dif_puri_r$with_purify$post.hoc)