grdif {irtQ}

R Documentation

Generalized IRT residual-based DIF detection framework for multiple groups (GRDIF)

Description

This function computes three GRDIF statistics, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, for analyzing differential item functioning (DIF) among multiple groups (Lim, Zhu, Choe, & Han, 2023). They are specialized to capture uniform DIF, nonuniform DIF, and mixed DIF, respectively.

Usage

grdif(x, ...)

## Default S3 method:
grdif(
  x,
  data,
  score = NULL,
  group,
  focal.name,
  D = 1,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_irt'
grdif(
  x,
  score = NULL,
  group,
  focal.name,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

## S3 method for class 'est_item'
grdif(
  x,
  group,
  focal.name,
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  purify.by = c("grdifrs", "grdifr", "grdifs"),
  max.iter = 10,
  min.resp = NULL,
  post.hoc = TRUE,
  method = "ML",
  range = c(-4, 4),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

Arguments

`x`	A data frame containing item metadata (e.g., item parameters, number of categories, models, etc.), an object of class `est_item` obtained from the function `est_item`, or an object of class `est_irt` obtained from the function `est_irt`. The item metadata can be easily created using the function `shape_df`. See `est_irt`, `irtfit`, `info` or `simdat` for more details about the item metadata.
`...`	Additional arguments that will be forwarded to the `est_score` function.
`data`	A matrix containing examinees' response data for items in `x`. Rows and columns represent examinees and items, respectively.
`score`	A vector of examinees' ability estimates. If abilities are not provided, `grdif` estimates abilities before computing GRDIF statistics. See `est_score` for more details about scoring methods. Default is NULL.
`group`	A numeric or character vector indicating group membership of examinees. The length of the vector should be the same as the number of rows in the response data matrix.
`focal.name`	A character or numeric vector representing levels associated with focal groups. For instance, consider `group = c(0, 0, 1, 2, 2, 3, 3)` where '1', '2', and '3' indicate three distinct focal groups, and '0' represents the reference group. In this case, set `focal.name = c(1, 2, 3)`.
`D`	A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1.
`alpha`	A numeric value to specify the significance `\alpha`-level of the hypothesis test using GRDIF statistics. Default is .05.
`missing`	A value indicating missing values in the response data set. Default is NA.
`purify`	A logical value indicating whether a purification process will be implemented or not. Default is FALSE.
`purify.by`	A character string specifying a GRDIF statistic with which the purification is implemented. Available statistics are "grdifrs" for `GRDIF_{RS}`, "grdifr" for `GRDIF_{R}`, and "grdifs" for `GRDIF_{S}`.
`max.iter`	A positive integer value specifying the maximum number of iterations for the purification process. Default is 10.
`min.resp`	A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information.
`post.hoc`	A logical value indicating whether to conduct a post-hoc RDIF analysis for all possible combinations of paired groups for statistically flagged items. The default is TRUE. See below for more details.
`method`	A character string indicating a scoring method. Available methods are "ML" for maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for maximum a posteriori estimation, and "EAP" for expected a posteriori estimation. The default method is "ML".
`range`	A numeric vector with two components to restrict the ability scale range for ML, WL, EAP, and MAP scoring methods. The default is c(-4, 4).
`norm.prior`	A numeric vector with two components specifying the mean and standard deviation of the normal prior distribution. These parameters are used to obtain Gaussian quadrature points and their corresponding weights from the normal distribution. The default is c(0,1). Ignored if `method` is "ML" or "WL".
`nquad`	An integer value specifying the number of Gaussian quadrature points from the normal prior distribution. The default is 41. Ignored if `method` is "ML", "WL", or "MAP".
`weights`	A two-column matrix or data frame containing the quadrature points (in the first column) and their corresponding weights (in the second column) for the latent variable prior distribution. The weights and quadrature points can be obtained using the `gen.weight` function. If NULL and `method` is "EAP", default values are used (see the `norm.prior` and `nquad` arguments). Ignored if `method` is "ML", "WL", or "MAP".
`ncore`	The number of logical CPU cores to use. The default is 1. See `est_score` for details.
`verbose`	A logical value. If TRUE, progress messages for the purification procedure are printed; if FALSE, they are suppressed. The default is TRUE.
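
The two-column layout expected by the `weights` argument can be sketched in base R as follows. This is only an illustration under stated assumptions: the evenly spaced nodes, the column names `theta` and `weight`, and the object name `quad_weights` are hypothetical choices, and the values produced by `gen.weight` may differ.

```r
# Assumed settings mirroring the documented defaults
nquad <- 41
norm.prior <- c(0, 1)   # mean and SD of the normal prior
range <- c(-4, 4)

# Evenly spaced nodes on the ability scale (an assumption of this sketch)
nodes <- seq(range[1], range[2], length.out = nquad)

# Normal densities at the nodes, rescaled so the weights sum to 1
w <- dnorm(nodes, mean = norm.prior[1], sd = norm.prior[2])
w <- w / sum(w)

# First column: quadrature points; second column: their weights
quad_weights <- data.frame(theta = nodes, weight = w)
print(head(quad_weights, 3))
```

An object shaped like this could be supplied through `weights` when `method = "EAP"`.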

Details

The GRDIF framework (Lim et al., 2023) is a generalized version of the RDIF detection framework, designed to assess DIF for multiple groups. The GRDIF framework comprises three statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}, which focus on detecting uniform, nonuniform, and mixed DIF, respectively. Under the null hypothesis that a test contains no DIF items, GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS} asymptotically follow the \chi^{2} distributions with G-1, G-1, and 2(G-1) degrees of freedom, respectively, where G represents the total number of groups being compared. For more information on the GRDIF framework, see Lim et al. (2023).
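
As a rough illustration of the reference distributions described above, the critical values and a p-value can be obtained with base R's chi-square functions. The values of `G`, `alpha`, and `stat_rs` below are made up for the sketch and are not output from `grdif`.

```r
# Hypothetical setting: one reference group and three focal groups
G <- 4
alpha <- 0.05

# Degrees of freedom stated above: G-1, G-1, and 2(G-1)
df <- c(grdifr = G - 1, grdifs = G - 1, grdifrs = 2 * (G - 1))

# Critical values of the chi-square reference distributions
crit <- qchisq(1 - alpha, df = df)
print(round(crit, 3))   # about 7.815, 7.815, and 12.592

# p-value for a made-up GRDIF_RS value (illustrative number only)
stat_rs <- 15.2
p_rs <- pchisq(stat_rs, df = df[["grdifrs"]], lower.tail = FALSE)
print(p_rs < alpha)     # TRUE: this made-up item would be flagged
```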

The grdif function calculates all three GRDIF statistics: GRDIF_{R}, GRDIF_{S}, and GRDIF_{RS}. The current version of the grdif function supports both dichotomous and polytomous item response data. To compute these statistics, the grdif function requires (1) item parameter estimates obtained from aggregate data, regardless of group membership, (2) examinees' ability estimates (e.g., MLE), and (3) examinees' item response data. Note that the ability estimates must be computed using the aggregate data-based item parameter estimates. The item parameter estimates should be provided in the x argument, the ability estimates in the score argument, and the response data in the data argument. When abilities are not given in the score argument (i.e., score = NULL), the grdif function estimates examinees' abilities automatically using the scoring method specified in the method argument (e.g., method = "ML").

The group argument accepts a vector with numeric or character values, indicating the group membership of examinees. The vector may include multiple distinct values, where one value represents the reference group and the others represent the focal groups. The length of the vector should be the same as the number of rows in the response data, with each value indicating the group membership of each examinee. After specifying the group, a numeric or character vector should be provided in the focal.name argument to define which group values in the group argument represent the focal groups. The reference group will be the group not included in the focal.name vector.
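
The relationship between `group` and `focal.name` can be checked with a few lines of base R before calling `grdif`. This is a minimal sketch that reuses the toy membership vector from the `focal.name` argument description; the object name `ref.name` is hypothetical.

```r
# Toy membership vector: 0 is the reference group; 1, 2, and 3 are focal groups
group <- c(0, 0, 1, 2, 2, 3, 3)
focal.name <- c(1, 2, 3)

# The reference group is the level that is not listed in focal.name
ref.name <- setdiff(unique(group), focal.name)

# Sanity checks: exactly one reference level, and every focal level is present
stopifnot(length(ref.name) == 1, all(focal.name %in% group))

print(ref.name)      # 0
print(table(group))  # group sizes

# With real data, length(group) should also equal nrow(data)
```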

Similar to the original RDIF framework for two-group comparisons, the GRDIF framework can implement an iterative purification process. When purify = TRUE, the purification process is executed based on one of the GRDIF statistics specified in the purify.by argument (e.g., purify.by = "grdifrs"). At each iteration, examinees' latent abilities are re-estimated using the purified items and the scoring method specified in the method argument. The iterative purification process stops when no additional DIF items are identified or when the process reaches the maximum number of iterations set in the max.iter argument. For more information about the purification procedure, refer to Lim et al. (2022).

Scoring with a limited number of items can result in large standard errors, which may impact the effectiveness of DIF detection within the GRDIF framework. The min.resp argument can be employed to avoid using scores with large standard errors when calculating the GRDIF statistics, particularly during the purification process. For instance, if min.resp is not NULL (e.g., min.resp = 5), item responses from examinees whose total number of item responses falls below the specified minimum are treated as missing values (i.e., NA). Consequently, their ability estimates become missing values and are not used in computing the GRDIF statistics. If min.resp = NULL, an examinee's score will be computed as long as there is at least one item response for the examinee.
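
The effect of min.resp described above can be mimicked on a toy response matrix. This sketch only illustrates the stated rule and is not the code used inside `grdif`; the objects `resp`, `n_resp`, and `resp_masked` are hypothetical.

```r
# Toy response matrix: 4 examinees by 6 items, NA marks a missing response
resp <- matrix(c( 1,  0,  1,  1,  0,  1,
                  1, NA, NA, NA, NA,  0,
                  0,  1, NA,  1,  1,  1,
                 NA, NA, NA, NA,  1, NA),
               nrow = 4, byrow = TRUE)
min.resp <- 5

# Number of observed responses per examinee
n_resp <- rowSums(!is.na(resp))

# Examinees below the minimum have all of their responses set to NA,
# so no ability estimate would be available for them
resp_masked <- resp
resp_masked[n_resp < min.resp, ] <- NA

print(n_resp)                        # 6 2 5 1
print(rowSums(!is.na(resp_masked)))  # 6 0 5 0
```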

The post.hoc argument allows you to perform a post-hoc RDIF analysis for all possible combinations of paired groups for items flagged as statistically significant. For example, consider four groups of examinees: A, B, C, and D. If post.hoc = TRUE, the grdif function will perform a post-hoc RDIF analysis for all possible pairs of groups (A-B, A-C, A-D, B-C, B-D, and C-D) for each flagged item. This helps to identify which specific pairs of groups have DIF for each item, providing a more detailed understanding of the DIF patterns in the data. Note that when purification is implemented (i.e., purify = TRUE), the post-hoc RDIF analysis is conducted for each flagged item during each single iteration of the purification process.
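
The set of pairwise comparisons in the four-group example above can be enumerated with `combn` from base R. This is a small sketch; the object names `pairs` and `pair_labels` are hypothetical, and the labels are only for display.

```r
# Four groups of examinees from the example above
groups <- c("A", "B", "C", "D")

# All unordered pairs of groups, one pair per row
pairs <- t(combn(groups, 2))
colnames(pairs) <- c("group1", "group2")

# Readable labels for the comparisons
pair_labels <- paste(pairs[, "group1"], pairs[, "group2"], sep = "-")
print(pair_labels)  # "A-B" "A-C" "A-D" "B-C" "B-D" "C-D"

# The number of comparisons per flagged item is choose(G, 2)
print(choose(length(groups), 2))  # 6
```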

Value

This function returns a list of four internal objects. The four objects are:

`no_purify`	A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
    `dif_stat`	A data frame containing the results of three GRDIF statistics for all evaluated items. Starting from the first column, each column represents the item's ID, `GRDIF_{R}` statistic, `GRDIF_{S}` statistic, `GRDIF_{RS}` statistic, p-value of `GRDIF_{R}`, p-value of `GRDIF_{S}`, p-value of `GRDIF_{RS}`, sample size of the reference group, sample sizes of the focal groups, and the total sample size, respectively.
    `moments`	A list of three data frames detailing the moments of mean raw residuals (MRRs) and mean squared residuals (MSRs) across all compared groups. The first data frame contains the means of MRR and MSR, the second data frame includes the variances of MRR and MSR, and the last one displays the covariances of MRR and MSR for all groups.
    `dif_item`	A list of three numeric vectors indicating potential DIF items flagged by each of the GRDIF statistics. The vectors correspond to the items identified by `GRDIF_{R}`, `GRDIF_{S}`, and `GRDIF_{RS}`, respectively.
    `score`	A vector of ability estimates used to compute the GRDIF statistics.
    `post.hoc`	A list of three data frames containing the post-hoc RDIF analysis results of all possible combinations of paired groups. The first, second, and third data frames present the post-hoc analysis outcomes for the items flagged by the `GRDIF_{R}`, `GRDIF_{S}`, and `GRDIF_{RS}` statistics, respectively.
`purify`	A logical value indicating whether the purification process was used.
`with_purify`	A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
    `purify.by`	A character string indicating which GRDIF statistic is used for the purification. "grdifr", "grdifs", and "grdifrs" refer to `GRDIF_{R}`, `GRDIF_{S}`, and `GRDIF_{RS}`, respectively.
    `dif_stat`	A data frame containing the results of three GRDIF statistics for all evaluated items. Starting from the first column, each column represents the item's ID, `GRDIF_{R}` statistic, `GRDIF_{S}` statistic, `GRDIF_{RS}` statistic, p-value of `GRDIF_{R}`, p-value of `GRDIF_{S}`, p-value of `GRDIF_{RS}`, sample size of the reference group, sample sizes of the focal groups, total sample size, and the iteration at which the GRDIF statistics were computed, respectively.
    `moments`	A list of three data frames detailing the moments of mean raw residuals (MRRs) and mean squared residuals (MSRs) across all compared groups. The first data frame contains the means of MRR and MSR, the second data frame includes the variances of MRR and MSR, and the last one displays the covariances of MRR and MSR for all groups. In each data frame, the last column indicates the iteration at which the GRDIF statistics were computed.
    `n.iter`	The total number of iterations implemented for the purification.
    `score`	A vector of final purified ability estimates used to compute the GRDIF statistics.
    `post.hoc`	A data frame containing the post-hoc RDIF analysis results for the flagged items across all possible combinations of paired groups. The post-hoc RDIF analysis is conducted for each flagged item at every iteration.
    `complete`	A logical value indicating whether the purification process was completed. If FALSE, the purification process reached the maximum number of iterations without converging.
`alpha`	The significance `\alpha`-level used for the hypothesis tests of the GRDIF statistics.

Methods (by class)

default: Default method to compute three GRDIF statistics with multiple group data using a data frame x containing the item metadata.
est_irt: An object created by the function est_irt.
est_item: An object created by the function est_item.

Author(s)

Hwanggyu Lim hglim83@gmail.com

References

Lim, H., & Choe, E. M. (2023). Detecting differential item functioning in CAT using IRT residual DIF approach. Journal of Educational Measurement. doi:10.1111/jedm.12366.

Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement, 59(1), 80-104. doi:10.1111/jedm.12313.

Lim, H., Zhu, D., Choe, E. M., & Han, K. T. (2023, April). Detecting differential item functioning among multiple groups using IRT residual DIF framework. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, IL.

Examples


# load libraries
library("irtQ")
library("dplyr")

## Uniform DIF detection for four groups (1R/3F)
########################################################
# (1) Manipulate uniform DIF for all three focal groups
########################################################
# Import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")

# Select 36 of 3PLM items which are non-DIF items
par_nstd <-
  bring.flexmirt(file=flex_sam, "par")$Group1$full_df %>%
  dplyr::filter(.data$model == "3PLM") %>%
  dplyr::filter(dplyr::row_number() %in% 1:36) %>%
  dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)

# Generate four new items where uniform DIF will be manipulated
difpar_ref <-
  shape_df(par.drm=list(a=c(0.8, 1.5, 0.8, 1.5), b=c(0.0, 0.0, -0.5, -0.5), g=.15),
           item.id=paste0("dif", 1:4), cats=2, model="3PLM")

# Manipulate uniform DIF on the four new items by adjusting the b-parameters
# for the three focal groups
difpar_foc1 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(0.7, 0.7, 0, 0))
difpar_foc2 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(0, 0, 0.7, 0.7))
difpar_foc3 <-
  difpar_ref %>%
  dplyr::mutate_at(.vars="par.2", .funs=function(x) x + c(-0.4, -0.4, -0.5, -0.5))

# Combine the 4 DIF and 36 non-DIF items for the reference and focal groups.
# Thus, the first four items have uniform DIF for the three focal groups
par_ref <- rbind(difpar_ref, par_nstd)
par_foc1 <- rbind(difpar_foc1, par_nstd)
par_foc2 <- rbind(difpar_foc2, par_nstd)
par_foc3 <- rbind(difpar_foc3, par_nstd)

# Generate the true thetas from the different ability distributions
set.seed(128)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc1 <- rnorm(500, -1.0, 1.0)
theta_foc2 <- rnorm(500, 1.0, 1.0)
theta_foc3 <- rnorm(500, 0.5, 1.0)

# Generate the response data
resp_ref <- irtQ::simdat(par_ref, theta=theta_ref, D=1)
resp_foc1 <- irtQ::simdat(par_foc1, theta=theta_foc1, D=1)
resp_foc2 <- irtQ::simdat(par_foc2, theta=theta_foc2, D=1)
resp_foc3 <- irtQ::simdat(par_foc3, theta=theta_foc3, D=1)
data <- rbind(resp_ref, resp_foc1, resp_foc2, resp_foc3)

########################################################
# (2) Estimate the item and ability parameters
#     using the aggregate data
########################################################
# Estimate the item parameters
est_mod <- irtQ::est_irt(data=data, D=1, model="3PLM")
est_par <- est_mod$par.est

# Estimate the ability parameters using MLE
score <- irtQ::est_score(x=est_par, data=data, method="ML")$est.theta

########################################################
# (3) Conduct DIF analysis
########################################################
# Create a vector of group membership indicators,
# where 1, 2 and 3 indicate the three focal groups
group <- c(rep(0, 500), rep(1, 500), rep(2, 500), rep(3, 500))

# (a) Compute GRDIF statistics without purification
#     and implement the post-hoc two-groups comparison analysis for
#     the flagged items
dif_nopuri <- grdif(x=est_par, data=data, score=score, group=group,
                    focal.name=c(1, 2, 3), D=1, alpha=0.05,
                    purify=FALSE, post.hoc=TRUE)
print(dif_nopuri)

# Print the post-hoc analysis results for the flagged items
print(dif_nopuri$no_purify$post.hoc)

# (b) Compute GRDIF statistics with purification
#     based on GRDIF_{R} and implement the post-hoc
#     two-groups comparison analysis for flagged items
dif_puri_r <- grdif(x=est_par, data=data, score=score, group=group,
                    focal.name=c(1, 2, 3), D=1, alpha=0.05,
                    purify=TRUE, purify.by = "grdifr", post.hoc=TRUE)
print(dif_puri_r)

# Print the post-hoc analysis results without purification
print(dif_puri_r$no_purify$post.hoc)

# Print the post-hoc analysis results with purification
print(dif_puri_r$with_purify$post.hoc)

[Package irtQ version 0.2.0 Index]