R: CATSIB DIF detection procedure

catsib {irtQ}

R Documentation

Description

This function analyzes DIF on items using the CATSIB procedure (Nandakumar & Roussos, 2004), a modified version of SIBTEST (Shealy & Stout, 1993). CATSIB is designed for differential item functioning (DIF) detection in computerized adaptive testing (CAT) environments. In CATSIB, examinees are matched on IRT-based ability estimates that have been adjusted with a regression correction method (Shealy & Stout, 1993). The correction reduces the statistical bias in the CATSIB statistic caused by impact (i.e., true differences between the ability distributions of the reference and focal groups).
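The regression correction can be sketched as follows. This is a minimal illustration, assuming a Kelley-type linear correction applied separately within each group: each group's ability estimates are shrunk toward the group mean by an estimated reliability, 1 - mean(SE^2) / var(\hat{\theta}). The helper `regress_correct()` is hypothetical and is not part of irtQ; the package's internal computation may differ in detail.

```r
# Hypothetical helper (not an irtQ function): Kelley-type regression correction
# applied separately within each group. Missing estimates stay missing.
regress_correct <- function(theta_hat, se, group) {
  theta_star <- rep(NA_real_, length(theta_hat))
  for (g in unique(group)) {
    idx <- which(group == g)
    m <- mean(theta_hat[idx], na.rm = TRUE)   # group mean of the estimates
    v <- var(theta_hat[idx], na.rm = TRUE)    # observed variance of the estimates
    # estimated reliability: share of observed variance not due to measurement error
    rel <- 1 - mean(se[idx]^2, na.rm = TRUE) / v
    rel <- min(max(rel, 0), 1)                # keep it within [0, 1]
    theta_star[idx] <- m + rel * (theta_hat[idx] - m)
  }
  theta_star
}

# toy example: the reliability is 0.5, so estimates shrink halfway to the mean
regress_correct(theta_hat = c(-1, 0, 1), se = rep(sqrt(0.5), 3), group = rep("ref", 3))
```

With zero standard errors the estimated reliability is 1 and the estimates are returned unchanged; larger standard errors pull each group's estimates more strongly toward that group's mean.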

Usage

catsib(
  x = NULL,
  data,
  score = NULL,
  se = NULL,
  group,
  focal.name,
  D = 1,
  n.bin = c(80, 10),
  min.binsize = 3,
  max.del = 0.075,
  weight.group = c("comb", "foc", "ref"),
  alpha = 0.05,
  missing = NA,
  purify = FALSE,
  max.iter = 10,
  min.resp = NULL,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  ncore = 1,
  verbose = TRUE,
  ...
)

Arguments

`x`	A data frame containing the item metadata (e.g., item parameters, number of categories, models). `x` must be provided to estimate latent ability parameters when `score = NULL` or `purify = TRUE`. Default is NULL. See `est_irt`, `irtfit`, `info`, or `simdat` for more detail about the item metadata.
`data`	A matrix containing examinees' response data for the items in the argument `x`. Rows represent examinees and columns represent items.
`score`	A vector of examinees' ability estimates. If the abilities are not provided (i.e., `score = NULL`), `catsib` computes the ability estimates before computing the CATSIB statistics. See `est_score` for more detail about scoring methods. Default is NULL.
`se`	A vector of the standard errors of the ability estimates. The standard errors should be ordered in accordance with the order of the ability estimates specified in the `score` argument. Default is NULL.
`group`	A numeric or character vector indicating the group membership of examinees. The length of the vector must equal the number of rows in the response data matrix.
`focal.name`	A single numeric or character scalar representing the level associated with the focal group. For instance, given `group = c(0, 1, 0, 1, 1)` and '1' indicating the focal group, set `focal.name = 1`.
`D`	A scaling factor in IRT models. Setting `D = 1.7` makes the logistic function closely approximate the normal ogive function. Default is 1.
`n.bin`	A vector of two positive integers to set the maximum and minimum numbers of bins (or intervals) on the ability scale. The first and second values indicate the maximum and minimum numbers of the bins, respectively. See below for more detail.
`min.binsize`	A positive integer value to set the minimum size of each bin. To ensure stable statistical estimation, a bin is included in the calculation of `\hat{\beta}` only if it contains at least this many examinees (e.g., 3) from each of the reference and focal groups. Bins with fewer examinees in either group are not used in the computation. Default is 3. See below for more detail.
`max.del`	A numerical value to set the maximum permissible proportion of examinees to be deleted from either reference group or focal group when automatically determining the number of bins on the ability scale. Default is 0.075. See below for more detail.
`weight.group`	A single character string to specify a target ability distribution over which the expectation of the DIF measure, called `\hat{\beta}`, and the corresponding standard error are computed. Available options are "comb" for the combined ability distribution from both the reference and focal groups, "foc" for the ability distribution of the focal group, and "ref" for the ability distribution of the reference group. Default is "comb". See below for more detail.
`alpha`	A numeric value to specify significance `\alpha`-level of the hypothesis test using the CATSIB statistics. Default is .05.
`missing`	A value indicating missing values in the response data set. Default is NA.
`purify`	A logical value indicating whether a purification process will be implemented or not. Default is FALSE. See below for more detail.
`max.iter`	A positive integer value specifying the maximum number of iterations for the purification process. Default is 10.
`min.resp`	A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information.
`method`	A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, and "EAP" for the expected a posteriori estimation. Default method is "ML".
`range`	A numeric vector of two components to restrict the range of ability scale for the ML, WL, MLF, and MAP scoring methods. Default is c(-5, 5).
`norm.prior`	A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution. These two parameters are used to obtain the Gaussian quadrature points and the corresponding weights from the normal distribution. Default is c(0,1). Ignored if `method` is "ML" or "WL".
`nquad`	An integer value specifying the number of Gaussian quadrature points from the normal prior distribution. Default is 41. Ignored if `method` is "ML", "WL", or "MAP".
`weights`	A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained using the function `gen.weight`. If NULL and `method` is "EAP", default values are used (see the arguments of `norm.prior` and `nquad`). Ignored if `method` is "ML", "WL", or "MAP".
`ncore`	The number of logical CPU cores to use. Default is 1. See `est_score` for details.
`verbose`	A logical value. If TRUE, the progress messages of the purification procedure are printed. Default is TRUE.
`...`	Additional arguments that will be forwarded to the `est_score` function.

Details

In the CATSIB procedure (Nandakumar & Roussos, 2004), \hat{\theta}^{\ast}, the expected \theta regressed on \hat{\theta}, is a continuous variable. Its range is therefore divided into K equal intervals, and each examinee is classified into one of the K intervals on the basis of his or her \hat{\theta}^{\ast}. Then, to ensure stable statistical estimation, any interval containing fewer than min.binsize examinees (three by default) in either the reference or the focal group is excluded from the computation of \hat{\beta}, the measure of the amount of DIF. Following Nandakumar and Roussos (2004), the default minimum bin size in min.binsize is 3.
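The binning and exclusion rule described above can be sketched as follows. This is a minimal sketch, assuming equal-width intervals over the observed range of the regression-corrected estimates, no missing values, and a non-constant ability vector; `bin_examinees()` and its arguments are hypothetical names, not irtQ functions.

```r
# Hypothetical helper (not an irtQ function): divide the range of theta_star into
# K equal-width intervals and keep only the intervals that contain at least
# `min_binsize` examinees from BOTH the reference and the focal group.
# Assumes theta_star has no missing values and is not constant.
bin_examinees <- function(theta_star, is_focal, K, min_binsize = 3) {
  breaks <- seq(min(theta_star), max(theta_star), length.out = K + 1)
  bin <- cut(theta_star, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  n_ref <- tabulate(bin[!is_focal], nbins = K)  # reference-group count per bin
  n_foc <- tabulate(bin[is_focal], nbins = K)   # focal-group count per bin
  keep_bin <- n_ref >= min_binsize & n_foc >= min_binsize
  list(bin = bin, n_ref = n_ref, n_foc = n_foc,
       keep_bin = keep_bin, keep_examinee = keep_bin[bin])
}

theta_star <- c(0, 0.1, 0.2, 0.9, 1.0, 0.05, 0.15, 0.95)
is_focal   <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
res <- bin_examinees(theta_star, is_focal, K = 2, min_binsize = 2)
res$keep_bin   # the second interval has only one focal examinee, so it is dropped
```

Examinees in dropped intervals (`keep_examinee == FALSE`) simply do not contribute to \hat{\beta}; they are not removed from the data.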

To choose the number of intervals (K) carefully, catsib determines it automatically by gradually decreasing K from a large to a small number, following the rule used in Nandakumar and Roussos (2004). Specifically, starting from an arbitrarily large number (e.g., 80), if more than a permissible percentage (e.g., 7.5%) of examinees in either the reference or the focal group are removed, catsib decreases the number of bins by one until at least 92.5% of the examinees in each group are retained. However, Nandakumar and Roussos (2004) recommended setting the minimum K to 10 to avoid leaving too few intervals, even if fewer than 92.5% of the examinees in a group remain. Thus, the maximum and minimum numbers of bins are set to 80 and 10, respectively, by default in n.bin, and the default maximum permissible proportion of examinees deleted from either the reference or the focal group is set to 0.075 in max.del.
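The search over K described above can be sketched as a simple countdown loop. This is a minimal sketch, assuming equal-width bins, no missing values, and a non-constant ability vector; `choose_n_bins()` is a hypothetical helper whose arguments mirror, but are not, the `n.bin`, `min.binsize`, and `max.del` arguments of catsib.

```r
# Hypothetical helper (not an irtQ function): start from the maximum number of
# bins and decrease K by one until the proportion of examinees deleted from each
# group is at most `max_del`; fall back to the minimum K otherwise.
# Assumes theta_star has no missing values and is not constant.
choose_n_bins <- function(theta_star, is_focal, n_bin = c(80, 10),
                          min_binsize = 3, max_del = 0.075) {
  for (K in seq(n_bin[1], n_bin[2])) {       # e.g., 80, 79, ..., 10
    breaks <- seq(min(theta_star), max(theta_star), length.out = K + 1)
    bin <- cut(theta_star, breaks = breaks, include.lowest = TRUE, labels = FALSE)
    n_ref <- tabulate(bin[!is_focal], nbins = K)
    n_foc <- tabulate(bin[is_focal], nbins = K)
    keep <- (n_ref >= min_binsize & n_foc >= min_binsize)[bin]
    del_ref <- mean(!keep[!is_focal])          # proportion deleted, reference group
    del_foc <- mean(!keep[is_focal])           # proportion deleted, focal group
    if (max(del_ref, del_foc) <= max_del) return(K)
  }
  n_bin[2]                                     # minimum K, as recommended
}

set.seed(1)
theta_star <- rnorm(2000)
is_focal <- rep(c(FALSE, TRUE), each = 1000)
choose_n_bins(theta_star, is_focal)
```

Counting down from the maximum favors the finest binning that still retains enough examinees, which keeps the matching on ability as tight as the sample allows.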

For the target ability distribution used to compute \hat{\beta}, Li and Stout (1996) and Nandakumar and Roussos (2004) used the combined-group target ability distribution, which is the default option in weight.group. See Nandakumar and Roussos (2004) for more detail about the CATSIB method.
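Once examinees are binned, \hat{\beta} can be computed as a weighted sum of within-bin differences in mean item scores, with the weights taken from the target distribution chosen in weight.group. The sketch below assumes the SIBTEST-style estimator and standard error (Shealy & Stout, 1993), a reference-minus-focal sign convention, sample variances from `var()`, and a two-sided z test. `catsib_beta()` is a hypothetical helper, and irtQ's internal formulas or sign convention may differ.

```r
# Hypothetical helper (not an irtQ function). Inputs cover retained examinees only:
# y = item scores, bin = bin labels, is_focal = focal-group indicator.
catsib_beta <- function(y, bin, is_focal, weight_group = c("comb", "foc", "ref"),
                        alpha = 0.05) {
  weight_group <- match.arg(weight_group)
  ks <- sort(unique(bin))
  # one row per bin: group sizes, mean difference (ref - foc), sampling variance
  stats <- t(vapply(ks, function(k) {
    yr <- y[bin == k & !is_focal]
    yf <- y[bin == k & is_focal]
    c(n_r = length(yr), n_f = length(yf),
      diff = mean(yr) - mean(yf),
      v = var(yr) / length(yr) + var(yf) / length(yf))
  }, c(n_r = 0, n_f = 0, diff = 0, v = 0)))
  # bin weights from the chosen target ability distribution
  w <- switch(weight_group,
              comb = (stats[, "n_r"] + stats[, "n_f"]) / sum(stats[, c("n_r", "n_f")]),
              foc  = stats[, "n_f"] / sum(stats[, "n_f"]),
              ref  = stats[, "n_r"] / sum(stats[, "n_r"]))
  beta <- sum(w * stats[, "diff"])
  se <- sqrt(sum(w^2 * stats[, "v"]))
  z <- beta / se
  p <- 2 * pnorm(-abs(z))                      # two-sided p-value
  list(beta = beta, se = se, z = z, p = p, flagged = p < alpha)
}

y        <- c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0)
bin      <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
is_focal <- rep(rep(c(FALSE, TRUE), each = 3), 2)
catsib_beta(y, bin, is_focal, weight_group = "comb")
```

Under this convention a positive \hat{\beta} means the reference group scores higher on the item than matched focal-group examinees.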

Although Nandakumar and Roussos (2004) did not propose a purification procedure for DIF analysis using CATSIB, catsib can implement an iterative purification process similar to that of Lim, Choe, and Han (2022). At each purification iteration, examinees' latent abilities are recomputed using the purified items and the scoring method specified in the method argument. The iterative purification process stops when no further DIF items are found or when the process reaches the maximum number of iterations, which can be specified in the max.iter argument. See Lim et al. (2022) for more details about the purification procedure.
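The stopping rule described above can be sketched as a generic loop. This is a minimal sketch under stated assumptions: `purify_items()` and the flagging function passed to it are hypothetical stand-ins (in catsib the flagging step would rescore examinees on the current purified items and recompute the CATSIB statistics), and items flagged in an earlier iteration are assumed to stay flagged. The package's actual bookkeeping, including how it counts iterations, may differ.

```r
# Hypothetical sketch (not an irtQ function). `flag_fun(anchor)` must return the
# IDs of the items flagged as DIF when abilities are estimated from `anchor`.
purify_items <- function(items, flag_fun, max_iter = 10) {
  flagged <- character(0)
  for (iter in seq_len(max_iter)) {
    anchor <- setdiff(items, flagged)            # purified (non-flagged) items
    new_flags <- setdiff(flag_fun(anchor), flagged)
    if (length(new_flags) == 0) {                # no further DIF items: stop
      return(list(dif_item = flagged, n.iter = iter, complete = TRUE))
    }
    flagged <- union(flagged, new_flags)
  }
  # limit reached while new DIF items were still being found
  list(dif_item = flagged, n.iter = max_iter, complete = FALSE)
}

# toy flagging rule: "i1" shows DIF, and "i2" does too once "i1" is removed
toy_flag <- function(anchor) {
  if ("i1" %in% anchor) "i1" else if ("i2" %in% anchor) "i2" else character(0)
}
purify_items(c("i1", "i2", "i3", "i4"), toy_flag)
```

Setting `max_iter = 1` in the toy example stops the loop while new DIF items are still appearing, which corresponds to `complete = FALSE` in the Value section.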

Scoring with a limited number of items can result in large standard errors, which may reduce the effectiveness of DIF detection in the CATSIB procedure. The min.resp argument can be used to avoid using scores with large standard errors when calculating the CATSIB statistic, particularly during the purification process. For instance, if min.resp is not NULL (e.g., min.resp = 5), item responses from examinees whose total number of item responses falls below the specified minimum are treated as missing values (i.e., NA). Consequently, their ability estimates become missing values and are not used in computing the CATSIB statistic. If min.resp = NULL, an examinee's score is computed as long as the examinee has at least one item response.
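The min.resp rule can be sketched as a small preprocessing step on the response matrix. This is a minimal sketch, assuming a scalar missing-value code; `apply_min_resp()` is a hypothetical helper, not a function exported by irtQ.

```r
# Hypothetical helper (not an irtQ function): recode the missing-value code to NA,
# then blank out every examinee with fewer than `min_resp` observed responses so
# that their ability estimate (and hence their CATSIB contribution) is missing.
apply_min_resp <- function(resp, min_resp = NULL, missing = NA) {
  resp <- as.matrix(resp)
  if (!is.na(missing)) resp[!is.na(resp) & resp == missing] <- NA
  if (!is.null(min_resp)) {
    n_resp <- rowSums(!is.na(resp))              # observed responses per examinee
    resp[n_resp < min_resp, ] <- NA
  }
  resp
}

resp <- matrix(c(1, 0, NA, 1,
                 NA, NA, 1, 0,
                 1, 1, 1, NA), nrow = 3, byrow = TRUE)
apply_min_resp(resp, min_resp = 3)   # the second examinee has only 2 responses
```

With `min_resp = NULL` the matrix is returned unchanged apart from recoding, matching the behavior described for min.resp = NULL.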

Value

This function returns a list of four internal objects. The four objects are:

`no_purify`	A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
    dif_stat: A data frame containing the results of CATSIB statistics across all evaluated items. From the first column, the columns indicate the item ID, CATSIB (beta) statistic, standard error of the beta, standardized beta, p-value of the beta, sample size of the reference group, sample size of the focal group, and total sample size, respectively.
    dif_item: A numeric vector showing potential DIF items flagged by the CATSIB statistic.
    contingency: A contingency table of each item used to compute the CATSIB statistic.
`purify`	A logical value indicating whether the purification process was used.
`with_purify`	A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
    dif_stat: A data frame containing the results of CATSIB statistics across all evaluated items. From the first column, the columns indicate the item ID, CATSIB (beta) statistic, standard error of the beta, standardized beta, p-value of the beta, sample size of the reference group, sample size of the focal group, total sample size, and the iteration at which the CATSIB statistic was computed, respectively.
    dif_item: A numeric vector showing potential DIF items flagged by the CATSIB statistic.
    n.iter: The total number of iterations implemented for the purification.
    complete: A logical value indicating whether the purification process was completed. If FALSE, the purification process reached the maximum number of iterations without completing.
    contingency: A contingency table of each item used to compute the CATSIB statistic.
`alpha`	A significance `\alpha`-level used to compute the p-values of the CATSIB statistics.

Author(s)

Hwanggyu Lim hglim83@gmail.com

References

Li, H. H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.

Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement.

Nandakumar, R., & Roussos, L. (2004). Evaluation of the CATSIB DIF procedure in a pretest setting. Journal of Educational and Behavioral Statistics, 29(2), 177-199.

Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58, 159-194.

Examples


# load the packages
library("irtQ")
library("dplyr")

## Uniform DIF detection
###############################################
# (1) manipulate true uniform DIF data
###############################################
# import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")

# select 36 of the 3PLM items to serve as non-DIF items
par_nstd <-
  bring.flexmirt(file=flex_sam, "par")$Group1$full_df %>%
  dplyr::filter(.data$model == "3PLM") %>%
  dplyr::filter(dplyr::row_number() %in% 1:36) %>%
  dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)

# generate four new items to inject uniform DIF
difpar_ref <-
  shape_df(par.drm=list(a=c(0.8, 1.5, 0.8, 1.5), b=c(0.0, 0.0, -0.5, -0.5), g=0.15),
           item.id=paste0("dif", 1:4), cats=2, model="3PLM")

# manipulate uniform DIF on the four new items by adding constants to b-parameters
# for the focal group
difpar_foc <-
  difpar_ref %>%
  dplyr::mutate(par.2 = .data$par.2 + 0.7)

# combine the 4 DIF and 36 non-DIF items for both reference and focal groups
# thus, the first four items have uniform DIF
par_ref <- rbind(difpar_ref, par_nstd)
par_foc <- rbind(difpar_foc, par_nstd)

# generate the true thetas
set.seed(123)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc <- rnorm(500, 0.0, 1.0)

# generate the response data
resp_ref <- simdat(par_ref, theta=theta_ref, D=1)
resp_foc <- simdat(par_foc, theta=theta_foc, D=1)
data <- rbind(resp_ref, resp_foc)

###############################################
# (2) estimate the item and ability parameters
#     using the aggregate data
###############################################
# estimate the item parameters
est_mod <- est_irt(data=data, D=1, model="3PLM")
est_par <- est_mod$par.est

# estimate the ability parameters using ML
theta_est <- est_score(x=est_par, data=data, method="ML")
score <- theta_est$est.theta
se <- theta_est$se.theta

###############################################
# (3) conduct DIF analysis
###############################################
# create a vector of group membership indicators
# where '1' indicates the focal group
group <- c(rep(0, 500), rep(1, 500))

# (a)-1 compute the CATSIB statistic by providing scores,
#       and without a purification
dif_1 <- catsib(x=NULL, data=data, D=1, score=score, se=se, group=group, focal.name=1,
                weight.group="comb", alpha=0.05, missing=NA, purify=FALSE)
print(dif_1)

# (a)-2 compute the CATSIB statistic by providing scores,
#       and with a purification (x is required for rescoring)
dif_2 <- catsib(x=est_par, data=data, D=1, score=score, se=se, group=group, focal.name=1,
                weight.group="comb", alpha=0.05, missing=NA, purify=TRUE)
print(dif_2)

[Package irtQ version 0.2.0 Index]