catsib {irtQ} | R Documentation |
CATSIB DIF detection procedure
Description
This function analyzes differential item functioning (DIF) on an item using the CATSIB procedure (Nandakumar & Roussos, 2004), a modified version of SIBTEST (Shealy & Stout, 1993). The CATSIB procedure can be applied to DIF detection in a computerized adaptive testing (CAT) environment. In CATSIB, examinees are matched on IRT-based ability estimates that are adjusted with a regression correction method (Shealy & Stout, 1993) to reduce the statistical bias in the CATSIB statistic caused by impact.
Usage
catsib(
x = NULL,
data,
score = NULL,
se = NULL,
group,
focal.name,
D = 1,
n.bin = c(80, 10),
min.binsize = 3,
max.del = 0.075,
weight.group = c("comb", "foc", "ref"),
alpha = 0.05,
missing = NA,
purify = FALSE,
max.iter = 10,
min.resp = NULL,
method = "ML",
range = c(-5, 5),
norm.prior = c(0, 1),
nquad = 41,
weights = NULL,
ncore = 1,
verbose = TRUE,
...
)
Arguments
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models ...).
|
data |
A matrix containing examinees' response data of the items in the argument |
score |
A vector of examinees' ability estimates. If the abilities are not provided (i.e., |
se |
A vector of the standard errors of the ability estimates. The standard errors should be ordered in accordance with the order of
the ability estimates specified in the |
group |
A numeric or character vector indicating the group membership of examinees. The length of the vector should equal the number of rows in the response data matrix. |
focal.name |
A single numeric or character scalar representing the level associated with the focal group. For instance,
given |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
n.bin |
A vector of two positive integers to set the maximum and minimum numbers of bins (or intervals) on the ability scale. The first and second values indicate the maximum and minimum numbers of the bins, respectively. See below for more detail. |
min.binsize |
A positive integer value to set the minimum size of each bin. To ensure stable statistical estimation, each bin must contain at least a certain number of examinees (e.g., 3) from both the reference and focal groups to be included in the calculation of |
max.del |
A numeric value to set the maximum permissible proportion of examinees to be deleted from either the reference or the focal group when automatically determining the number of bins on the ability scale. Default is 0.075. See below for more detail. |
weight.group |
A single character string to specify a target ability distribution over which the expectation of DIF measure, called |
alpha |
A numeric value to specify significance |
missing |
A value indicating missing values in the response data set. Default is NA. |
purify |
A logical value indicating whether a purification process will be implemented or not. Default is FALSE. See below for more detail. |
max.iter |
A positive integer value specifying the maximum number of iterations for the purification process. Default is 10. |
min.resp |
A positive integer value specifying the minimum number of item responses for an examinee required to compute the ability estimate. Default is NULL. See details below for more information. |
method |
A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, and "EAP" for the expected a posteriori estimation. Default method is "ML". |
range |
A numeric vector of two components to restrict the range of ability scale for the ML, WL, MLF, and MAP scoring methods. Default is c(-5, 5). |
norm.prior |
A numeric vector of two components specifying a mean and standard deviation of the normal prior distribution.
These two parameters are used to obtain the Gaussian quadrature points and the corresponding weights from the normal distribution. Default is
c(0,1). Ignored if |
nquad |
An integer value specifying the number of Gaussian quadrature points from the normal prior distribution. Default is 41.
Ignored if |
weights |
A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights
(in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained
using the function |
ncore |
The number of logical CPU cores to use. Default is 1. See |
verbose |
A logical value. If FALSE, the progress messages of the purification procedure are suppressed. Default is TRUE. |
... |
Additional arguments that will be forwarded to the |
Details
In the CATSIB procedure (Nandakumar & Roussos, 2004), because \hat{\theta}^{\ast}, which is the expected \theta regressed on \hat{\theta}, is a continuous variable, the range of \hat{\theta}^{\ast} is divided into K equal intervals, and examinees are classified into one of the K intervals on the basis of their \hat{\theta}^{\ast}. Any interval that contains fewer than three examinees in either the reference or the focal group is then excluded from the computation of \hat{\beta}, which is a measure of the amount of DIF, to ensure stable statistical estimation. Following Nandakumar and Roussos (2004), the default minimum size of each bin is set to 3 in min.binsize.
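The regression correction of the ability estimates can be illustrated with a small base R sketch. It assumes a Kelley-type form in which each estimate is shrunk toward its own group mean by an estimated reliability, with the reliability taken as the share of observed score variance not attributable to measurement error. The helper name `regress_theta` is hypothetical, and the exact computation inside `catsib` may differ.

```r
# Hypothetical helper (not part of irtQ): regression-corrected ability
# estimates, computed separately within each group.
regress_theta <- function(score, se, group) {
  out <- rep(NA_real_, length(score))
  for (g in unique(group)) {
    idx <- which(group == g & !is.na(score) & !is.na(se))
    var_obs <- var(score[idx])     # variance of the ability estimates
    var_err <- mean(se[idx]^2)     # average error variance
    # assumed reliability estimate, bounded below at zero
    rel <- max(0, (var_obs - var_err) / var_obs)
    # shrink each estimate toward the group mean
    out[idx] <- mean(score[idx]) + rel * (score[idx] - mean(score[idx]))
  }
  out
}

# simulated scores with impact (the focal group has a lower mean ability)
set.seed(1)
group <- rep(c("ref", "foc"), each = 200)
theta <- rnorm(400, mean = ifelse(group == "ref", 0, -0.5))
se <- rep(0.4, 400)
score <- theta + rnorm(400, sd = se)
theta_star <- regress_theta(score, se, group)
```

Because the shrinkage is applied within each group, the group means are unchanged while the spread of the matching variable is reduced in proportion to the measurement error.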
To choose the number of intervals (K), catsib determines it automatically by gradually decreasing K from a large to a smaller number, based on the rule used in Nandakumar and Roussos (2004). Specifically, beginning with an arbitrarily large number (e.g., 80), if more than a certain permissible percentage (say, 7.5%) of examinees in either the reference or the focal group would be removed, catsib decreases the number of bins by one until at least 92.5% of the examinees in each group are retained. However, Nandakumar and Roussos (2004) recommended setting the minimum K to 10 to avoid a situation in which only a few intervals are left, even if fewer than 92.5% of the examinees in each group are retained. Thus, the maximum and minimum numbers of bins are set to 80 and 10, respectively, by default in n.bin. Likewise, the default maximum permissible proportion of examinees to be deleted from either the reference or the focal group is set to 0.075 in max.del.
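The bin-selection rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used inside `catsib`: the helper name `choose_bins` is hypothetical, equal-width intervals are formed with `cut()`, and the matching variable is assumed to contain no missing values.

```r
# Hypothetical helper (not part of irtQ): decrease K from n.bin[1] to
# n.bin[2] until no more than max.del of either group is discarded.
choose_bins <- function(theta_star, group, focal.name,
                        n.bin = c(80, 10), min.binsize = 3, max.del = 0.075) {
  is_foc <- group == focal.name
  n_ref <- sum(!is_foc)
  n_foc <- sum(is_foc)
  for (K in seq(n.bin[1], n.bin[2])) {
    # K equal-width intervals over the range of the matching variable
    bin <- cut(theta_star, breaks = K, labels = FALSE)
    n_ref_k <- tabulate(bin[!is_foc], nbins = K)
    n_foc_k <- tabulate(bin[is_foc], nbins = K)
    # a bin is usable only if both groups reach the minimum size
    keep <- n_ref_k >= min.binsize & n_foc_k >= min.binsize
    del_ref <- 1 - sum(n_ref_k[keep]) / n_ref
    del_foc <- 1 - sum(n_foc_k[keep]) / n_foc
    if (max(del_ref, del_foc) <= max.del) break
  }
  # if the loop never breaks, the results for the minimum K are returned
  list(K = K, keep = which(keep), bin = bin,
       del = c(ref = del_ref, foc = del_foc))
}

set.seed(2)
group <- rep(c(0, 1), each = 500)
theta_star <- rnorm(1000)
res <- choose_bins(theta_star, group, focal.name = 1)
res$K     # number of intervals selected
res$del   # proportion of each group left out of the computation
```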
For the target ability distribution used to compute \hat{\beta}, Li and Stout (1996) and Nandakumar and Roussos (2004) used the combined-group target ability distribution, which is the default option in weight.group. See Nandakumar and Roussos (2004) for more details about the CATSIB method.
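To make the role of the target ability distribution concrete, the sketch below computes a SIBTEST-style weighted mean difference for one dichotomous item. It assumes that \hat{\beta} is the weighted sum over retained bins of the reference-minus-focal difference in mean item scores, with weights given by the share of examinees (combined, focal, or reference) in each bin, and that the standard error is built from the within-bin variances. The helper name `catsib_beta` is hypothetical, and the result is not guaranteed to match the output of `catsib`.

```r
# Hypothetical helper (not part of irtQ): weighted mean difference in item
# scores between the reference and focal groups across ability bins.
catsib_beta <- function(y, bin, group, focal.name,
                        min.binsize = 3, weight.group = "comb") {
  is_foc <- group == focal.name
  K <- max(bin)
  n_r <- tabulate(bin[!is_foc], nbins = K)
  n_f <- tabulate(bin[is_foc], nbins = K)
  keep <- which(n_r >= min.binsize & n_f >= min.binsize)
  # per-bin mean difference and sampling variance of that difference
  stat <- vapply(keep, function(k) {
    yr <- y[!is_foc & bin == k]
    yf <- y[is_foc & bin == k]
    c(diff = mean(yr) - mean(yf),
      v = var(yr) / length(yr) + var(yf) / length(yf))
  }, c(diff = 0, v = 0))
  # weights follow the chosen target ability distribution
  n_w <- switch(weight.group, comb = n_r + n_f, foc = n_f, ref = n_r)[keep]
  p <- n_w / sum(n_w)
  beta <- sum(p * stat["diff", ])
  se <- sqrt(sum(p^2 * stat["v", ]))
  z <- beta / se
  c(beta = beta, se = se, z = z, p.value = 2 * pnorm(-abs(z)))
}

# simulated item that is 0.7 logits harder for the focal group
set.seed(3)
group <- rep(c(0, 1), each = 1000)
theta <- rnorm(2000)
y <- rbinom(2000, 1, plogis(theta - 0.7 * (group == 1)))
bin <- cut(theta, breaks = 20, labels = FALSE)
out <- catsib_beta(y, bin, group, focal.name = 1)
out
```

Under this sign convention, a positive value indicates that the item favors the reference group after matching on ability.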
Although Nandakumar and Roussos (2004) did not propose a purification procedure for DIF analysis using CATSIB, catsib can implement an iterative purification process similar to that of Lim, Choe, and Han (2022). In brief, at each purification iteration, examinees' latent abilities are computed using the purified items and the scoring method specified in the method argument. The iterative purification process stops when no further DIF items are found or when the process reaches the predetermined iteration limit, which can be specified in the max.iter argument. See Lim et al. (2022) for more details about the purification procedure.
Scoring with a limited number of items can result in large standard errors, which may impair the effectiveness of DIF detection with the CATSIB procedure. The min.resp argument can be used to avoid using scores with large standard errors when calculating the CATSIB statistic, particularly during the purification process. For instance, if min.resp is not NULL (e.g., min.resp=5), item responses from examinees whose total number of item responses falls below the specified minimum are treated as missing values (i.e., NA). Consequently, their ability estimates become missing values and are not used in computing the CATSIB statistic. If min.resp=NULL, an examinee's score is computed as long as the examinee has at least one item response.
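The effect of min.resp on the response data can be illustrated with a toy matrix. This is a minimal base R sketch of the masking rule as described in the text, not the internal code of `catsib`; the object names are made up for the example.

```r
# three examinees, six items; NA marks an item that was not administered
resp <- matrix(c(1,  0,  NA, NA, NA, NA,
                 1,  1,  0,  1,  0,  1,
                 NA, NA, NA, 1,  NA, NA),
               nrow = 3, byrow = TRUE)
min.resp <- 5

# number of observed item responses per examinee
n_resp <- rowSums(!is.na(resp))

# examinees below the minimum have all their responses set to NA,
# so no ability estimate can be computed for them
resp_masked <- resp
resp_masked[n_resp < min.resp, ] <- NA
resp_masked
```

Here only the second examinee, with six observed responses, keeps the response vector; the first and third examinees would receive missing ability estimates.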
Value
This function returns a list of four internal objects. The four objects are:
no_purify |
A list of several sub-objects containing the results of DIF analysis without a purification procedure. The sub-objects are:
|
purify |
A logical value indicating whether the purification process was used. |
with_purify |
A list of several sub-objects containing the results of DIF analysis with a purification procedure. The sub-objects are:
|
alpha |
A significance |
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Li, H. H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647-677.
Lim, H., Choe, E. M., & Han, K. T. (2022). A residual-based differential item functioning detection framework in item response theory. Journal of Educational Measurement.
Nandakumar, R., & Roussos, L. (2004). Evaluation of the CATSIB DIF procedure in a pretest setting. Journal of Educational and Behavioral Statistics, 29(2), 177-199.
Shealy, R. T., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DIF as well as item bias/DIF. Psychometrika, 58, 159–194.
See Also
rdif, est_item, info, simdat, shape_df, gen.weight, est_score
Examples
# call library
library("dplyr")
## Uniform DIF detection
###############################################
# (1) manipulate true uniform DIF data
###############################################
# import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# select 36 of 3PLM items which are non-DIF items
par_nstd <-
  bring.flexmirt(file=flex_sam, "par")$Group1$full_df %>%
  dplyr::filter(.data$model == "3PLM") %>%
  dplyr::filter(dplyr::row_number() %in% 1:36) %>%
  dplyr::select(1:6)
par_nstd$id <- paste0("nondif", 1:36)
# generate four new items to inject uniform DIF
difpar_ref <-
  shape_df(par.drm=list(a=c(0.8, 1.5, 0.8, 1.5), b=c(0.0, 0.0, -0.5, -0.5), g=0.15),
           item.id=paste0("dif", 1:4), cats=2, model="3PLM")
# manipulate uniform DIF on the four new items by adding constants to b-parameters
# for the focal group
difpar_foc <-
  difpar_ref %>%
  dplyr::mutate_at(.vars="par.2", .funs=function(x) x + rep(0.7, 4))
# combine the 4 DIF and 36 non-DIF items for both reference and focal groups
# thus, the first four items have uniform DIF
par_ref <- rbind(difpar_ref, par_nstd)
par_foc <- rbind(difpar_foc, par_nstd)
# generate the true thetas
set.seed(123)
theta_ref <- rnorm(500, 0.0, 1.0)
theta_foc <- rnorm(500, 0.0, 1.0)
# generate the response data
resp_ref <- simdat(par_ref, theta=theta_ref, D=1)
resp_foc <- simdat(par_foc, theta=theta_foc, D=1)
data <- rbind(resp_ref, resp_foc)
###############################################
# (2) estimate the item and ability parameters
# using the aggregate data
###############################################
# estimate the item parameters
est_mod <- est_irt(data=data, D=1, model="3PLM")
est_par <- est_mod$par.est
# estimate the ability parameters using ML
theta_est <- est_score(x=est_par, data=data, method="ML")
score <- theta_est$est.theta
se <- theta_est$se.theta
###############################################
# (3) conduct DIF analysis
###############################################
# create a vector of group membership indicators
# where '1' indicates the focal group
group <- c(rep(0, 500), rep(1, 500))
# (a)-1 compute CATSIB statistic by providing scores,
# and without a purification
dif_1 <- catsib(x=NULL, data=data, D=1, score=score, se=se, group=group, focal.name=1,
weight.group="comb", alpha=0.05, missing=NA, purify=FALSE)
print(dif_1)
# (a)-2 compute CATSIB statistic by providing scores,
# and with a purification
dif_2 <- catsib(x=est_par, data=data, D=1, score=score, se=se, group=group, focal.name=1,
weight.group="comb", alpha=0.05, missing=NA, purify=TRUE)
print(dif_2)