irtfit {irtQ} | R Documentation
Traditional IRT item fit statistics
Description
This function computes traditional IRT item fit statistics, namely the \chi^{2} fit statistic (e.g., Bock, 1960; Yen, 1981), the log-likelihood ratio \chi^{2} fit statistic (G^{2}; McKinley & Mills, 1985), and the infit and outfit statistics (Ames et al., 2015), and returns the contingency tables used to compute the \chi^{2} and G^{2} fit statistics. Note that caution is needed in interpreting the infit and outfit statistics for non-Rasch models. The object returned by this function, in particular the contingency tables, is used by the function plot.irtfit
to draw raw and standardized residual plots (Hambleton et al., 1991).
Usage
irtfit(x, ...)
## Default S3 method:
irtfit(
x,
score,
data,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
D = 1,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
## S3 method for class 'est_item'
irtfit(
x,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
## S3 method for class 'est_irt'
irtfit(
x,
score,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
Arguments
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models, ...), an object of class est_item obtained from the function est_item, or an object of class est_irt obtained from the function est_irt. See below for details. |
... |
Further arguments passed to or from other methods. |
score |
A vector of examinees' ability estimates. |
data |
A matrix containing examinees' response data for the items in the argument x. Rows and columns correspond to examinees and items, respectively. |
group.method |
A character string indicating how to group the examinees along the ability scale when computing the \chi^{2} and G^{2} fit statistics. Available methods are "equal.width" for grouping the examinees into intervals of equal width and "equal.freq" for grouping them into intervals with (approximately) equal frequencies. Default is "equal.width". See below for more detail. |
n.width |
An integer value to specify the number of divided groups along the ability scale. Default is 10. See below for more detail. |
loc.theta |
A character string indicating the ability point within each group (or interval) at which the expected probabilities of the score categories are calculated using the IRT models. Available locations are "average", which uses the average of the examinees' ability estimates in each group, and "middle", which uses the midpoint of each group. Default is "average". |
range.score |
A vector of two numeric values to restrict the range of the ability scale. All ability estimates less than the first value are set to the first value, and all ability estimates greater than the second value are set to the second value. If NULL, the minimum and maximum of the ability estimates in the argument score are used. Default is NULL. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify the significance level of the hypothesis tests for the \chi^{2} and G^{2} fit statistics. Default is 0.05. |
missing |
A value indicating missing values in the response data set. Default is NA. |
overSR |
A numeric value to specify a criterion to find ability groups (or intervals) which have standardized residuals greater than the specified value. Default is 2. |
min.collapse |
An integer value to indicate the minimum frequency of cells to be collapsed when computing the \chi^{2} and G^{2} fit statistics. Default is 1. |
pcm.loc |
A vector of integer values indicating the locations of partial credit model (PCM) items whose slope parameters are fixed. Default is NULL. |
Details
A specific form of data frame should be used for the argument x. The first column should have item IDs,
the second column should contain the number of unique score categories for each item, and the third column should include the IRT model fit to each item.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e., "1PLM", "2PLM", and "3PLM"), and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of the category boundaries for the GPCM are expressed as
the item location (or overall difficulty) parameter minus the threshold parameter for each score category of the item.
Note that when a GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if a GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame for a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
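As a small numeric illustration of the GPCM parameterization described above (all values here are hypothetical), the stored category boundary parameters are the item location parameter minus each threshold parameter:

```r
# Hypothetical GPCM item with four score categories: one overall
# location (difficulty) parameter and K - 1 = 3 threshold parameters
location   <- 0.5
thresholds <- c(-1.2, 0.1, 1.4)

# Category boundary (item difficulty) parameters as stored in the
# item metadata: location minus each threshold
boundaries <- location - thresholds
boundaries  # 1.7  0.4 -0.9
```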
See the IRT Models section on the irtQ-package help page for more detail about the IRT models used in the irtQ package.
An easier way to create the data frame for the argument x is to use the function shape_df.
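For illustration, the single-format example above can also be assembled by hand in base R; the column names below are only placeholders, since shape_df() produces the canonical layout expected by irtQ:

```r
# Hand-built item metadata mirroring the single-format example above
# (column names are placeholders; shape_df() gives the canonical layout)
x <- data.frame(
  id    = paste0("ITEM", 1:5),
  cats  = rep(2, 5),
  model = c("1PLM", "2PLM", "3PLM", "3PLM", "DRM"),
  par.1 = c(1.000, 1.921, 1.736, 0.835, 0.926),    # slope
  par.2 = c(1.461, -1.049, 1.501, -1.049, 0.394),  # difficulty
  par.3 = c(NA, NA, 0.203, 0.182, 0.099)           # guessing (NA for 1PLM/2PLM)
)
```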
To calculate the \chi^{2} and G^{2} fit statistics, two methods are available in the argument group.method to divide the ability scale
into several groups. If group.method = "equal.width", the examinees are grouped into intervals of equal width.
If group.method = "equal.freq", the examinees are grouped so that all groups have (approximately) equal frequencies. Note, however, that the
"equal.freq" method does not guarantee that every group has exactly the same number of examinees, because the examinees are divided at
equally spaced quantiles.
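The difference between the two grouping methods can be sketched with base R alone (simulated abilities; the cut points below are not the ones irtQ computes internally):

```r
# Sketch of the two grouping schemes on simulated ability estimates
set.seed(1)
theta   <- rnorm(500)
n.width <- 10

# "equal.width": intervals of equal length over the observed range
grp.width <- cut(theta, breaks = n.width)

# "equal.freq": cut points at equally spaced quantiles, so group sizes
# are only approximately equal when tied ability estimates occur
qs       <- quantile(theta, probs = seq(0, 1, length.out = n.width + 1))
grp.freq <- cut(theta, breaks = qs, include.lowest = TRUE)

table(grp.width)  # unequal frequencies across groups
table(grp.freq)   # near-equal frequencies across groups
```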
When dividing the ability scale into intervals to compute the \chi^{2} and G^{2} fit statistics, each interval should be wide enough
to include a sufficient number of examinees, but narrow enough that the examinees within it are homogeneous in ability
(Hambleton et al., 1991). Thus, to divide the ability scale into a number of groups other than ten, specify the number of groups
in the argument n.width. Yen (1981) fixed the number of groups at 10, whereas Bock (1960) allowed any number of groups.
Regarding the degrees of freedom (df), the \chi^{2} statistic is assumed to be distributed approximately as a chi-square with df equal to
the number of groups minus the number of IRT model parameters (Ames et al., 2015), whereas the G^{2} statistic is assumed to be distributed approximately
as a chi-square with df equal to the number of groups (Ames et al., 2015; Muraki & Bock, 2003).
Note that if "DRM" is specified for an item in the item metadata set, the item is treated as "3PLM" when computing the degrees of freedom of
the \chi^{2} fit statistic.
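Under these conventions, the degrees of freedom can be computed directly; the per-model parameter counts below are illustrative (the text above specifies only that "DRM" is counted as "3PLM"):

```r
# Degrees of freedom under the conventions described above
n.group <- 10

# Illustrative item-parameter counts per model ("DRM" counted as "3PLM")
n.par <- c("1PLM" = 1, "2PLM" = 2, "3PLM" = 3, "DRM" = 3)

# chi-square statistic: number of groups minus number of model parameters
df.x2 <- n.group - n.par[["3PLM"]]  # 7

# G^2 statistic: number of groups
df.g2 <- n.group                    # 10
```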
Value
This function returns an object of class irtfit containing the following internal objects:
fit_stat |
A data frame containing the results of the three IRT fit statistics (i.e., \chi^{2}, G^{2}, infit, and outfit statistics) for the items. |
contingency.fitstat |
A list of contingency tables used to compute the \chi^{2} and G^{2} fit statistics. |
contingency.plot |
A list of contingency tables used to draw raw and standardized residual plots (Hambleton et al., 1991) in the function plot.irtfit. |
individual.info |
A list of data frames including individual residual and variance values. This information is used to compute the infit and outfit statistics. |
item_df |
The item metadata specified in the argument x. |
ancillary |
A list of ancillary information used in the item fit analysis. |
Methods (by class)
- default: Default method to compute the traditional IRT item fit statistics for a data frame x containing the item metadata.
- est_item: An object created by the function est_item.
- est_irt: An object created by the function est_irt.
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Ames, A. J., & Penfield, R. D. (2015). An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models. Educational Measurement: Issues and Practice, 34(3), 39-48.
Bock, R. D. (1960). Methods and applications of optimal scaling. Chapel Hill, NC: L. L. Thurstone Psychometric Laboratory.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
McKinley, R., & Mills, C. (1985). A comparison of several goodness-of-fit statistics. Applied Psychological Measurement, 9, 49-57.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21(1), 22-40.
Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.
See Also
plot.irtfit, shape_df, est_item
Examples
## example 1
## use the simulated CAT data
# find the location of items that have more than 10,000 responses
over10000 <- which(colSums(simCAT_MX$res.dat, na.rm=TRUE) > 10000)
# select the items that have more than 10,000 responses
x <- simCAT_MX$item.prm[over10000, ]
# select the response data for the items
data <- simCAT_MX$res.dat[, over10000]
# select the examinees' abilities
score <- simCAT_MX$score
# compute fit statistics
fit1 <- irtfit(x=x, score=score, data=data, group.method="equal.width",
n.width=10, loc.theta="average", range.score=NULL, D=1, alpha=0.05,
missing=NA, overSR=2)
# fit statistics
fit1$fit_stat
# contingency tables
fit1$contingency.fitstat
## example 2
## import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# select the first two dichotomous items and last polytomous item
x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df[c(1:2, 55), ]
# generate examinees' abilities from N(0, 1)
set.seed(10)
score <- rnorm(1000, mean=0, sd=1)
# simulate the response data
data <- simdat(x=x, theta=score, D=1)
# compute fit statistics
fit2 <- irtfit(x=x, score=score, data=data, group.method="equal.freq",
n.width=11, loc.theta="average", range.score=c(-4, 4), D=1, alpha=0.05)
# fit statistics
fit2$fit_stat
# contingency tables
fit2$contingency.fitstat
# residual plots for the first item (dichotomous item)
plot(x=fit2, item.loc=1, type = "both", ci.method = "wald", show.table=TRUE, ylim.sr.adjust=TRUE)
# residual plots for the third item (polytomous item)
plot(x=fit2, item.loc=3, type = "both", ci.method = "wald", show.table=FALSE, ylim.sr.adjust=TRUE)