irtfit {irtQ} | R Documentation
Traditional IRT item fit statistics
Description
This function computes traditional IRT item fit statistics, namely the \chi^{2} fit statistic (e.g., Bock, 1960; Yen, 1981), the log-likelihood ratio \chi^{2} fit statistic (G^{2}; McKinley & Mills, 1985), and the infit and outfit statistics (Ames et al., 2015), and returns the contingency tables used to compute the \chi^{2} and G^{2} fit statistics. Note that caution is needed in interpreting the infit and outfit statistics for non-Rasch models. The object returned by this function, in particular the contingency tables, is used by the function plot.irtfit
to draw raw and standardized residual plots (Hambleton et al., 1991).
Usage
irtfit(x, ...)
## Default S3 method:
irtfit(
x,
score,
data,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
D = 1,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
## S3 method for class 'est_item'
irtfit(
x,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
## S3 method for class 'est_irt'
irtfit(
x,
score,
group.method = c("equal.width", "equal.freq"),
n.width = 10,
loc.theta = "average",
range.score = NULL,
alpha = 0.05,
missing = NA,
overSR = 2,
min.collapse = 1,
pcm.loc = NULL,
...
)
Arguments
x |
A data frame containing the item metadata (e.g., item parameters, number of categories, models, ...), an object of class est_item obtained from the function est_item, or an object of class est_irt obtained from the function est_irt. See below for details. |
... |
Further arguments passed to or from other methods. |
score |
A vector of examinees' ability estimates. |
data |
A matrix containing examinees' response data for the items in the argument x. Rows and columns correspond to examinees and items, respectively. |
group.method |
A character string indicating how to group the examinees along the ability scale when computing the \chi^{2} and G^{2} fit statistics. Available methods are "equal.width" for grouping the examinees into intervals of equal width and "equal.freq" for grouping them into intervals with (approximately) equal frequencies. Default is "equal.width". See below for more detail. |
n.width |
An integer value to specify the number of divided groups along the ability scale. Default is 10. See below for more detail. |
loc.theta |
A character string indicating the ability point within each group (or interval) at which the expected probabilities of the score categories are calculated using the IRT models. Available locations are "average", which uses the average of the examinees' ability estimates in each group, and "middle", which uses the midpoint of each group. Default is "average". |
range.score |
A vector of two numeric values to restrict the range of the ability scale. All ability estimates less than the first value are set to the first value, and all ability estimates greater than the second value are set to the second value. If NULL, the minimum and maximum of the ability estimates in the argument score are used. Default is NULL. |
D |
A scaling factor in IRT models to make the logistic function as close as possible to the normal ogive function (if set to 1.7). Default is 1. |
alpha |
A numeric value to specify the significance level of the hypothesis tests for the \chi^{2} and G^{2} fit statistics. Default is 0.05. |
missing |
A value indicating missing values in the response data set. Default is NA. |
overSR |
A numeric value to specify a criterion to find ability groups (or intervals) which have standardized residuals greater than the specified value. Default is 2. |
min.collapse |
An integer value to indicate the minimum frequency of cells to be collapsed when computing the \chi^{2} and G^{2} fit statistics. Default is 1. |
pcm.loc |
A vector of integer values indicating the locations of partial credit model (PCM) items whose slope parameters are fixed. Default is NULL. |
Details
A specific form of data frame should be used for the argument x. The first column should have item IDs,
the second column should contain the number of unique score categories for each item, and the third column should include the IRT model fit to each item.
The available IRT models are "1PLM", "2PLM", "3PLM", and "DRM" for dichotomous item data, and "GRM" and "GPCM" for polytomous item data.
Note that "DRM" covers all dichotomous IRT models (i.e., "1PLM", "2PLM", and "3PLM"), and "GRM" and "GPCM" represent the graded
response model and (generalized) partial credit model, respectively. The next columns should include the item parameters of the fitted IRT models.
For dichotomous items, the fourth, fifth, and sixth columns represent the item discrimination (or slope), item difficulty, and
item guessing parameters, respectively. When "1PLM" and "2PLM" are specified in the third column, NAs should be inserted in the sixth column
for the item guessing parameters. For polytomous items, the item discrimination (or slope) parameters should be included in the
fourth column and the item difficulty (or threshold) parameters of category boundaries should be contained from the fifth to the last columns.
When the number of unique score categories differs between items, the empty cells of item parameters should be filled with NAs.
In the irtQ package, the item difficulty (or threshold) parameters of the category boundaries for the GPCM are expressed as
the item location (or overall difficulty) parameter minus the threshold parameter for each score category of the item.
Note that when a GPCM item has K unique score categories, K-1 item difficulty parameters are necessary because
the item difficulty parameter for the first category boundary is always 0. For example, if a GPCM item has five score categories,
four item difficulty parameters should be specified. An example of a data frame for a single-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA |
ITEM3 | 2 | 3PLM | 1.736 | 1.501 | 0.203 |
ITEM4 | 2 | 3PLM | 0.835 | -1.049 | 0.182 |
ITEM5 | 2 | DRM | 0.926 | 0.394 | 0.099 |
And an example of a data frame for a mixed-format test is as follows:
ITEM1 | 2 | 1PLM | 1.000 | 1.461 | NA | NA | NA |
ITEM2 | 2 | 2PLM | 1.921 | -1.049 | NA | NA | NA |
ITEM3 | 2 | 3PLM | 0.926 | 0.394 | 0.099 | NA | NA |
ITEM4 | 2 | DRM | 1.052 | -0.407 | 0.201 | NA | NA |
ITEM5 | 4 | GRM | 1.913 | -1.869 | -1.238 | -0.714 | NA |
ITEM6 | 5 | GRM | 1.278 | -0.724 | -0.068 | 0.568 | 1.072 |
ITEM7 | 4 | GPCM | 1.137 | -0.374 | 0.215 | 0.848 | NA |
ITEM8 | 5 | GPCM | 1.233 | -2.078 | -1.347 | -0.705 | -0.116 |
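As a small numeric illustration of the GPCM parameterization described above (all values here are hypothetical), the stored category boundary parameters are the item location parameter minus each threshold parameter:

```r
# Hypothetical GPCM item with four score categories: one overall
# location (difficulty) parameter and K - 1 = 3 threshold parameters
location   <- 0.5
thresholds <- c(-1.2, 0.1, 1.4)

# Category boundary (item difficulty) parameters as stored in the
# item metadata: location minus each threshold
boundaries <- location - thresholds
boundaries  # 1.7  0.4 -0.9
```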
See the IRT Models section on the irtQ-package help page for more detail about the IRT models used in the irtQ package.
An easier way to create the data frame for the argument x is to use the function shape_df.
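For illustration, the single-format example above can also be assembled by hand in base R; the column names below are only placeholders, since shape_df() produces the canonical layout expected by irtQ:

```r
# Hand-built item metadata mirroring the single-format example above
# (column names are placeholders; shape_df() gives the canonical layout)
x <- data.frame(
  id    = paste0("ITEM", 1:5),
  cats  = rep(2, 5),
  model = c("1PLM", "2PLM", "3PLM", "3PLM", "DRM"),
  par.1 = c(1.000, 1.921, 1.736, 0.835, 0.926),    # slope
  par.2 = c(1.461, -1.049, 1.501, -1.049, 0.394),  # difficulty
  par.3 = c(NA, NA, 0.203, 0.182, 0.099)           # guessing (NA for 1PLM/2PLM)
)
```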
To calculate the \chi^{2} and G^{2} fit statistics, two methods are available in the argument group.method to divide the ability scale
into several groups. If group.method = "equal.width", the examinees are grouped into intervals of equal width.
If group.method = "equal.freq", the examinees are grouped so that all groups have (approximately) equal frequencies. Note, however, that the
"equal.freq" method does not guarantee that every group has exactly the same number of examinees, because the examinees are divided at
equally spaced quantiles.
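The difference between the two grouping methods can be sketched with base R alone (simulated abilities; the cut points below are not the ones irtQ computes internally):

```r
# Sketch of the two grouping schemes on simulated ability estimates
set.seed(1)
theta   <- rnorm(500)
n.width <- 10

# "equal.width": intervals of equal length over the observed range
grp.width <- cut(theta, breaks = n.width)

# "equal.freq": cut points at equally spaced quantiles, so group sizes
# are only approximately equal when tied ability estimates occur
qs       <- quantile(theta, probs = seq(0, 1, length.out = n.width + 1))
grp.freq <- cut(theta, breaks = qs, include.lowest = TRUE)

table(grp.width)  # unequal frequencies across groups
table(grp.freq)   # near-equal frequencies across groups
```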
When dividing the ability scale into intervals to compute the \chi^{2} and G^{2} fit statistics, each interval should be wide enough
to include a sufficient number of examinees, but narrow enough that the examinees within it are homogeneous in ability
(Hambleton et al., 1991). Thus, to divide the ability scale into a number of groups other than ten, specify the number of groups
in the argument n.width. Yen (1981) fixed the number of groups at 10, whereas Bock (1960) allowed any number of groups.
Regarding the degrees of freedom (df), the \chi^{2} statistic is assumed to be distributed approximately as a chi-square with df equal to
the number of groups minus the number of IRT model parameters (Ames et al., 2015), whereas the G^{2} statistic is assumed to be distributed approximately
as a chi-square with df equal to the number of groups (Ames et al., 2015; Muraki & Bock, 2003).
Note that if "DRM" is specified for an item in the item metadata set, the item is treated as "3PLM" when computing the degrees of freedom of
the \chi^{2} fit statistic.
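Under these conventions, the degrees of freedom can be computed directly; the per-model parameter counts below are illustrative (the text above specifies only that "DRM" is counted as "3PLM"):

```r
# Degrees of freedom under the conventions described above
n.group <- 10

# Illustrative item-parameter counts per model ("DRM" counted as "3PLM")
n.par <- c("1PLM" = 1, "2PLM" = 2, "3PLM" = 3, "DRM" = 3)

# chi-square statistic: number of groups minus number of model parameters
df.x2 <- n.group - n.par[["3PLM"]]  # 7

# G^2 statistic: number of groups
df.g2 <- n.group                    # 10
```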
Value
This function returns an object of class irtfit containing the following internal objects:
fit_stat |
A data frame containing the results of the three IRT fit statistics (i.e., \chi^{2}, G^{2}, infit, and outfit statistics) for the items. |
contingency.fitstat |
A list of contingency tables used to compute the \chi^{2} and G^{2} fit statistics. |
contingency.plot |
A list of contingency tables used to draw raw and standardized residual plots (Hambleton et al., 1991) in the function plot.irtfit. |
individual.info |
A list of data frames including individual residual and variance values. This information is used to compute the infit and outfit statistics. |
item_df |
The item metadata specified in the argument x. |
ancillary |
A list of ancillary information used in the item fit analysis. |
Methods (by class)
- default: Default method to compute the traditional IRT item fit statistics for a data frame x containing the item metadata.
- est_item: An object created by the function est_item.
- est_irt: An object created by the function est_irt.
Author(s)
Hwanggyu Lim hglim83@gmail.com
References
Ames, A. J., & Penfield, R. D. (2015). An NCME Instructional Module on Item-Fit Statistics for Item Response Theory Models. Educational Measurement: Issues and Practice, 34(3), 39-48.
Bock, R. D. (1960). Methods and applications of optimal scaling. Chapel Hill, NC: L. L. Thurstone Psychometric Laboratory.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
McKinley, R., & Mills, C. (1985). A comparison of several goodness-of-fit statistics. Applied Psychological Measurement, 9, 49-57.
Muraki, E., & Bock, R. D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating scale data [Computer Program]. Chicago, IL: Scientific Software International. URL http://www.ssicentral.com
Wells, C. S., & Bolt, D. M. (2008). Investigation of a nonparametric procedure for assessing goodness-of-fit in item response theory. Applied Measurement in Education, 21(1), 22-40.
Yen, W. M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245-262.
See Also
plot.irtfit, shape_df, est_item
Examples
## example 1
## use the simulated CAT data
# find the location of items that have more than 10,000 responses
over10000 <- which(colSums(simCAT_MX$res.dat, na.rm=TRUE) > 10000)
# select the items that have more than 10,000 responses
x <- simCAT_MX$item.prm[over10000, ]
# select the response data for the items
data <- simCAT_MX$res.dat[, over10000]
# select the examinees' abilities
score <- simCAT_MX$score
# compute fit statistics
fit1 <- irtfit(x=x, score=score, data=data, group.method="equal.width",
n.width=10, loc.theta="average", range.score=NULL, D=1, alpha=0.05,
missing=NA, overSR=2)
# fit statistics
fit1$fit_stat
# contingency tables
fit1$contingency.fitstat
## example 2
## import the "-prm.txt" output file from flexMIRT
flex_sam <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")
# select the first two dichotomous items and last polytomous item
x <- bring.flexmirt(file=flex_sam, "par")$Group1$full_df[c(1:2, 55), ]
# generate examinees' abilities from N(0, 1)
set.seed(10)
score <- rnorm(1000, mean=0, sd=1)
# simulate the response data
data <- simdat(x=x, theta=score, D=1)
# compute fit statistics
fit2 <- irtfit(x=x, score=score, data=data, group.method="equal.freq",
n.width=11, loc.theta="average", range.score=c(-4, 4), D=1, alpha=0.05)
# fit statistics
fit2$fit_stat
# contingency tables
fit2$contingency.fitstat
# residual plots for the first item (dichotomous item)
plot(x=fit2, item.loc=1, type = "both", ci.method = "wald", show.table=TRUE, ylim.sr.adjust=TRUE)
# residual plots for the third item (polytomous item)
plot(x=fit2, item.loc=3, type = "both", ci.method = "wald", show.table=FALSE, ylim.sr.adjust=TRUE)