R: Item bank generation (polytomous models)

genPolyMatrix {catR}

R Documentation

Item bank generation (polytomous models)

Description

This command generates an item bank from prespecified parent distributions for use with polytomous IRT models. Subgroups of items can also be specified for content balancing purposes.

Usage

genPolyMatrix(items = 100, nrCat = 3, model = "GRM", seed = 1, same.nrCat = FALSE,
 	cbControl = NULL)

Arguments

`items`	integer: the number of items to generate (default is 100).
`nrCat`	integer: the (maximum) number of response categories to generate (default is 3).
`model`	character: the type of polytomous IRT model. Possible values are `"GRM"` (default), `"MGRM"`, `"PCM"`, `"GPCM"`, `"RSM"` and `"NRM"`. See Details.
`seed`	numeric: the random seed for item parameter generation (default is 1).
`same.nrCat`	logical: should all items have the same number of response categories? (default is `FALSE`). Ignored if `model` is either `"MGRM"` or `"RSM"`. See Details.
`cbControl`	either a list in the appropriate format to control for content balancing, or `NULL` (default). See Details.

Details

The genPolyMatrix function quickly generates a polytomous item bank in a format suitable for further use, e.g. for computing item response probabilities with the Pi function.

The six polytomous IRT models that are supported are:

the Graded Response Model (GRM; Samejima, 1969);
the Modified Graded Response Model (MGRM; Muraki, 1990);
the Partial Credit Model (PCM; Masters, 1982);
the Generalized Partial Credit Model (GPCM; Muraki, 1992);
the Rating Scale Model (RSM; Andrich, 1978);
the Nominal Response Model (NRM; Bock, 1972).

Each model is specified through the model argument, with its acronym surrounded by double quotes (i.e. "GRM" for GRM, "PCM" for PCM, etc.). The default value is "GRM".

For any item j, set (0, ..., g_j) as the g_j+1 possible response categories. The number of response categories can differ across items under the GRM, PCM, GPCM and NRM, but it is necessarily equal across items under the MGRM and RSM. In the latter case, set g+1 as the (common) number of response categories for all items. Under the other models it is nevertheless possible to require all items to have the same number of response categories, by setting the same.nrCat argument to TRUE.

In case of GRM, PCM, GPCM or NRM with same.nrCat being FALSE, the number of response categories g_j+1 per item is drawn from a Poisson distribution with parameter nrCat, and this number is restricted to the interval [2, nrCat]. This ensures at least two and at most nrCat response categories. In all other cases, each g_j+1 is fixed to g+1 = nrCat.
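As a rough illustration of this draw, the base R sketch below samples category counts from a Poisson distribution and clamps them into [2, nrCat]. Clamping is an assumption: the text does not say whether out-of-range draws are truncated or redrawn. The helper name `draw_n_cat` is hypothetical and not part of catR.

```r
# Hypothetical helper (not part of catR): draw the number of response
# categories per item from Poisson(nrCat), then force each draw into
# [2, nrCat]. Clamping is an assumption; redrawing is another option.
draw_n_cat <- function(items, nrCat, seed = 1) {
  set.seed(seed)
  raw <- rpois(items, lambda = nrCat)
  pmin(pmax(raw, 2), nrCat)
}

n_cat <- draw_n_cat(items = 10, nrCat = 4)
range(n_cat)  # both bounds lie within 2 and 4
```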

Denote further P_{jk}(\theta) as the probability of answering response category k \in \{0, ..., g_j\} of item j. For GRM and MGRM, response probabilities P_{jk}(\theta) are defined through cumulative probabilities, while for PCM, GPCM, RSM and NRM they are directly computed.

For GRM and MGRM, set P_{jk}^*(\theta) as the (cumulative) probability of answering response category k or "above", that is P_{jk}^*(\theta) = Pr(X_j \geq k | \theta) where X_j is the item response. It follows that for any \theta, P_{j0}^*(\theta) = 1 and P_{jk}^*(\theta) = 0 when k>g_j. Furthermore, response category probabilities are recovered through the relationship P_{jk}(\theta)= P_{jk}^*(\theta)-P_{j,k+1}^*(\theta). Then, the GRM is defined by (Samejima, 1969)

P_{jk}^*(\theta)=\frac{\exp\,[\alpha_j\,(\theta-\beta_{jk})]}{1+\exp\,[\alpha_j\,(\theta-\beta_{jk})]}

and the MGRM by (Muraki, 1990)

P_{jk}^*(\theta)=\frac{\exp\,[\alpha_j\,(\theta-b_j+c_k)]}{1+\exp\,[\alpha_j\,(\theta-b_j+c_k)]}.
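The two cumulative formulas can be turned into category probabilities by differencing, as in the following base R sketch. The helper names `grm_probs` and `mgrm_probs` and the parameter values are hypothetical, chosen only for illustration; in practice the Pi function of catR would be used.

```r
# Hypothetical helper (not part of catR): GRM category probabilities for one
# item, obtained by differencing cumulative probabilities.
# alpha: discrimination; beta: increasing thresholds beta_j1, ..., beta_jg.
grm_probs <- function(theta, alpha, beta) {
  # P*_j0 = 1, P*_jk = logistic(alpha * (theta - beta_jk)), P*_{j,g+1} = 0
  p_star <- c(1, plogis(alpha * (theta - beta)), 0)
  # P_jk = P*_jk - P*_{j,k+1} for k = 0, ..., g
  -diff(p_star)
}

# MGRM: same construction with beta_jk replaced by b_j - c_k,
# where c_k decreases with k so that b_j - c_k increases.
mgrm_probs <- function(theta, alpha, b, c_k) {
  grm_probs(theta, alpha, beta = b - c_k)
}

p_grm <- grm_probs(theta = 0, alpha = 1.2, beta = c(-1, 0, 1))
p_mgrm <- mgrm_probs(theta = 0, alpha = 1.2, b = 0.5, c_k = c(1, 0, -1))
sum(p_grm)  # category probabilities sum to one
```

Sorting the thresholds matters here: with unsorted \beta_{jk} the differences could become negative, which is why the generated parameters are ordered.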

The PCM, GPCM, RSM and NRM are defined as "divide-by-total" models (Embretson and Reise, 2000). The PCM has the following response category probability (Masters, 1982):

P_{jk}(\theta)=\frac{\exp\,\sum_{t=0}^k (\theta-\delta_{jt})}{\sum_{r=0}^{g_j}\,\exp\, \sum_{t=0}^r (\theta-\delta_{jt})}\quad \mbox{with} \quad \sum_{t=0}^0 (\theta-\delta_{jt})=0.

The GPCM has the following response category probability (Muraki, 1992):

P_{jk}(\theta)=\frac{\exp\,\sum_{t=0}^k \alpha_j\,(\theta-\delta_{jt})}{\sum_{r=0}^{g_j}\,\exp\, \sum_{t=0}^r \alpha_j\,(\theta-\delta_{jt})}\quad \mbox{with} \quad \sum_{t=0}^0 \alpha_j\,(\theta-\delta_{jt})=0.

The RSM has the following response category probability (Andrich, 1978):

P_{jk}(\theta)=\frac{\exp\,\sum_{t=0}^k [\theta-(\lambda_j+\delta_t)]}{\sum_{r=0}^{g_j}\,\exp\, \sum_{t=0}^r [\theta-(\lambda_j+\delta_t)]}\quad \mbox{with} \quad \sum_{t=0}^0 [\theta-(\lambda_j+\delta_t)]=0.
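Since the PCM is the GPCM with \alpha_j = 1 and the RSM is the PCM with \delta_{jt} = \lambda_j + \delta_t, the three formulas above can be sketched with a single base R function. The helper names below are hypothetical and not part of catR, and the parameter values are arbitrary.

```r
# Hypothetical helper (not part of catR): GPCM category probabilities for one
# item. delta holds delta_j1, ..., delta_jg; the k = 0 term is fixed to 0.
gpcm_probs <- function(theta, delta, alpha = 1) {
  # Log-numerators: 0 for k = 0, then cumulative sums for k = 1, ..., g
  z <- c(0, cumsum(alpha * (theta - delta)))
  # Subtract the maximum before exponentiating for numerical stability
  ez <- exp(z - max(z))
  ez / sum(ez)
}

# PCM: alpha = 1. RSM: delta_jt = lambda_j + delta_t with shared delta_t.
pcm_probs <- function(theta, delta) gpcm_probs(theta, delta, alpha = 1)
rsm_probs <- function(theta, lambda, delta) gpcm_probs(theta, lambda + delta)

p_pcm <- pcm_probs(theta = 0, delta = c(-0.5, 0.5))
p_gpcm <- gpcm_probs(theta = 0.3, delta = c(-1, 0, 1), alpha = 1.1)
p_rsm <- rsm_probs(theta = 0.3, lambda = -0.2, delta = c(-1, 0, 1))
```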

Finally, the NRM has the following response category probability (Bock, 1972):

P_{jk}(\theta)=\frac{\exp (\alpha_{jk}\,\theta+c_{jk})}{\sum_{r=0}^{g_j} \exp (\alpha_{jr}\,\theta+c_{jr})}\quad \mbox{with} \quad \alpha_{j0}\,\theta+c_{j0}=0.
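The NRM formula is a softmax over category-specific linear predictors, with category 0 as the reference. A minimal base R sketch follows; the helper name `nrm_probs` and its argument names are hypothetical, not part of catR.

```r
# Hypothetical helper (not part of catR): NRM category probabilities for one
# item. alpha and intercept hold (alpha_j1, ..., alpha_jg) and
# (c_j1, ..., c_jg); category 0 is the reference with linear predictor 0.
nrm_probs <- function(theta, alpha, intercept) {
  stopifnot(length(alpha) == length(intercept))
  z <- c(0, alpha * theta + intercept)
  ez <- exp(z - max(z))  # shift by the maximum for numerical stability
  ez / sum(ez)
}

# At theta = 0 with zero intercepts, all four categories are equally likely
p_nrm <- nrm_probs(theta = 0, alpha = c(0.9, 1.1, 1.0), intercept = c(0, 0, 0))
p_nrm
```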

The following parent distributions are used to generate the different item parameters. The \alpha_j parameters of GRM, MGRM and GPCM, as well as the \alpha_{jk} parameters of the NRM, are drawn from a log-normal distribution with mean 0 and standard deviation 0.1225. All other parameters are drawn from a standard normal distribution. Moreover, the \beta_{jk} parameters of the GRM and the c_k parameters of the MGRM are sorted respectively in increasing and decreasing order of k, to ensure a decreasing trend in the cumulative P_{jk}^*(\theta) probabilities.
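For the GRM, the generation scheme described above could look roughly like the base R sketch below. It is not the catR source: the helper name `gen_grm` and the column names are hypothetical, every item is given the same number of thresholds for simplicity, and the log-normal mean and standard deviation are assumed to be the log-scale parameters (`meanlog` and `sdlog` of `rlnorm`).

```r
# Hypothetical sketch (not the catR source): GRM parameters for `items` items
# with g thresholds each. Assumes mean 0 and sd 0.1225 are log-scale values.
gen_grm <- function(items, g, seed = 1) {
  set.seed(seed)
  alpha <- rlnorm(items, meanlog = 0, sdlog = 0.1225)
  beta <- matrix(rnorm(items * g), nrow = items, ncol = g)
  # Sort thresholds within each item so that beta_j1 <= ... <= beta_jg
  for (j in seq_len(items)) beta[j, ] <- sort(beta[j, ])
  out <- cbind(alpha, beta)
  colnames(out) <- c("alpha", paste0("beta", seq_len(g)))  # illustrative names
  out
}

bank <- gen_grm(items = 5, g = 3)
dim(bank)  # 5 rows, 1 + 3 columns
```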

The output is a matrix with one row per item and as many columns as required to hold all item parameters. For items with fewer response categories than the maximum, the parameters of the missing categories are set to NA. Column names refer to the corresponding model parameters. See Value for the column layout under each model and Examples for illustrations.

Finally, the output matrix can contain an additional column with the names of the subgroups to be used for content balancing purposes. To do so, the argument cbControl (whose default value is NULL) must contain a list of two elements: (a) the names element with the names of the subgroups, and (b) the props element with the proportions of items per subgroup (of the same length as the names element, with only positive numbers but not necessarily summing to one). The cbControl argument is similar to the one in the nextItem and randomCAT functions to control for content balancing. The output matrix then contains an additional column, with the names of the subgroups randomly allocated to each item by using random multinomial draws with the probabilities given by cbControl$props.
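The random allocation of items to subgroups can be sketched in base R as below. This is an illustration under assumptions, not the catR implementation: the helper name `allocate_groups` is hypothetical, and `sample()` with `replace = TRUE` is used as a stand-in for one multinomial draw per item. The proportions are rescaled explicitly, which shows why they need not sum to one.

```r
# Hypothetical helper (not part of catR): allocate each item to a subgroup
# with probabilities proportional to cbControl$props.
allocate_groups <- function(items, cbControl, seed = 1) {
  stopifnot(length(cbControl$names) == length(cbControl$props),
            all(cbControl$props > 0))
  set.seed(seed)
  probs <- cbControl$props / sum(cbControl$props)  # rescale to sum to one
  sample(cbControl$names, size = items, replace = TRUE, prob = probs)
}

cb <- list(names = c("Audio", "Written"), props = c(1, 3))
groups <- allocate_groups(items = 20, cbControl = cb)
table(groups)
```

Because the allocation is random, the observed subgroup proportions only approximate `props`, especially in small banks.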

Value

A matrix with items rows and as many columns as required for the considered IRT model:

\max_j \,g_j+1 columns, holding parameters (\alpha_j, \beta_{j1}, ..., \beta_{j,g_j}) if model is "GRM";
g+2 columns, holding parameters (\alpha_j, b_j, c_1, ..., c_g) if model is "MGRM";
\max_j \,g_j columns, holding parameters (\delta_{j1}, ..., \delta_{j,g_j}) if model is "PCM";
\max_j \,g_j+1 columns, holding parameters (\alpha_j, \delta_{j1}, ..., \delta_{j,g_j}) if model is "GPCM";
g+1 columns, holding parameters (\lambda_j, \delta_1, ..., \delta_g) if model is "RSM";
2\,\max_j\, g_j columns, holding parameters (\alpha_{j1}, c_{j1}, \alpha_{j2}, c_{j2}, ..., \alpha_{j,g_j}, c_{j, g_j}) if model is "NRM".

If cbControl is not NULL, the output matrix contains an additional column holding the subgroup membership of each item.
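The column counts listed above can be summarized in a small lookup, which may help when checking the dimensions of a generated bank. The helper `n_bank_cols` is hypothetical and not part of catR; `max_cat` denotes the largest number of response categories in the bank, that is \max_j g_j + 1 (or g+1 for MGRM and RSM).

```r
# Hypothetical helper (not part of catR): expected number of columns in the
# output matrix, given the largest number of response categories in the bank.
n_bank_cols <- function(model, max_cat, cb = FALSE) {
  g <- max_cat - 1  # largest g_j (or the common g for MGRM and RSM)
  n <- switch(model,
    GRM  = g + 1,
    MGRM = g + 2,
    PCM  = g,
    GPCM = g + 1,
    RSM  = g + 1,
    NRM  = 2 * g,
    stop("unknown model: ", model)
  )
  n + as.integer(cb)  # one extra column for subgroup membership
}

n_bank_cols("GRM", max_cat = 4)             # 4
n_bank_cols("NRM", max_cat = 4, cb = TRUE)  # 7
```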

Author(s)

David Magis
Department of Psychology, University of Liege, Belgium
david.magis@uliege.be

References

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573. doi: 10.1007/BF02293814

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51. doi: 10.1007/BF02291411

Embretson, S. E., and Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

Magis, D. and Barrada, J. R. (2017). Computerized Adaptive Testing with R: Recent Updates of the Package catR. Journal of Statistical Software, Code Snippets, 76(1), 1-18. doi: 10.18637/jss.v076.c01

Magis, D., and Raiche, G. (2012). Random Generation of Response Patterns under Computerized Adaptive Testing with the R Package catR. Journal of Statistical Software, 48(8), 1-31. doi: 10.18637/jss.v048.i08

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174. doi: 10.1007/BF02296272

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14, 59-71. doi: 10.1177/014662169001400106

Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176. doi: 10.1177/014662169201600206

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph (vol. 17).

Examples


# Unless noted otherwise, generated item banks have 10 items and at most four response categories

 # GRM 
 genPolyMatrix(10, 4, model = "GRM")

 # GRM with same number of response categories
 genPolyMatrix(10, 4, model = "GRM", same.nrCat = TRUE)

 # MGRM 
 genPolyMatrix(10, 4, model = "MGRM")

 # MGRM with same number of response categories
 genPolyMatrix(10, 4, model = "MGRM", same.nrCat = TRUE) # same result

 # PCM 
 genPolyMatrix(10, 4, model = "PCM")

 # PCM with same number of response categories
 genPolyMatrix(10, 4, model = "PCM", same.nrCat = TRUE) 

 # GPCM 
 genPolyMatrix(10, 4, model = "GPCM")

 # GPCM with same number of response categories
 genPolyMatrix(10, 4, model = "GPCM", same.nrCat = TRUE) 

 # RSM 
 genPolyMatrix(10, 4, model = "RSM")

 # RSM with same number of response categories
 genPolyMatrix(10, 4, model = "RSM", same.nrCat = TRUE) # same result

 # NRM 
 genPolyMatrix(10, 4, model = "NRM")

 # NRM with same number of response categories
 genPolyMatrix(10, 4, model = "NRM", same.nrCat = TRUE)  

## Content balancing

 # Creation of the 'cbList' list with arbitrary proportions
 cbList <- list(names = c("Audio1", "Audio2", "Written1", "Written2", "Written3"), 
        props = c(0.1, 0.2, 0.2, 0.2, 0.3))

 # NRM with 100 items
 genPolyMatrix(100, 4, model = "NRM", cbControl = cbList)

[Package catR version 3.17 Index]