R: Simulate Computerized Adaptive Tests (CATs)

catIrt {catIrt}

R Documentation

Simulate Computerized Adaptive Tests (CATs)

Description

catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of responses or a vector of ability values, a matrix of item parameters, and several item selection mechanisms, estimation procedures, and termination criteria.

Usage

catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random", "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2), 
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, ... )
## S3 method for class 'catIrt'
summary( object, group = TRUE, ids = "none", ... )
## S3 method for class 'catIrt'
plot( x, which = "all", ids = "none", 
      conf.lev = .95, legend = TRUE, ask = TRUE, ... )

Arguments

`object`, `x`	a `catIrt` object.
`params`	numeric: a matrix of item parameters. The rows must index the items, and the columns must designate the item parameters. For the binary response model, `params` must be a 3-column matrix (if not using item exposure control), a 4- or 5-column matrix with Sympson-Hetter parameters as the last column (if using item exposure control), or a 4- or 5-column matrix with the item number as the first column. See Details for more information.
`mod`	character: a character string indicating the IRT model. Current support is for the 3-parameter binary response model ("brm"), and Samejima's graded response model ("grm"). The contents of `params` must match the designation of `mod`. If `mod` is left blank, it will be designated the class of `resp` (if `resp` inherits either "brm" or "grm"), and if that fails, it will ask the user (if in interactive mode) or error.
`resp`	numeric: either an `N \times J` matrix (where `N` indicates the number of simulees and `J` indicates the number of items), a length-`J` vector (if there is only one simulee), or NULL if specifying `theta`. For the binary response model ("brm"), `resp` must solely contain 0s and 1s. For the graded response model ("grm"), `resp` must solely contain integers `1, \ldots, K`, where `K` is the number of categories, as indicated by the dimension of `params`.
`theta`	numeric: either an `N`-dimensional vector (where `N` indicates the number of simulees) or NULL if specifying `resp`.
`catStart`	list: a list of options for starting the CAT, including:
`n.start`: a scalar indicating the number of items that are used for each simulee at the beginning of the CAT. Once `n.start` items have been administered, the CAT will shift to the middle set of parameters.
`init.theta`: a scalar or vector of initial starting estimates of `\theta`. If `init.theta` is a scalar, every simulee will have the same starting value. Otherwise, simulees will have different starting values based on the respective element of `init.theta`.
`select`: a character string indicating the item selection method for the first few items. Items can be selected through maximum Fisher information, through Kullback-Leibler divergence, or randomly ("random"). The Fisher information methods are "UW-FI": unweighted Fisher information at a point; "LW-FI": Fisher information weighted across the likelihood function; and "PW-FI": Fisher information weighted across the posterior distribution of `\theta`. The Kullback-Leibler divergence methods are "FP-KL": pointwise KL divergence between [P +/- delta], where P is either the current `\theta` estimate or a classification bound; "VP-KL": pointwise KL divergence between [P +/- delta/sqrt(n)], where n is the number of items given to this point in the CAT; "FI-KL": KL divergence integrated along [P +/- delta] with respect to P; and "VI-KL": KL divergence integrated along [P +/- delta/sqrt(n)] with respect to P. See `itChoose` for more information.
`at`: a character string indicating where to select items. If `select` is "UW-FI" and `at` is "theta", then items will be selected to maximize Fisher information at the current `\theta` estimates.
`it.range`: either a 2-element numeric vector indicating the minimum and maximum allowed difficulty parameters for items selected during the starting portion of the CAT (only if `mod` is equal to "brm") or NULL indicating no item parameter restrictions. See `itChoose` for more information.
`n.select`: an integer indicating the number of items to select at one time. For instance, if `select` is "UW-FI", `at` is "theta", and `n.select` is 5, the item choosing function will randomly select among the top 5 items that maximize expected Fisher information at the current `\theta` estimates.
`delta`: a scalar indicating the multiplier used in initial item selection if a Kullback-Leibler method is chosen.
`score`: a character string indicating the `\theta` estimation method. As of now, the options for scoring the first few items are "fixed" (at `init.theta`), "step" (adding `step.size` to, or subtracting it from, the `\theta` estimate after each item), Weighted Likelihood Estimation ("WLE"), Bayesian Modal Estimation ("BME"), and Expected A-Posteriori Estimation ("EAP"). The latter two allow user-specified prior distributions through density (`d...`) functions. See `mleEst` for more information.
`range`: a 2-element numeric vector indicating the minimum and maximum values between which `\theta` should be estimated in the starting portion of the CAT.
`step.size`: a scalar indicating how much to increment or decrement the estimate of `\theta` if `score` is set to "step".
`leave.after.MLE`: a logical indicating whether to skip the remainder of the starting items once the simulee has a mixed response pattern and/or a finite maximum likelihood estimate of `\theta` can be achieved.
`catMiddle`	list: a list of options for selecting/scoring during the middle of the CAT, including:
`select`: a character string indicating the item selection method for the remaining items. See `select` in `catStart` for an explanation of the options.
`at`: a character string indicating where to select items. See `at` in `catStart` for an explanation of the options.
`it.range`: either a 2-element numeric vector indicating the minimum and maximum allowed difficulty parameters for items selected during the middle portion of the CAT (only if `mod` is equal to "brm") or NULL indicating no item parameter restrictions. See `itChoose` for more information.
`n.select`: an integer indicating the number of items to select at one time.
`delta`: a scalar indicating the multiplier used in middle item selection if a Kullback-Leibler method is chosen.
`score`: a character string indicating the `\theta` estimation method. As of now, the options for scoring the remaining items are Maximum Likelihood Estimation ("MLE"), Weighted Likelihood Estimation ("WLE"), Bayesian Modal Estimation ("BME"), and Expected A-Posteriori Estimation ("EAP"). The latter two allow user-specified prior distributions through density (`d...`) functions. See `mleEst` for more information.
`range`: a 2-element numeric vector indicating the minimum and maximum values between which `\theta` should be estimated in the middle portion of the CAT.
`expos`: a character string indicating whether no item exposure controls should be implemented ("none") or whether the CAT should use Sympson-Hetter exposure controls ("SH"). If (and only if) `expos` is equal to "SH", the last column of the parameter matrix should indicate the probability of an item being administered given that it is selected.
`catTerm`	list: a list of options for stopping/terminating the CAT, including:
`term`: a scalar/vector indicating the termination criterion/criteria. CATs can be terminated after a fixed number of items ("fixed"), declared through the `n.max` argument; based on the SEM of a simulee ("precision"), declared through the `p.term` argument; based on the test information of a simulee at a particular point in the CAT ("info"), declared through the `i.term` argument; and/or when a simulee falls into a category ("class"), declared through the `c.term` argument. If more than one termination criterion is selected, the CAT will terminate as soon as the first of them is satisfied for a given simulee.
`score`: a character string indicating the `\theta` estimation method for all of the responses in the bank. `score` is used to estimate `\theta` given the entire bank of item responses and parameter set. If the `\theta` estimated using all of the responses is far away from the true `\theta`, the item bank is probably too small. The options for `score` in `catTerm` are identical to the options of `score` in `catMiddle`.
`n.min`: an integer indicating the minimum number of items that a simulee should "take" before any of the termination criteria are checked.
`n.max`: an integer indicating the maximum number of items to administer before terminating the CAT.
`p.term`: a list indicating the parameters of a precision-based stopping rule, only if `term` is "precision", including: `method`: a character string indicating whether to terminate the CAT when the SEM dips below a threshold ("threshold") or changes less than a particular amount ("change"). `crit`: a scalar indicating either the maximum SEM of a simulee before terminating the CAT or the maximum change in the simulee's SEM before terminating the CAT.
`i.term`: a list indicating the parameters of an information-based stopping rule, only if `term` is "info", including: `method`: a character string indicating whether to terminate the CAT when FI exceeds a threshold ("threshold") or changes less than a particular amount ("change"). `crit`: a scalar indicating either the minimum FI of a simulee before terminating the CAT or the maximum change in the simulee's FI before terminating the CAT.
`c.term`: a list indicating the parameters of a classification CAT, only if `term` is "class" or any of the selection methods are `at` one or more "bounds", including: `method`: a scalar indicating the method used for a classification CAT. As of now, the classification CAT options are the Sequential Probability Ratio Test ("SPRT"), the Generalized Likelihood Ratio ("GLR"), or the Confidence Interval method ("CI"). `bounds`: a scalar, vector, or matrix of classification bounds. If specified as a scalar, there will be one bound for each simulee at that value. If specified as an `N`-dimensional vector, there will be one bound for each simulee. If specified as a `k`-dimensional vector with `k < N`, there will be `k` bounds for each simulee at those values. And if specified as an `N \times k` matrix, there will be `k` bounds for each simulee. `categ`: a vector indicating the names of the categories into which the simulees should be classified. The length of `categ` should be one greater than the length of `bounds`. `delta`: a scalar indicating the half-width of an indifference region when performing an SPRT-based classification CAT or selecting items by Kullback-Leibler divergence. See Eggen (1999) and `KL` for more information. `alpha`: a scalar indicating the specified Type I error rate for performing an SPRT-based classification CAT. `beta`: a scalar indicating the specified Type II error rate for performing an SPRT-based classification CAT. `conf.lev`: a scalar between 0 and 1 indicating the confidence level used when performing a confidence-based ("CI") classification CAT.
`ddist`	function: a function indicating how to calculate prior densities for Bayesian estimation or particular item selection methods. For instance, if you wish to specify a normal prior, `ddist = dnorm`, and if you wish to specify a uniform prior, `ddist = dunif`. Note that it is standard in R to use `d`... to indicate a density. See `itChoose` for more information.
`which`	numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows: (1) Bank Information, (2) Bank SEM, (3) CAT Information, and (4) CAT SEM. `which` can also be "none", in which case `plot.catIrt` will not plot any information functions, or it can be "all", in which case `plot.catIrt` will plot all four information functions.
`group`	logical: TRUE or FALSE indicating whether to display a summary at the group level.
`ids`	numeric: a scalar or vector of integers between 1 and the number of simulees indicating the simulees whose CAT process and `\theta` estimates should be plotted and/or summarized. `ids` can also be "none" (or, equivalently, NULL) or "all".
`conf.lev`	numeric: a scalar between 0 and 1 indicating the desired confidence level plotted for the individual `\theta` estimates.
`legend`	logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot.
`ask`	logical: TRUE or FALSE indicating whether the plot function should ask between plots.
`progress`	logical: TRUE or FALSE indicating whether the `catIrt` function should display a progress bar during the CAT.
`...`	arguments passed to `ddist` or `plot.catIrt`, usually distribution parameters identified by name or graphical parameters.
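
The Kullback-Leibler selection options described under `catStart` can be illustrated with a small base-R sketch. This is not the code used by `itChoose` or `KL`. The helper names (`p_brm`, `kl_brm`), the three-item bank, and the direction of the divergence (from P + delta to P - delta) are assumptions made only for illustration, and the guessing parameter is called `g` inside the helpers to avoid masking `c()`.

```r
# Probability of a correct response under the 3-parameter binary response model.
# (Hypothetical helper; g is the guessing parameter, i.e. column c of params.)
p_brm <- function(theta, a, b, g) {
  g + (1 - g) / (1 + exp(-a * (theta - b)))
}

# KL divergence between the response distributions at theta_hi and theta_lo,
# computed separately for each binary item (vectorized over items).
kl_brm <- function(theta_hi, theta_lo, a, b, g) {
  p_hi <- p_brm(theta_hi, a, b, g)
  p_lo <- p_brm(theta_lo, a, b, g)
  p_hi * log(p_hi / p_lo) + (1 - p_hi) * log((1 - p_hi) / (1 - p_lo))
}

# Hypothetical three-item bank with columns a, b, c, laid out like params.
bank <- cbind(a = c(1.0, 1.5, 0.8), b = c(-1.0, 0.0, 1.0), c = c(0, 0, 0))

P     <- 0     # current theta estimate or classification bound
delta <- 0.1   # half-width multiplier
n     <- 4     # number of items administered so far

# Fixed-width pointwise divergence, in the spirit of "FP-KL".
kl_fixed <- kl_brm(P + delta, P - delta,
                   bank[, "a"], bank[, "b"], bank[, "c"])

# Width shrinking with test length, in the spirit of "VP-KL".
kl_variable <- kl_brm(P + delta / sqrt(n), P - delta / sqrt(n),
                      bank[, "a"], bank[, "b"], bank[, "c"])

# The item with the largest divergence would be selected next.
best_item <- which.max(kl_fixed)
```

In this sketch the variable-width divergences are smaller than the fixed-width ones because the two evaluation points move closer together as `n` grows.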

Details

The function catIrt performs a post-hoc computerized adaptive test (CAT), with a variety of user-specified inputs. For a given person/simulee (e.g. simulee i), a CAT consists of a simple set of stages wrapped in a while loop (e.g. Weiss and Kingsbury, 1984):

Item Selection: The next item is chosen based on a pre-specified criterion/criteria. For example, the classic item selection mechanism is picking an item such that it maximizes Fisher Information at the current estimate of \theta_i. Frequently, content balancing, item constraints, or item exposure will be taken into consideration at this point (aside from solely picking the "best item" for a given person). See itChoose for current item selection methods.
Estimation: \theta_i is estimated based on updated information, usually relating to the just-selected item and the response associated with that item. In a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration" would be between "item selection" and "estimation." The classic estimation mechanism is estimating \theta_i by maximizing the likelihood given the item parameters and a set of responses. Other estimation mechanisms correct for bias in the maximum likelihood estimate or add prior information (such as a prior distribution of \theta). If an estimate is untenable (i.e. it returns a nonsensical value or \infty), the estimation procedure needs to have an alternative estimation mechanism. See mleEst for current estimation methods.
Termination: Either the test is terminated based on a pre-specified criterion/criteria, or no termination criterion is satisfied, in which case the loop repeats. The standard termination criteria involve a fixed criterion (e.g. administering only 50 items), or a variable criterion (e.g. continuing until the observed SEM is below .3). Other termination criteria relate to cut-point tests (e.g. certification tests, classification tests), which depend not solely on ability but on whether that ability is estimated to exceed a threshold. catIrt terminates classification tests based on either the Sequential Probability Ratio Test (SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT takes the ratio of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood of the data given being in the other category, as defined by B + \delta and B - \delta, where B separates the categories and \delta is the half-width of the indifference region) and compares that ratio with a ratio of error rates (\alpha and \beta) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either B + \delta or B - \delta, and the confidence interval method terminates a CAT if the confidence interval surrounding an estimate of \theta is fully within one of the categories.
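
The SPRT comparison described under Termination can be sketched in base R as follows. This is a minimal illustration using Wald's (1945) thresholds for a single bound, not the decision rule as implemented inside catIrt. The helper names (`loglik_brm`, `sprt_decide`), the returned labels, and the 30 identical items are hypothetical, and the guessing parameter is named `g` to avoid masking `c()`.

```r
# Log-likelihood of binary responses u at ability theta under the
# 3-parameter binary response model (hypothetical helper).
loglik_brm <- function(theta, u, a, b, g) {
  p <- g + (1 - g) / (1 + exp(-a * (theta - b)))
  sum(u * log(p) + (1 - u) * log(1 - p))
}

# Wald-style SPRT around a single bound B with indifference half-width delta.
# Returns "above", "below", or "continue" (labels are illustrative only).
sprt_decide <- function(u, a, b, g, B = 0, delta = 0.1,
                        alpha = 0.05, beta = 0.05) {
  # Log of the likelihood ratio L(B + delta) / L(B - delta).
  log_lr <- loglik_brm(B + delta, u, a, b, g) -
            loglik_brm(B - delta, u, a, b, g)
  upper <- log((1 - beta) / alpha)   # classify above the bound
  lower <- log(beta / (1 - alpha))   # classify below the bound
  if (log_lr >= upper) {
    "above"
  } else if (log_lr <= lower) {
    "below"
  } else {
    "continue"
  }
}

# Hypothetical administered items: 30 identical items located at the bound.
a <- rep(1.2, 30)
b <- rep(0, 30)
g <- rep(0, 30)

all_right <- sprt_decide(rep(1, 30), a, b, g, delta = 0.2, alpha = 0.10, beta = 0.10)
all_wrong <- sprt_decide(rep(0, 30), a, b, g, delta = 0.2, alpha = 0.10, beta = 0.10)
mixed     <- sprt_decide(rep(c(0, 1), 15), a, b, g, delta = 0.2, alpha = 0.10, beta = 0.10)
```

With these inputs, a simulee who answers every item correctly is classified above the bound, one who answers every item incorrectly is classified below it, and an evenly mixed pattern leaves the ratio between the two thresholds, so testing would continue.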

The CAT estimates \theta_{i1} (an initial point) based on init.theta, and terminates the entire simulation after sequentially terminating each simulee's CAT.
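
The select, estimate, terminate loop for a single simulee can be sketched in base R as below. It is a simplified illustration, not the catIrt implementation: it assumes a 2-parameter model, unweighted Fisher information selection, bounded maximum likelihood scoring, and an SEM threshold, and it omits the separate starting stage. The function `run_cat`, its argument names, and the simulated bank are hypothetical.

```r
set.seed(1)

# 2-parameter logistic response probability and Fisher information.
p_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
info_2pl <- function(theta, a, b) {
  p <- p_2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# Hypothetical item bank and one simulee's full (post-hoc) response vector.
n_items    <- 200
a          <- runif(n_items, 0.8, 2)
b          <- rnorm(n_items)
true_theta <- 0.5
resp       <- rbinom(n_items, 1, p_2pl(true_theta, a, b))

run_cat <- function(resp, a, b, init_theta = 0, n_min = 5, n_max = 50,
                    sem_crit = 0.3, range = c(-6, 6)) {
  used  <- integer(0)
  theta <- init_theta
  sem   <- Inf
  while (length(used) < n_max) {
    # Item selection: the unused item with maximum information at theta.
    info       <- info_2pl(theta, a, b)
    info[used] <- -Inf
    used       <- c(used, which.max(info))

    # Estimation: maximum likelihood restricted to `range`, so that
    # all-correct or all-incorrect patterns still give a finite estimate.
    u   <- resp[used]
    nll <- function(th) {
      p <- p_2pl(th, a[used], b[used])
      -sum(u * log(p) + (1 - u) * log(1 - p))
    }
    theta <- optimize(nll, interval = range)$minimum
    sem   <- 1 / sqrt(sum(info_2pl(theta, a[used], b[used])))

    # Termination: variable-length rule, checked only after n_min items.
    if (length(used) >= n_min && sem < sem_crit) break
  }
  list(theta = theta, sem = sem, items = used)
}

out <- run_cat(resp, a, b)
```

The `n_max` condition on the while loop plays the role of the "fixed" criterion, and the SEM check plays the role of a "precision" threshold, so the first rule to be satisfied ends the test.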

Value

The function catIrt returns a list (of class "catIrt") with the following elements:

`cat_theta`	a vector of final CAT `\theta` estimates.
`cat_categ`	a vector indicating the final classification of each simulee in the CAT. If `term` is not "class", `cat_categ` will be a vector of NA values.
`cat_info`	a vector of observed Fisher information based on the final CAT `\theta` estimates and the item responses.
`cat_sem`	a vector of observed SEM estimates (or posterior standard deviations) based on the final CAT `\theta` estimates and the item responses.
`cat_length`	a vector indicating the number of items administered to each simulee in the CAT.
`cat_term`	a vector indicating how each CAT was terminated.
`tot_theta`	a vector of `\theta` estimates given the entire item bank.
`tot_categ`	a vector indicating the classification of each simulee given the entire item bank.
`tot_info`	a vector of observed Fisher information based on the entire item bank worth of responses.
`tot_sem`	a vector of observed SEM estimates based on the entire item bank worth of responses.
`true_theta`	a vector of true `\theta` values if specified by the user.
`true_categ`	a vector of true classification given `\theta`.
`full_params`	the full item bank.
`full_resp`	the full set of responses.
`cat_indiv`	a list of `\theta` estimates, observed SEM, observed information, the responses and the parameters chosen for each simulee over the entire CAT.
`mod`	a list of model specifications, as designated by the user, so that the CAT can be easily reproduced.

Note

Both summary.catIrt and plot.catIrt return different objects than the original catIrt function. summary.catIrt returns labeled summary statistics, and plot.catIrt returns evaluation points (x values, information, and SEM) for each of the plots. Moreover, if catIrt is run in interactive mode and parts of the catStart, catMiddle, or catTerm arguments are missing, the function will interactively ask for each of them and return the full set of arguments in the "catIrt" object.

Author(s)

Steven W. Nydick swnydick@gmail.com

References

Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249–261.

Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257–283). New York, NY: Academic Press.

Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.

Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: Lawrence Erlbaum Associates.

Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117–186.

Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.

Examples

## Not run: 

#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp


## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
                  select = "UW-FI", at = "theta",
                  n.select = 4, it.range = c(-1, 1),
                  score = "step", range = c(-1, 1),
                  step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
                   n.select = 1, it.range = NULL,
                   score = "MLE", range = c(-6, 6),
                   expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)

cat1 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)

# we can print, summarize, and plot:
cat1                                        # prints theta because
                                            # we have fewer than
                                            # 200 simulees
summary(cat1, group = TRUE, ids = "none")   # nice summary!

summary(cat1, group = FALSE, ids = 1:4)     # summarizing people too! :)

par(mfrow = c(2, 2))
plot(cat1, ask = FALSE)               # 2-parameter model, so expected FI
                                      # and observed FI are the same
par(mfrow = c(1, 1))

# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))


## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1

catStart2$leave.after.MLE <- TRUE         # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta)       # very close!

summary(cat2, group = FALSE, ids = 1:4)   # rarely 5 starting items!


## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)

mean(cat3$cat_length - cat4$cat_length) # KL info results in slightly more items


## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = 0, delta = .2,
                               alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle2,
               catTerm = catTerm5)

# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)

# using a different selection mechanism, we get similar results:
mean(cat6$cat_categ == cat6$tot_categ)


## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm", # much slower!
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta)            # pretty much the same


## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1,
               ddist = dchisq, df = 4)

cat8   # all positive values of "theta"


## CAT 9 ##
# finally, we can have:
#   - more than one termination criterion,
#   - individual bounds per person,
#   - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
                 n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = cbind(runif(length(theta), -1, 0),
                                              runif(length(theta), 0, 1)),
                               delta = .2,
                               alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
               resp = NULL, theta = theta,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm9)

summary(cat9)   # see "... with Each Termination Criterion"


#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
                                           b3 = rnorm(100), b4 = rnorm(100))

# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
                resp = NULL, theta = theta,
                catStart = catStart1,
                catMiddle = catMiddle1,
                catTerm = catTerm1)

# warning because it.range cannot be specified for graded response models!

# if there are more than 200 simulees, it doesn't print individual thetas:
cat10


## End(Not run)

# play around with things - CATs are fun - a little frisky, but fun.

[Package catIrt version 0.5.1 Index]