catIrt {catIrt} | R Documentation |
Simulate Computerized Adaptive Tests (CATs)
Description
catIrt simulates Computerized Adaptive Tests (CATs) given a vector/matrix of
responses or a vector of ability values, a matrix of item parameters, and several
item selection mechanisms, estimation procedures, and termination criteria.
Usage
catIrt( params, mod = c("brm", "grm"),
        resp = NULL,
        theta = NULL,
        catStart = list( n.start = 5, init.theta = 0,
                         select = c("UW-FI", "LW-FI", "PW-FI",
                                    "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                    "random"),
                         at = c("theta", "bounds"),
                         it.range = NULL, n.select = 1,
                         delta = .1,
                         score = c("fixed", "step", "random", "WLE", "BME", "EAP"),
                         range = c(-1, 1),
                         step.size = 3, leave.after.MLE = FALSE ),
        catMiddle = list( select = c("UW-FI", "LW-FI", "PW-FI",
                                     "FP-KL", "VP-KL", "FI-KL", "VI-KL",
                                     "random"),
                          at = c("theta", "bounds"),
                          it.range = NULL, n.select = 1,
                          delta = .1,
                          score = c("MLE", "WLE", "BME", "EAP"),
                          range = c(-6, 6),
                          expos = c("none", "SH") ),
        catTerm = list( term = c("fixed", "precision", "info", "class"),
                        score = c("MLE", "WLE", "BME", "EAP"),
                        n.min = 5, n.max = 50,
                        p.term = list(method = c("threshold", "change"),
                                      crit = .25),
                        i.term = list(method = c("threshold", "change"),
                                      crit = 2),
                        c.term = list(method = c("SPRT", "GLR", "CI"),
                                      bounds = c(-1, 1),
                                      categ = c(0, 1, 2),
                                      delta = .1,
                                      alpha = .05, beta = .05,
                                      conf.lev = .95) ),
        ddist = dnorm,
        progress = TRUE, ... )
## S3 method for class 'catIrt'
summary( object, group = TRUE, ids = "none", ... )
## S3 method for class 'catIrt'
plot( x, which = "all", ids = "none",
      conf.lev = .95, legend = TRUE, ask = TRUE, ... )
Arguments
object , x |
a |
params |
numeric: a matrix of item parameters. If specified as a matrix,
the rows must index the items, and the columns must designate the item
parameters. For the binary response model, |
mod |
character: a character string indicating the IRT model. Current support
is for the 3-parameter binary response model ("brm"),
and Samejima's graded response model ("grm"). The contents
of |
resp |
numeric: either a |
theta |
numeric: either a |
catStart |
list: a list of options for starting the CAT including:
|
catMiddle |
list: a list of options for selecting/scoring during the middle of the CAT, including:
|
catTerm |
list: a list of options for stopping/terminating the CAT, including:
|
ddist |
function: a function indicating how to calculate prior densities
for Bayesian estimation or particular item selection methods. For instance,
if you wish to specify a normal prior, |
which |
numeric: a scalar or vector of integers between 1 and 4, indicating which plots to include. The plots are as follows:
|
group |
logical: TRUE or FALSE indicating whether to display a summary at the group level. |
ids |
numeric: a scalar or vector of integers between 1 and the number of
simulees indicating which simulees to plot and/or summarize their CAT
process and all of their |
conf.lev |
numeric: a scalar between 0 and 1 indicating the desired confidence
level plotted for the individual |
legend |
logical: TRUE or FALSE indicating whether the plot function should display a legend on the plot. |
ask |
logical: TRUE or FALSE indicating whether the plot function should ask between plots. |
progress |
logical: TRUE or FALSE indicating whether the |
... |
arguments passed to |
Details
The function catIrt performs a post-hoc computerized adaptive test (CAT)
with a variety of user-specified inputs. For a given person/simulee (e.g. simulee i),
a CAT represents a simple set of stages surrounded by a while loop
(e.g. Weiss and Kingsbury, 1984):
Item Selection: The next item is chosen based on a pre-specified criterion/criteria. For example, the classic item selection mechanism picks the item that maximizes Fisher information at the current estimate of \theta_i. Frequently, content balancing, item constraints, or item exposure will be taken into consideration at this point (aside from solely picking the "best item" for a given person). See itChoose for current item selection methods.
Estimation: \theta_i is estimated based on updated information, usually relating to the just-selected item and the response associated with that item. In a post-hoc CAT, all of the responses already exist, but in a standard CAT, "item administration" would fall between "item selection" and "estimation." The classic estimation mechanism estimates \theta_i by maximizing the likelihood given the item parameters and a set of responses. Other estimation mechanisms correct for bias in the maximum likelihood estimate or add prior information (such as a prior distribution of \theta). If an estimate is untenable (i.e. it returns a nonsensical value or \infty), the estimation procedure needs an alternative estimation mechanism. See mleEst for current estimation methods.
Termination: Either the test is terminated based on a pre-specified criterion/criteria, or no termination criterion is satisfied, in which case the loop repeats. The standard termination criteria involve a fixed criterion (e.g. administering only 50 items) or a variable criterion (e.g. continuing until the observed SEM is below .3). Other termination criteria relate to cut-point tests (e.g. certification tests, classification tests), which depend not solely on ability but on whether that ability is estimated to exceed a threshold.
catIrt terminates classification tests based on either the Sequential Probability Ratio Test (SPRT) (see Eggen, 1999), the Generalized Likelihood Ratio (GLR) (see Thompson, 2009), or the Confidence Interval Method (see Kingsbury & Weiss, 1983). Essentially, the SPRT takes the ratio of two likelihoods (e.g. the likelihood of the data given being in one category vs the likelihood of the data given being in the other category, as defined by B + \delta and B - \delta, where B separates the categories and \delta is the half-width of the indifference region) and compares that ratio with a ratio of error rates (\alpha and \beta) (see Wald, 1945). The GLR uses the maximum likelihood estimate in place of either B + \delta or B - \delta, and the confidence interval method terminates a CAT if the confidence interval surrounding an estimate of \theta is fully within one of the categories.
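The SPRT decision described above can be sketched in base R. This is only an illustration under stated assumptions, not catIrt's internal code: the helpers p_brm, loglik_brm, and sprt_decide are hypothetical names, the item bank is made up, and the thresholds are Wald's (1945) usual approximations log((1 - \beta)/\alpha) and log(\beta/(1 - \alpha)).

```r
# Hypothetical helpers for illustration; none of these are part of catIrt.

# 3-parameter binary response model probability of a correct response
p_brm <- function(theta, a, b, c) c + (1 - c) / (1 + exp(-a * (theta - b)))

# log-likelihood of a 0/1 response vector at a single value of theta
loglik_brm <- function(theta, resp, params) {
  p <- p_brm(theta, params[, "a"], params[, "b"], params[, "c"])
  sum(resp * log(p) + (1 - resp) * log(1 - p))
}

# SPRT around cut point B with indifference half-width delta
sprt_decide <- function(resp, params, B = 0, delta = .1,
                        alpha = .05, beta = .05) {
  # log of the likelihood ratio at B + delta versus B - delta
  log_ratio <- loglik_brm(B + delta, resp, params) -
               loglik_brm(B - delta, resp, params)
  upper <- log((1 - beta) / alpha)    # at or above: classify above B
  lower <- log(beta / (1 - alpha))    # at or below: classify below B
  if (log_ratio >= upper) {
    "above"
  } else if (log_ratio <= lower) {
    "below"
  } else {
    "continue"
  }
}

# made-up bank of 30 items with no guessing
sprt_params <- cbind(a = rep(1.5, 30),
                     b = seq(-1, 1, length.out = 30),
                     c = 0)

sprt_decide(rep(1, 30), sprt_params)        # every item answered correctly
sprt_decide(rep(0, 30), sprt_params)        # every item answered incorrectly
sprt_decide(1, sprt_params[1, , drop = FALSE])  # one response: too little evidence
```

Working on the log scale avoids underflow when many item likelihoods are multiplied. A GLR variant would, per the description above, replace one of the two fixed points with the maximum likelihood estimate.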
The CAT estimates \theta_{i1} (an initial point) based on init.theta,
and terminates the entire simulation after sequentially terminating each simulee's CAT.
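The select/estimate/terminate loop for a single simulee can be sketched in a few lines of base R. This is a simplified sketch, not catIrt's implementation: it assumes a 2-parameter model, maximum Fisher information selection, bounded maximum likelihood scoring, and a precision-or-length stopping rule. The names p_2pl, fi_2pl, and post_hoc_cat, along with their arguments, are hypothetical.

```r
# Hypothetical helpers for illustration; none of these are part of catIrt.

# 2-parameter model probability and item Fisher information
p_2pl  <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
fi_2pl <- function(theta, a, b) {
  p <- p_2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# post-hoc CAT for one simulee whose responses to all items already exist
post_hoc_cat <- function(resp, a, b, init.theta = 0, n.max = 20,
                         sem.crit = .3, theta.range = c(-6, 6)) {
  used  <- integer(0)     # indices of administered items
  theta <- init.theta
  sem   <- Inf
  while (length(used) < n.max && sem > sem.crit) {
    # 1. item selection: unused item with maximum information at theta
    info <- fi_2pl(theta, a, b)
    info[used] <- -Inf
    used <- c(used, which.max(info))
    # 2. estimation: maximize the log-likelihood within theta.range
    ll <- function(th) {
      p <- p_2pl(th, a[used], b[used])
      sum(resp[used] * log(p) + (1 - resp[used]) * log(1 - p))
    }
    theta <- optimize(ll, interval = theta.range, maximum = TRUE)$maximum
    # 3. termination: SEM from the test information at the new estimate
    sem <- 1 / sqrt(sum(fi_2pl(theta, a[used], b[used])))
  }
  list(theta = theta, sem = sem, items = used)
}

set.seed(1)
a    <- runif(50, .5, 1.5)
b    <- rnorm(50)
resp <- rbinom(50, 1, p_2pl(.5, a, b))   # simulee with true theta = .5
out  <- post_hoc_cat(resp, a, b)
out$theta; out$sem; length(out$items)
```

With an all-correct or all-incorrect response pattern the likelihood has no interior maximum, so the bounded search here simply lands near an end of theta.range. The catStart scoring options listed under Usage (for example score = "step") appear intended for that early stage of the test.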
Value
The function catIrt returns a list (of class "catIrt") with the following elements:
cat_theta |
a vector of final CAT |
cat_categ |
a vector indicating the final classification of each simulee in the CAT. If
|
cat_info |
a vector of observed Fisher information based on the final CAT |
cat_sem |
a vector of observed SEM estimates (or posterior standard deviations) based on the
final CAT |
cat_length |
a vector indicating the number of items administered to each simulee in the CAT |
cat_term |
a vector indicating how each CAT was terminated. |
tot_theta |
a vector of |
tot_categ |
a vector indicating the classification of each simulee given the entire item bank. |
tot_info |
a vector of observed Fisher information based on the entire item bank worth of responses. |
tot_sem |
a vector of observed SEM estimates based on the entire item bank worth of responses. |
true_theta |
a vector of true |
true_categ |
a vector of true classification given |
full_params |
the full item bank. |
full_resp |
the full set of responses. |
cat_indiv |
a list of |
mod |
a list of model specifications, as designated by the user, so that the CAT can be easily reproduced. |
Note
Both summary.catIrt and plot.catIrt return different objects than the original
catIrt function. summary.catIrt returns labeled summary statistics, and
plot.catIrt returns evaluation points (x values, information, and SEM) for each
of the plots. Moreover, if in interactive mode and parts of the catStart, catMiddle,
or catTerm arguments are missing, the catIrt function will interactively ask for each
of those and return the set of arguments in the "catIrt" object.
Author(s)
Steven W. Nydick swnydick@gmail.com
References
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249–261.
Kingsbury, G. G., & Weiss, D. J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257–283). New York, NY: Academic Press.
Thompson, N. A. (2009). Using the generalized likelihood ratio as a termination criterion. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC conference on computerized adaptive testing.
Wainer, H. (Ed.). (2000). Computerized Adaptive Testing: A Primer (2nd Edition). Mahwah, NJ: Lawrence Erlbaum Associates.
Wald, A. (1945). Sequential tests of statistical hypotheses. Annals of Mathematical Statistics, 16, 117–186.
Weiss, D. J., & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21, 361–375.
See Also
FI, itChoose, KL, mleEst, simIrt
Examples
## Not run:
#########################
# Binary Response Model #
#########################
set.seed(888)
# generating random theta:
theta <- rnorm(50)
# generating an item bank under a 2-parameter binary response model:
b.params <- cbind(a = runif(100, .5, 1.5), b = rnorm(100, 0, 2), c = 0)
# simulating responses:
b.resp <- simIrt(theta = theta, params = b.params, mod = "brm")$resp
## CAT 1 ##
# the typical, classic post-hoc CAT:
catStart1 <- list(init.theta = 0, n.start = 5,
                  select = "UW-FI", at = "theta",
                  n.select = 4, it.range = c(-1, 1),
                  score = "step", range = c(-1, 1),
                  step.size = 3, leave.after.MLE = FALSE)
catMiddle1 <- list(select = "UW-FI", at = "theta",
                   n.select = 1, it.range = NULL,
                   score = "MLE", range = c(-6, 6),
                   expos = "none")
catTerm1 <- list(term = "fixed", n.min = 10, n.max = 50)
cat1 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
# we can print, summarize, and plot:
cat1                                        # prints theta because
                                            # we have fewer than
                                            # 200 simulees
summary(cat1, group = TRUE, ids = "none")   # nice summary!
summary(cat1, group = FALSE, ids = 1:4)     # summarizing people too! :)
par(mfrow = c(2, 2))
plot(cat1, ask = FALSE)                     # 2-parameter model, so expected FI
                                            # and observed FI are the same
par(mfrow = c(1, 1))
# we can also plot particular simulees:
par(mfrow = c(2, 1))
plot(cat1, which = "none", ids = c(1, 30), ask = FALSE)
par(mfrow = c(1, 1))
## CAT 2 ##
# using Fixed Point KL info rather than Unweighted FI to select items:
catStart2 <- catStart1
catMiddle2 <- catMiddle1
catTerm2 <- catTerm1
catStart2$leave.after.MLE <- TRUE # leave after mixed response pattern
catMiddle2$select <- "FP-KL"
catMiddle2$at <- "bounds"
catMiddle2$delta <- .2
catTerm2$c.term <- list(bounds = 0)
cat2 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
cor(cat1$cat_theta, cat2$cat_theta) # very close!
summary(cat2, group = FALSE, ids = 1:4) # rarely 5 starting items!
## CAT 3/4 ##
# using "precision" rather than "fixed" to terminate:
catTerm1$term <- catTerm2$term <- "precision"
catTerm1$p.term <- catTerm2$p.term <- list(method = "threshold", crit = .3)
cat3 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm1)
cat4 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart2,
               catMiddle = catMiddle2,
               catTerm = catTerm2)
mean(cat3$cat_length - cat4$cat_length) # KL info results in slightly more items
## CAT 5/6 ##
# classification CAT with a boundary of 0 (with default classification stuff):
catTerm5 <- list(term = "class", n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = 0, delta = .2,
                               alpha = .10, beta = .10))
cat5 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm5)
cat6 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle2,
               catTerm = catTerm5)
# how many were classified correctly?
mean(cat5$cat_categ == cat5$tot_categ)
# using a different selection mechanism, we get similar results:
mean(cat6$cat_categ == cat6$tot_categ)
## CAT 7 ##
# we could change estimation to EAP with the default (normal) prior:
catMiddle7 <- catMiddle1
catMiddle7$score <- "EAP"
cat7 <- catIrt(params = b.params, mod = "brm",     # much slower!
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1)
cor(cat1$cat_theta, cat7$cat_theta) # pretty much the same
## CAT 8 ##
# let's specify the prior as something strange:
cat8 <- catIrt(params = b.params, mod = "brm",
               resp = b.resp,
               catStart = catStart1,
               catMiddle = catMiddle7,
               catTerm = catTerm1,
               ddist = dchisq, df = 4)
cat8 # all positive values of "theta"
## CAT 9 ##
# finally, we can have:
# - more than one termination criterion,
# - individual bounds per person,
# - simulating based on theta without a response matrix.
catTerm9 <- list(term = c("fixed", "class"),
                 n.min = 10, n.max = 50,
                 c.term = list(method = "SPRT",
                               bounds = cbind(runif(length(theta), -1, 0),
                                              runif(length(theta), 0, 1)),
                               delta = .2,
                               alpha = .1, beta = .1))
cat9 <- catIrt(params = b.params, mod = "brm",
               resp = NULL, theta = theta,
               catStart = catStart1,
               catMiddle = catMiddle1,
               catTerm = catTerm9)
summary(cat9) # see "... with Each Termination Criterion"
#########################
# Graded Response Model #
#########################
# generating random theta
theta <- rnorm(201)
# generating an item bank under a graded response model:
g.params <- cbind(a = runif(100, .5, 1.5), b1 = rnorm(100), b2 = rnorm(100),
                  b3 = rnorm(100), b4 = rnorm(100))
# the graded response model is exactly the same, only slower!
cat10 <- catIrt(params = g.params, mod = "grm",
                resp = NULL, theta = theta,
                catStart = catStart1,
                catMiddle = catMiddle1,
                catTerm = catTerm1)
# warning because it.range cannot be specified for graded response models!
# if there are more than 200 simulees, it doesn't print individual thetas:
cat10
## End(Not run)
# play around with things - CATs are fun - a little frisky, but fun.