est_score {irtQ}    R Documentation

Estimate examinees' ability (proficiency) parameters

Description

This function estimates examinees' latent ability parameters. Available scoring methods are maximum likelihood estimation (ML), maximum likelihood estimation with fences (MLF; Han, 2016), weighted likelihood estimation (WL; Warm, 1989), maximum a posteriori estimation (MAP; Hambleton et al., 1991), expected a posteriori estimation (EAP; Bock & Mislevy, 1982), EAP summed scoring (Thissen et al., 1995; Thissen & Orlando, 2001), and inverse test characteristic curve (TCC) scoring (e.g., Kolen & Brennan, 2004; Kolen & Tong, 2010; Stocking, 1996).

Usage

est_score(x, ...)

## Default S3 method:
est_score(
  x,
  data,
  D = 1,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  fence.a = 3,
  fence.b = NULL,
  tol = 1e-04,
  max.iter = 100,
  se = TRUE,
  stval.opt = 1,
  intpol = TRUE,
  range.tcc = c(-7, 7),
  missing = NA,
  ncore = 1,
  ...
)

## S3 method for class 'est_irt'
est_score(
  x,
  method = "ML",
  range = c(-5, 5),
  norm.prior = c(0, 1),
  nquad = 41,
  weights = NULL,
  fence.a = 3,
  fence.b = NULL,
  tol = 1e-04,
  max.iter = 100,
  se = TRUE,
  stval.opt = 1,
  intpol = TRUE,
  range.tcc = c(-7, 7),
  missing = NA,
  ncore = 1,
  ...
)

Arguments

x

A data frame containing the item metadata (e.g., item parameters, number of categories, models ...) or an object of class est_irt obtained from the function est_irt. See irtfit, info, or simdat for more details about the item metadata. This data frame can be easily obtained using the function shape_df.

...

Additional arguments to pass to parallel::makeCluster.

data

A matrix or vector containing examinees' response data for the items in the argument x. When a matrix is used, rows and columns indicate the examinees and items, respectively. When a vector is used, it should contain the item response data for a single examinee.

D

A scaling factor in IRT models that makes the logistic function as close as possible to the normal ogive function (when set to 1.7). Default is 1.

method

A character string indicating a scoring method. Available methods are "ML" for the maximum likelihood estimation, "MLF" for the maximum likelihood estimation with fences, "WL" for the weighted likelihood estimation, "MAP" for the maximum a posteriori estimation, "EAP" for the expected a posteriori estimation, "EAP.SUM" for the expected a posteriori summed scoring, and "INV.TCC" for the inverse TCC scoring. Default method is "ML".

range

A numeric vector of two components restricting the range of the ability scale for the ML, MLF, WL, and MAP scoring methods. Default is c(-5, 5).

norm.prior

A numeric vector of two components specifying the mean and standard deviation of the normal prior distribution. These two parameters are used to obtain the Gaussian quadrature points and the corresponding weights from the normal distribution. Default is c(0, 1). Ignored if method is "ML", "MLF", "WL", or "INV.TCC".

nquad

An integer value specifying the number of Gaussian quadrature points from the normal prior distribution. Default is 41. Ignored if method is "ML", "MLF", "WL", "MAP", or "INV.TCC".

weights

A two-column matrix or data frame containing the quadrature points (in the first column) and the corresponding weights (in the second column) of the latent variable prior distribution. The weights and quadrature points can be easily obtained using the function gen.weight. If NULL and method is "EAP" or "EAP.SUM", default values are used (see the arguments of norm.prior and nquad). Ignored if method is "ML", "MLF", "WL", "MAP", or "INV.TCC".

fence.a

A numeric value specifying the item slope parameter (i.e., a-parameter) for the two imaginary items in MLF. See below for details. Default is 3.0.

fence.b

A numeric vector of two components specifying the lower and upper fences of item difficulty parameters (i.e., b-parameters) for the two imaginary items, respectively, in MLF. When fence.b = NULL, the values of the range argument are used as the lower and upper fences of the item difficulty parameters. Default is NULL.

tol

A numeric value of the convergence tolerance for the ML, MLF, WL, MAP, and inverse TCC scoring methods. For ML, MLF, WL, and MAP, the Newton-Raphson method is used for optimization; for inverse TCC scoring, the bisection method is used. Default is 1e-4.

max.iter

A positive integer value specifying the maximum number of iterations of the Newton-Raphson method. Default is 100.

se

A logical value. If TRUE, the standard errors of ability estimates are computed. However, if method is "EAP.SUM" or "INV.TCC", the standard errors are always returned. Default is TRUE.

stval.opt

A positive integer value specifying the starting-value option for the ML, MLF, WL, and MAP scoring methods. Available options are 1 for the brute-force method, 2 for the observed sum score-based method, and 3 for setting the starting value to 0. Default is 1. See below for details.

intpol

A logical value. If TRUE and method = "INV.TCC", linear interpolation is used to approximate the ability estimates for the observed sum scores for which ability estimates cannot be obtained from the TCC (e.g., observed sum scores less than the sum of the item guessing parameters). Default is TRUE. See below for details.

range.tcc

A numeric vector of two components to be used as the lower and upper bounds of ability estimates when method = "INV.TCC". Default is c(-7, 7).

missing

A value indicating missing values in the response data set. Default is NA. See below for details.

ncore

The number of logical CPU cores to use. Default is 1. See below for details.

Details

For MAP scoring method, only the normal prior distribution is available for the population distribution.
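For the EAP-based methods, the normal prior is discretized into quadrature points and weights. The following base-R sketch illustrates the idea only; it is not the internal code of gen.weight, and the grid endpoints of -4 and 4 are an assumption for illustration:

```r
# Illustrative quadrature grid for a normal prior (not gen.weight itself).
norm.prior <- c(0, 1)                             # prior mean and SD
nquad <- 41                                       # number of quadrature points
node <- seq(-4, 4, length.out = nquad)            # assumed grid range
wt <- dnorm(node, mean = norm.prior[1], sd = norm.prior[2])
wt <- wt / sum(wt)                                # normalize weights to sum to 1
weights <- data.frame(node = node, weight = wt)   # two-column layout, as in the weights argument
```

A two-column object of this shape is what the weights argument expects.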

When there are missing data in the response data set, the missing value must be specified in missing. The missing data are taken into account when any of ML, MLF, WL, MAP, or EAP is used. When "EAP.SUM" or "INV.TCC" is used, however, any missing responses are replaced with incorrect responses (i.e., 0s).
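The recoding applied for "EAP.SUM" and "INV.TCC" can be sketched as follows (the missing-value code -99 and the toy response matrix are hypothetical):

```r
# Replace missing responses with incorrect responses (0s), as done for
# the "EAP.SUM" and "INV.TCC" methods.
missing <- -99                                   # example missing-value code
data <- matrix(c(1, -99, 0, 1, 1, -99), nrow = 2)  # toy 2-examinee response matrix
data[data == missing] <- 0                       # missing -> incorrect
```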

In the maximum likelihood estimation with fences (MLF; Han, 2016), two imaginary 2PLM items are necessary. The first imaginary item serves as the lower fence, and its difficulty parameter (i.e., b-parameter) should be lower than any difficulty parameter value in the test form. Likewise, the second imaginary item serves as the upper fence, and its difficulty parameter should be greater than any difficulty parameter value in the test form. Also, the two imaginary items should have a very high item slope parameter (i.e., a-parameter) value. See Han (2016) for more details. When fence.b = NULL in MLF, the function automatically sets the lower and upper fences of the item difficulty parameters using the values in the range argument.
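The two imaginary fence items can be pictured as two extra rows of item metadata. The sketch below is only an illustration; the column names mirror a typical irtQ metadata layout but may differ from the exact internal representation:

```r
# Hedged sketch of the two imaginary 2PLM "fence" items used in MLF.
fence.a <- 3                 # common, steep slope for both fence items
fence.b <- c(-5, 5)          # e.g., taken from the range argument when fence.b = NULL
fences <- data.frame(
  id    = c("FENCE.LOW", "FENCE.UP"),
  cats  = 2,                 # dichotomous items
  model = "2PLM",
  par.1 = fence.a,           # a-parameter
  par.2 = fence.b            # b-parameters below/above all operational items
)
```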

When the "INV.TCC" method is used with a test employing the IRT 3-parameter logistic model (3PLM), ability estimates are not attainable for observed sum scores less than the sum of the item guessing parameters. In this case, linear interpolation can be applied by setting intpol = TRUE. Let \theta_{min} and \theta_{max} be the minimum and maximum ability estimates, and let \theta_{X} be the ability estimate for the smallest observed sum score, X, that is greater than or equal to the sum of the item guessing parameters. With linear interpolation, the first value of range.tcc is set to \theta_{min}, and a straight line is constructed between the two points (x=\theta_{min}, y=0) and (x=\theta_{X}, y=X). Similarly, the second value of range.tcc is set to \theta_{max}, the ability estimate corresponding to the maximum observed sum score. For the "INV.TCC" method, the standard errors of ability estimates are computed using the approach suggested by Lim, Davey, and Wells (2020). The code for the inverse TCC scoring was written by modifying the function irt.eq.tse of the SNSequate R package (González, 2014).
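The interpolation step amounts to inverting the straight line through (\theta_{min}, 0) and (\theta_{X}, X). A minimal sketch (the function name and the numeric values are hypothetical):

```r
# Map an observed sum score s below X onto the theta scale by inverting
# the line through (theta.min, 0) and (theta.X, X).
interp_theta <- function(s, theta.min, theta.X, X) {
  theta.min + s * (theta.X - theta.min) / X
}

# e.g., theta_min = -7 (first value of range.tcc), theta_X = -2.5, X = 6
interp_theta(s = 3, theta.min = -7, theta.X = -2.5, X = 6)  # -4.75
```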

Regarding the starting value used for the ML, MLF, WL, and MAP scoring methods, the brute-force method is used when stval.opt = 1. With this option, the log-likelihood is evaluated at discrete theta values in increments of 0.1 across the given range, and the theta node with the largest log-likelihood is used as the starting value. When stval.opt = 2, the starting value is obtained from the observed sum score. For example, if the maximum observed sum score (max.score) for a test is 30 and an examinee has an observed sum score of 20 (obs.score), the starting value is "log(obs.score / (max.score - obs.score))". For all-incorrect responses, the starting value is "log(1 / max.score)", and for all-correct responses, it is "log(max.score / 1)".
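The sum score-based rule for stval.opt = 2 can be sketched in a few lines of base R (the function name is hypothetical; only the formulas above are from the documentation):

```r
# Sum score-based starting value (stval.opt = 2), covering the two
# boundary cases of all-incorrect and all-correct responses.
start_value <- function(obs.score, max.score) {
  if (obs.score == 0) {                 # all-incorrect responses
    log(1 / max.score)
  } else if (obs.score == max.score) {  # all-correct responses
    log(max.score / 1)
  } else {                              # logit of the proportion-correct odds
    log(obs.score / (max.score - obs.score))
  }
}

start_value(obs.score = 20, max.score = 30)  # log(20 / 10) = 0.693...
```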

To speed up ability estimation for the ML, MLF, WL, MAP, and EAP methods, this function can run a parallel process using multiple logical CPU cores. You can set the number of logical CPU cores by specifying a positive integer value in the argument ncore. Default is 1.

Note that the standard errors of ability estimates are computed using the Fisher expected information for ML, MLF, WL, and MAP methods.
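For the 2PLM, for instance, the Fisher expected test information at theta is the sum over items of D^2 a^2 P (1 - P), and the standard error is the reciprocal square root of that information. The item parameters below are made-up values for illustration:

```r
# Hedged sketch: SE(theta) from the Fisher expected information for the 2PLM.
D <- 1
a <- c(1.2, 0.8, 1.5)                      # hypothetical slope parameters
b <- c(-0.5, 0.0, 1.0)                     # hypothetical difficulty parameters
theta <- 0.3

P <- 1 / (1 + exp(-D * a * (theta - b)))   # 2PLM response probabilities
info <- sum(D^2 * a^2 * P * (1 - P))       # expected test information at theta
se <- 1 / sqrt(info)                       # standard error of the ability estimate
```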

The implementation of the WL method was based on the Pi, Ji, and Ii functions of the catR package (Magis & Barrada, 2017).

Value

When method is one of "ML", "MLF", "WL", "MAP", or "EAP", a two-column data frame including the ability estimates (first column) and the standard errors of the ability estimates (second column) is returned. When method is "EAP.SUM" or "INV.TCC", a list of two objects is returned. The first object is a three-column data frame including the observed sum scores (first column), the ability estimates (second column), and the standard errors of the ability estimates (third column). The second object is a score table containing the possible raw sum scores and the corresponding ability and standard error estimates.

Methods (by class)

est_score(default): Default method for estimating examinees' latent ability parameters using a data frame x containing the item metadata.

est_score(est_irt): An object created by the function est_irt.

Author(s)

Hwanggyu Lim hglim83@gmail.com

References

Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6(4), 431-444.

González, J. (2014). SNSequate: Standard and nonstandard statistical models and methods for test equating. Journal of Statistical Software, 59, 1-30.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

Han, K. T. (2016). Maximum likelihood score estimation method with fences for short-length tests and computerized adaptive tests. Applied Psychological Measurement, 40(4), 289-301.

Howard, J. P. (2017). Computational methods for numerical analysis with R. New York: Chapman and Hall/CRC.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking (2nd ed.). New York: Springer.

Kolen, M. J. & Tong, Y. (2010). Psychometric properties of IRT proficiency estimates. Educational Measurement: Issues and Practice, 29(3), 8-14.

Lim, H., Davey, T., & Wells, C. S. (2020). A recursion-based analytical approach to evaluate the performance of MST. Journal of Educational Measurement. DOI: 10.1111/jedm.12276.

Magis, D., & Barrada, J. R. (2017). Computerized adaptive testing with R: Recent updates of the package catR. Journal of Statistical Software, 76, 1-19.

Stocking, M. L. (1996). An alternative method for scoring adaptive tests. Journal of Educational and Behavioral Statistics, 21(4), 365-389.

Thissen, D. & Orlando, M. (2001). Item response theory for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp.73-140). Mahwah, NJ: Lawrence Erlbaum.

Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. (1995). Item response theory for scores on tests including polytomous items with ordered responses. Applied Psychological Measurement, 19(1), 39-49.

Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.

See Also

irtfit, info, simdat, shape_df, gen.weight

Examples

## the use of a "-prm.txt" file obtained from flexMIRT
flex_prm <- system.file("extdata", "flexmirt_sample-prm.txt", package = "irtQ")

# read item parameters and transform them to item metadata
x <- bring.flexmirt(file=flex_prm, "par")$Group1$full_df

# generate examinees' abilities
set.seed(12)
theta <- rnorm(10)

# simulate the item response data
data <- simdat(x, theta, D=1)


# estimate the abilities using ML
est_score(x, data, D=1, method="ML", range=c(-4, 4), se=TRUE)

# estimate the abilities using WL
est_score(x, data, D=1, method="WL", range=c(-4, 4), se=TRUE)

# estimate the abilities using MLF with default fences of item difficulty parameters
est_score(x, data, D=1, method="MLF", fence.a=3.0, fence.b=NULL, se=TRUE)

# estimate the abilities using MLF with different fences of item difficulty parameters
est_score(x, data, D=1, method="MLF", fence.a=3.0, fence.b=c(-7, 7), se=TRUE)

# estimate the abilities using MAP
est_score(x, data, D=1, method="MAP", norm.prior=c(0, 1), se=TRUE)

# estimate the abilities using EAP
est_score(x, data, D=1, method="EAP", norm.prior=c(0, 1), nquad=30, se=TRUE)

# estimate the abilities using EAP summed scoring
est_score(x, data, D=1, method="EAP.SUM", norm.prior=c(0, 1), nquad=30)

# estimate the abilities using inverse TCC scoring
est_score(x, data, D=1, method="INV.TCC", intpol=TRUE, range.tcc=c(-7, 7))




[Package irtQ version 0.2.0 Index]