R: Detect preknowledge

detect_pk {aberrance}

R Documentation

Detect preknowledge

Description

Detect preknowledge under the assumption that the set of compromised items is known.

Usage

detect_pk(
  method,
  ci,
  psi,
  xi = NULL,
  xi_c = NULL,
  xi_s = NULL,
  x = NULL,
  y = NULL,
  interval = c(-4, 4),
  alpha = 0.05,
  cutoff = 0.05
)

Arguments

`method`	The preknowledge detection statistic(s) to compute.
	Options for score-based statistics are:
	`"L_S"` for the signed likelihood ratio test statistic (Sinharay, 2017).
	`"ML_S"` for the modified signed likelihood ratio test statistic (Sinharay & Jensen, 2019). For numerical stability, an absolute cutoff value can be specified using `cutoff`. Note: This statistic cannot be computed under the 3PL model or the graded response model.
	`"LR_S"` for the Lugannani-Rice approximation (Sinharay & Jensen, 2019). For numerical stability, an absolute cutoff value can be specified using `cutoff`. Note: This statistic cannot be computed under the 3PL model or the graded response model.
	`"S_S"` for the signed score test statistic (Sinharay, 2017).
	`"W_S"` for the Wald test statistic (Sinharay & Jensen, 2019).
	Options for response time-based statistics are:
	`"L_T"` for the signed likelihood ratio test statistic, or equivalently, `"W_T"` for the Wald test statistic (Sinharay, 2020).
	Options for score and response time-based statistics are:
	`"L_ST"` for the constrained likelihood ratio test statistic (Sinharay & Johnson, 2020).
`ci`	A vector of compromised item positions. All other items are presumed secure.
`psi`	A matrix of item parameters.
`xi`, `xi_c`, `xi_s`	Matrices of person parameters. `xi` is based on all items, `xi_c` is based on the compromised items, and `xi_s` is based on the secure items. If `NULL` (default), person parameters are estimated using maximum likelihood estimation.
`x`, `y`	Matrices of raw data. `x` is for the item scores and `y` the item log response times.
`interval`	The interval to search for the person parameters. Default is `c(-4, 4)`.
`alpha`	Value(s) between 0 and 1 indicating the significance level(s) used for flagging. Default is `0.05`.
`cutoff`	Used with the modified signed likelihood ratio test statistic and the Lugannani-Rice approximation. If the absolute value of the signed likelihood ratio test statistic is less than the cutoff (default is `0.05`), then the modified signed likelihood ratio test statistic is replaced with the signed likelihood ratio test statistic and the Lugannani-Rice approximation is replaced with the p-value of the signed likelihood ratio test statistic.
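
The fallback rule described for `cutoff` can be sketched as follows. This only illustrates the replacement logic for the statistic and is not the package's internal code: `apply_cutoff()` is a hypothetical helper, and the statistic values are made up for the illustration.

```r
# Hypothetical helper (not part of aberrance): where |L_S| is below the
# cutoff, fall back to L_S; otherwise keep the modified statistic ML_S.
apply_cutoff <- function(l_s, ml_s, cutoff = 0.05) {
  ifelse(abs(l_s) < cutoff, l_s, ml_s)
}

# Made-up values: persons 2 and 3 have |L_S| < 0.05, where the modified
# statistic is assumed to be numerically unstable
l_s  <- c(-1.20, 0.01, 0.04, 2.30)
ml_s <- c(-1.15, 7.90, -3.20, 2.25)

stat <- apply_cutoff(l_s, ml_s, cutoff = 0.05)
print(stat)
```

Persons 2 and 3 keep their `L_S` values, while persons 1 and 4 keep their `ML_S` values.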

Value

A list is returned with the following elements:

`stat`	A matrix of preknowledge detection statistics.
`pval`	A matrix of p-values.
`flag`	An array of flagging results. The first dimension corresponds to persons, the second dimension to methods, and the third dimension to significance levels.
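
The layout of `flag` can be illustrated with a small stand-in built by hand. The p-values below are made up, and flagging when a p-value is at or below `alpha` is an assumption about the rule (the package may use a strict inequality); the dimension names are also chosen only for this sketch.

```r
# Made-up p-values for 3 persons and 2 methods
pval <- matrix(
  c(0.001, 0.200, 0.030,
    0.040, 0.600, 0.008),
  nrow = 3,
  dimnames = list(NULL, c("L_S", "S_S"))
)
alpha <- c(0.01, 0.05)

# persons x methods x significance levels
flag <- array(
  NA,
  dim = c(nrow(pval), ncol(pval), length(alpha)),
  dimnames = list(NULL, colnames(pval), paste0("alpha=", alpha))
)
for (k in seq_along(alpha)) {
  flag[, , k] <- pval <= alpha[k]  # assumed flagging rule
}

# Number of flagged persons for each method and significance level
n_flagged <- apply(flag, c(2, 3), sum)
print(n_flagged)
```

Indexing the third dimension, e.g. `flag[, , 1]`, gives the person-by-method flags at the first significance level.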

References

Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42(1), 46–68.

Sinharay, S. (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44(5), 376–392.

Sinharay, S., & Jensen, J. L. (2019). Higher-order asymptotics and its application to testing the equality of the examinee ability over two sets of items. Psychometrika, 84(2), 484–510.

Sinharay, S., & Johnson, M. S. (2020). The use of item scores and response times to detect examinees who may have benefited from item preknowledge. British Journal of Mathematical and Statistical Psychology, 73(3), 397–419.

Examples

# Setup for Examples 1 and 2 ------------------------------------------------

# Settings
set.seed(0)     # seed for reproducibility
N <- 500        # number of persons
n <- 40         # number of items

# Randomly select 10% of examinees to have preknowledge and 40% of items to be
# compromised
cv <- sample(1:N, size = N * 0.10)
ci <- sample(1:n, size = n * 0.40)

# Create vector of indicators (1 = preknowledge, 0 = no preknowledge)
ind <- ifelse(1:N %in% cv, 1, 0)

# Example 1: Item Scores and Response Times ---------------------------------

# Generate person parameters for the 2PL model and lognormal model
xi <- MASS::mvrnorm(
  N,
  mu = c(theta = 0.00, tau = 0.00),
  Sigma = matrix(c(1.00, 0.25, 0.25, 0.25), ncol = 2)
)

# Generate item parameters for the 2PL model and lognormal model
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b = NA,
  c = 0,
  alpha = runif(n, min = 1.50, max = 2.50),
  beta = NA
)

# Generate positively correlated difficulty and time intensity parameters
psi[, c("b", "beta")] <- MASS::mvrnorm(
  n,
  mu = c(b = 0.00, beta = 3.50),
  Sigma = matrix(c(1.00, 0.20, 0.20, 0.15), ncol = 2)
)

# Simulate uncontaminated data
dat <- sim(psi, xi)
x <- dat$x
y <- dat$y

# Contaminate the data for examinees with preknowledge: replace their item
# scores on the compromised items and reduce the corresponding log response times
x[cv, ci] <- rbinom(length(cv) * length(ci), size = 1, prob = 0.90)
y[cv, ci] <- y[cv, ci] * 0.75

# Detect preknowledge
out <- detect_pk(
  method = c("L_S", "ML_S", "LR_S", "S_S", "W_S", "L_T", "L_ST"),
  ci = ci,
  psi = psi,
  x = x,
  y = y
)

# Example 2: Polytomous Item Scores -----------------------------------------

# Generate person parameters for the generalized partial credit model
xi <- cbind(theta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters for the generalized partial credit model
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  c0 = 0,
  c1 = rnorm(n, mean = -1.00, sd = 0.50),
  c2 = rnorm(n, mean = 0.00, sd = 0.50),
  c3 = rnorm(n, mean = 1.00, sd = 0.50)
)

# Simulate uncontaminated data
x <- sim(psi, xi)$x

# Contaminate the data for examinees with preknowledge by setting their item
# scores on the compromised items to the maximum score
x[cv, ci] <- 3

# Detect preknowledge
out <- detect_pk(
  method = c("L_S", "ML_S", "LR_S", "S_S", "W_S"),
  ci = ci,
  psi = psi,
  x = x
)
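
The examples create the indicator vector `ind` but stop at the call to `detect_pk()`. One way the flags might be compared against such an indicator is sketched below. To keep the sketch self-contained, `ind` and `flag` are small made-up stand-ins; with real output, `flag` would be a single slice of the documented array, such as `out$flag[, 1, 1]` for the first method and first significance level.

```r
# Made-up stand-ins: 10 persons, the first 3 with preknowledge
ind  <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
flag <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Cross-tabulate true status against flagging decisions
print(table(truth = ind, flagged = flag))

# Proportion flagged among persons with preknowledge (empirical power)
power <- mean(flag[ind == 1])

# Proportion flagged among persons without preknowledge (false positive rate)
fpr <- mean(flag[ind == 0])
```

With simulated data, the false positive rate can then be checked against the nominal `alpha`.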

Package aberrance version 0.1.1