R: Detect answer similarity

detect_as {aberrance}

R Documentation

Detect answer similarity

Description

Detect answer similarity for all possible pairs of persons.

Usage

detect_as(
  method,
  psi,
  xi = NULL,
  x = NULL,
  d = NULL,
  r = NULL,
  y = NULL,
  interval = c(-4, 4),
  alpha = 0.05
)

Arguments

`method`	The answer similarity statistic(s) to compute.
	Options for score-based statistics are:
	- `"OMG_S"` for the unconditional `\omega` statistic (Romero et al., 2015).
	- `"GBT_S"` for the unconditional `GBT` statistic (van der Linden & Sotaridona, 2006).
	- `"M4_S"` for the `M4` statistic (Maynes, 2014).
	Options for score and distractor-based statistics are:
	- `"OMG_SD"` for the unconditional `\omega` statistic (Romero et al., 2015).
	- `"GBT_SD"` for the unconditional `GBT` statistic (van der Linden & Sotaridona, 2006).
	- `"M4_SD"` for the `M4` statistic (Maynes, 2014).
	Options for response-based statistics are:
	- `"OMG_R"` for the unconditional `\omega` statistic (Romero et al., 2015).
	- `"GBT_R"` for the unconditional `GBT` statistic (van der Linden & Sotaridona, 2006).
	- `"M4_R"` for the `M4` statistic (Maynes, 2014).
	Options for score and response time-based statistics are:
	- `"OMG_ST"` for the unconditional `\omega` statistic (Gorney & Wollack, 2024).
	- `"GBT_ST"` for the unconditional `GBT` statistic (Gorney & Wollack, 2024).
	Options for score, distractor, and response time-based statistics are:
	- `"OMG_SDT"` for the unconditional `\omega` statistic (Gorney & Wollack, 2024).
	- `"GBT_SDT"` for the unconditional `GBT` statistic (Gorney & Wollack, 2024).
	Options for response and response time-based statistics are:
	- `"OMG_RT"` for the unconditional `\omega` statistic (Gorney & Wollack, 2024).
	- `"GBT_RT"` for the unconditional `GBT` statistic (Gorney & Wollack, 2024).
`psi`	A matrix of item parameters.
`xi`	A matrix of person parameters. If `NULL` (default), person parameters are estimated using maximum likelihood estimation.
`x`, `d`, `r`, `y`	Matrices of raw data. `x` is for the item scores, `d` the item distractors, `r` the item responses, and `y` the item log response times.
`interval`	The interval to search for the person parameters. Default is `c(-4, 4)`.
`alpha`	Value(s) between 0 and 1 indicating the significance level(s) used for flagging. Default is `0.05`.
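
The suffix of each `method` name indicates which raw-data matrices the statistic draws on (`S` = scores `x`, `D` = distractors `d`, `R` = responses `r`, `T` = log response times `y`). The sketch below makes that mapping explicit. `required_data()` is a hypothetical helper written for illustration, not a function in aberrance, and the mapping is inferred from the argument descriptions above rather than taken from the package source.

```r
# Hypothetical helper (NOT part of aberrance): return the raw-data arguments
# implied by the suffix of each requested method name.
required_data <- function(method) {
  # Drop the statistic prefix ("OMG_", "GBT_", "M4_") and keep the suffix
  suffix <- sub("^[^_]+_", "", method)

  # Mapping inferred from the documented meaning of x, d, r, and y
  lookup <- list(
    S   = "x",
    SD  = c("x", "d"),
    R   = "r",
    ST  = c("x", "y"),
    SDT = c("x", "d", "y"),
    RT  = c("r", "y")
  )

  unknown <- setdiff(suffix, names(lookup))
  if (length(unknown) > 0) {
    stop("Unrecognized method suffix: ", paste(unknown, collapse = ", "))
  }

  # Union of the data arguments needed across all requested methods
  unique(unlist(lookup[suffix], use.names = FALSE))
}

required_data(c("OMG_S", "GBT_ST"))
required_data("M4_SD")
```

The helper checks only the suffix. It does not verify that the prefix and suffix form one of the documented options (for example, no time-based `M4` variants are listed above).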

Value

A list is returned with the following elements:

`stat`	A matrix of answer similarity statistics.
`pval`	A matrix of p-values.
`flag`	An array of flagging results. The first dimension corresponds to pairs, the second dimension to methods, and the third dimension to significance levels.
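
As a rough illustration of the pairs-by-methods-by-levels layout of `flag`, the sketch below builds a small stand-in object in base R and indexes it. The p-values are made up, `stat` is omitted, and the dimension names and the `pval <= alpha` flagging rule are assumptions made for this example, not confirmed behavior of `detect_as()`.

```r
# Hypothetical stand-in for detect_as() output; all numbers are made up
pairs   <- c("1-2", "1-3", "2-3")
methods <- c("OMG_S", "GBT_S")
alpha   <- c(0.01, 0.05)

# Rows are pairs, columns are methods (matrix() fills column by column)
pval <- matrix(
  c(0.004, 0.300, 0.030,
    0.020, 0.600, 0.200),
  nrow = length(pairs),
  dimnames = list(pairs, methods)
)

# Assumption: a pair is flagged when its p-value is at or below alpha
flag <- array(
  sapply(alpha, function(a) pval <= a),
  dim = c(length(pairs), length(methods), length(alpha)),
  dimnames = list(pairs, methods, as.character(alpha))
)
out <- list(pval = pval, flag = flag)

# Flags for one method at one significance level (one value per pair)
out$flag[, "OMG_S", "0.05"]

# Pairs flagged by every method at the 0.05 level
both <- apply(out$flag[, , "0.05"], 1, all)
names(both)[both]
```

If the real output carries no dimension names, the same slices can be taken by position, for example `out$flag[, 1, 2]`.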

References

Gorney, K., & Wollack, J. A. (2024). Using response times in answer similarity analysis. Journal of Educational and Behavioral Statistics. Advance online publication.

Maynes, D. (2014). Detection of non-independent test taking by similarity analysis. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 53–80). Routledge.

Romero, M., Riascos, Á., & Jara, D. (2015). On the optimality of answer-copying indices: Theory and practice. Journal of Educational and Behavioral Statistics, 40(5), 435–453.

van der Linden, W. J., & Sotaridona, L. (2006). Detecting answer copying when the regular response process follows a known response model. Journal of Educational and Behavioral Statistics, 31(3), 283–304.

Examples

# Setup for Examples 1 and 2 ------------------------------------------------

# Settings
library(aberrance)  # provides sim() and detect_as()
set.seed(0)     # seed for reproducibility
N <- 50         # number of persons
n <- 40         # number of items

# Randomly select 10% of examinees to have preknowledge and 40% of items to be
# compromised
cv <- sample(1:N, size = N * 0.10)
ci <- sample(1:n, size = n * 0.40)

# Create vector of indicators (1 = similar pair, 0 = non-similar pair)
pair <- t(combn(N, 2))
ind <- ifelse((pair[, 1] %in% cv) & (pair[, 2] %in% cv), 1, 0)
names(ind) <- paste(pair[, 1], pair[, 2], sep = "-")

# Example 1: Item Scores and Response Times ---------------------------------

# Generate person parameters for the 3PL model and lognormal model
xi <- MASS::mvrnorm(
  N,
  mu = c(theta = 0.00, tau = 0.00),
  Sigma = matrix(c(1.00, 0.25, 0.25, 0.25), ncol = 2)
)

# Generate item parameters for the 3PL model and lognormal model
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b = NA,
  c = runif(n, min = 0.05, max = 0.30),
  alpha = runif(n, min = 1.50, max = 2.50),
  beta = NA
)

# Generate positively correlated difficulty and time intensity parameters
psi[, c("b", "beta")] <- MASS::mvrnorm(
  n,
  mu = c(b = 0.00, beta = 3.50),
  Sigma = matrix(c(1.00, 0.20, 0.20, 0.15), ncol = 2)
)

# Simulate uncontaminated data
dat <- sim(psi, xi)
x <- dat$x
y <- dat$y

# Contaminate the data: on compromised items, examinees with preknowledge
# answer correctly with probability 0.90 and their log response times are
# reduced by 25%
x[cv, ci] <- rbinom(length(cv) * length(ci), size = 1, prob = 0.90)
y[cv, ci] <- y[cv, ci] * 0.75

# Detect answer similarity
out <- detect_as(
  method = c("OMG_S", "GBT_S", "OMG_ST", "GBT_ST"),
  psi = psi,
  x = x,
  y = y
)

# Example 2: Polytomous Item Scores -----------------------------------------

# Generate person parameters for the generalized partial credit model
xi <- cbind(theta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters for the generalized partial credit model
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  c0 = 0,
  c1 = rnorm(n, mean = -1.00, sd = 0.50),
  c2 = rnorm(n, mean = 0.00, sd = 0.50),
  c3 = rnorm(n, mean = 1.00, sd = 0.50)
)

# Simulate uncontaminated data
x <- sim(psi, xi)$x

# Contaminate the data: on compromised items, examinees with preknowledge
# receive the maximum score (3)
x[cv, ci] <- 3

# Detect answer similarity
out <- detect_as(
  method = c("OMG_S", "GBT_S"),
  psi = psi,
  x = x
)

# Setup for Examples 3 and 4 ------------------------------------------------

# Settings
set.seed(0)     # seed for reproducibility
N <- 50         # number of persons
n <- 40         # number of items

# Randomly select 10% sources and 10% copiers
s <- sample(1:N, size = N * 0.10)
c <- sample(setdiff(1:N, s), size = N * 0.10)

# Create vector of indicators (1 = similar pair, 0 = non-similar pair)
pair <- t(combn(N, 2))
ind <- ifelse(1:nrow(pair) %in% apply(
  rbind(cbind(s, c), cbind(c, s)), 1, function(p)
  which(pair[, 1] == p[1] & pair[, 2] == p[2])), 1, 0)
names(ind) <- paste(pair[, 1], pair[, 2], sep = "-")

# Example 3: Item Scores and Distractors ------------------------------------

# Generate person parameters for the nested logit model
xi <- MASS::mvrnorm(
  N,
  mu = c(theta = 0.00, eta = 0.00),
  Sigma = matrix(c(1.00, 0.80, 0.80, 1.00), ncol = 2)
)

# Generate item parameters for the nested logit model
psi <- cbind(
  a = rlnorm(n, meanlog = 0.00, sdlog = 0.25),
  b = rnorm(n, mean = 0.00, sd = 1.00),
  c = runif(n, min = 0.05, max = 0.30),
  lambda1 = rnorm(n, mean = 0.00, sd = 1.00),
  lambda2 = rnorm(n, mean = 0.00, sd = 1.00),
  lambda3 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta1 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta2 = rnorm(n, mean = 0.00, sd = 1.00),
  zeta3 = rnorm(n, mean = 0.00, sd = 1.00)
)

# Simulate uncontaminated data
dat <- sim(psi, xi)
x <- dat$x
d <- dat$d

# Contaminate the data: for each source-copier pair, replace the copier's
# scores and distractors on a random 40% of items with those of the source
for (v in seq_along(c)) {
  ci <- sample(1:n, size = n * 0.40)
  x[c[v], ci] <- x[s[v], ci]
  d[c[v], ci] <- d[s[v], ci]
}

# Detect answer similarity
out <- detect_as(
  method = c("OMG_S", "GBT_S", "OMG_SD", "GBT_SD"),
  psi = psi,
  x = x,
  d = d
)

# Example 4: Item Responses -------------------------------------------------

# Generate person parameters for the nominal response model
xi <- cbind(eta = rnorm(N, mean = 0.00, sd = 1.00))

# Generate item parameters for the nominal response model
psi <- cbind(
  lambda1 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda2 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda3 = rnorm(n, mean = -0.50, sd = 0.50),
  lambda4 = rnorm(n, mean = 1.50, sd = 0.50),
  zeta1 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta2 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta3 = rnorm(n, mean = -0.50, sd = 0.50),
  zeta4 = rnorm(n, mean = 1.50, sd = 0.50)
)

# Simulate uncontaminated data
r <- sim(psi, xi)$r

# Contaminate the data: for each source-copier pair, replace the copier's
# responses on a random 40% of items with those of the source
for (v in seq_along(c)) {
  ci <- sample(1:n, size = n * 0.40)
  r[c[v], ci] <- r[s[v], ci]
}

# Detect answer similarity
out <- detect_as(
  method = c("OMG_R", "GBT_R"),
  psi = psi,
  r = r
)
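
The setup blocks above build an indicator vector `ind` but the examples never use it. The toy sketch below shows how `pair` and `ind` are laid out for five persons and one way such an indicator could be compared against flagging results. The `flag` vector is a made-up placeholder, not output from `detect_as()`, and the sketch assumes the flagged pairs are in the same order as the rows of `pair`.

```r
# Toy setup: 5 persons, of whom persons 2 and 4 form the only similar pair
N <- 5
cv <- c(2, 4)

# One row per unordered pair, in the order produced by combn()
pair <- t(combn(N, 2))
ind <- ifelse((pair[, 1] %in% cv) & (pair[, 2] %in% cv), 1, 0)
names(ind) <- paste(pair[, 1], pair[, 2], sep = "-")

# Placeholder flags (made up for illustration; NOT detect_as() output)
flag <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
names(flag) <- names(ind)

# Proportion of pairs flagged among non-similar (0) and similar (1) pairs
rates <- tapply(flag, ind, mean)
rates
```

With real output, `flag` would be replaced by one method-by-level slice of `out$flag`; the rate for group `0` then estimates the false positive rate and the rate for group `1` the detection rate.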

[Package aberrance version 0.1.1 Index]