similarity2 {CopyDetect} | R Documentation |

Computes response similarity indices: the Omega index (Wollack, 1996), the Generalized Binomial Test (GBT; van der Linden & Sotaridona, 2006), the K index (Holland, 1996), the K1 and K2 indices (Sotaridona & Meijer, 2002), the S1 and S2 indices (Sotaridona & Meijer, 2003), and the M4 index (Maynes, 2014).

similarity2(data, resp.options, key, person.id, center.id = NULL, item.loc, single.pair = NULL, many.pairs = NULL, centers = NULL)

`data` |
a data frame with |

`resp.options` |
a vector of labels for the nominal response options in the dataset. |

`key` |
a vector of key response options for the items in the dataset. The key responses must be in the same order as the item response columns given in the `item.loc` argument. |

`person.id` |
column label in the dataset for the variable indicating unique person ids. |

`center.id` |
column label in the dataset for the variable indicating center ids. This is used only when the user asks for the indices to be computed for all pairs in some centers. For a single-pair or multiple-pair calculation, this argument is ignored. |

`item.loc` |
a numeric vector indicating the column positions of the item responses in the dataset. |

`single.pair` |
a character vector of length 2 for the suspected pair of examinees. The first element of the vector indicates the unique ID of the suspected copier examinee, and the second element of the vector indicates the unique ID of the suspected source examinee. |

`many.pairs` |
a matrix with two columns. Users can request the indices for as many pairs as they wish in a single call. Each row of this matrix represents a pair of examinees. The first column of each row gives the unique ID of the suspected copier examinee, and the second column gives the unique ID of the suspected source examinee. |

`centers` |
a character vector including unique center ids. Users can request to compute the indices for all pairs in the centers listed using this argument. |
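The arguments above have to agree with one another: one key per item column, keys and responses drawn from `resp.options`, and every requested ID present in the data. A minimal base R sketch of such a pre-check follows. The toy data and the helper `check_inputs()` are hypothetical and are not part of CopyDetect; the checks are simply a reading of the argument descriptions, not the package's own validation.

```r
# Toy data: IDs, center IDs, and three nominal items (all values are made up).
toy <- data.frame(
  EID     = c("p01", "p02", "p03", "p04"),
  cent_id = c(1, 1, 2, 2),
  item1   = c(1, 1, 2, NA),
  item2   = c(3, 3, 3, 4),
  item3   = c(2, 4, 2, 2),
  stringsAsFactors = FALSE
)

resp.options <- c(1, 2, 3, 4)   # labels of the nominal response options
item.loc     <- 3:5             # columns that hold the item responses
key.resp     <- c(1, 3, 2)      # one key per item, in item.loc order

# Each row is one pair: column 1 = suspected copier, column 2 = suspected source.
pairs <- matrix(c("p01", "p02",
                  "p03", "p04"),
                ncol = 2, byrow = TRUE)

# Hypothetical helper (not part of CopyDetect): basic consistency checks.
check_inputs <- function(data, resp.options, key, person.id, item.loc, many.pairs) {
  responses <- unlist(data[, item.loc], use.names = FALSE)
  stopifnot(
    length(key) == length(item.loc),           # one key per item column
    all(key %in% resp.options),                # keys are valid options
    all(responses %in% c(resp.options, NA)),   # responses are valid or missing
    !anyDuplicated(data[[person.id]]),         # person ids are unique
    ncol(many.pairs) == 2,                     # copier column and source column
    all(many.pairs %in% data[[person.id]])     # every requested id exists
  )
  invisible(TRUE)
}

ok <- check_inputs(toy, resp.options, key.resp, "EID", item.loc, pairs)
```

If any check fails, `stopifnot()` raises an error naming the failed condition, which is usually quicker to diagnose than an error raised partway through model estimation.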

`similarity2` uses nominally scored items. Therefore, the definition of "identical incorrect response" and "identical correct response" is slightly different from `similarity1`. For example, let A, B, C, and D be the response alternatives for items in a multiple-choice test, and let A be the key response for an item. There are 10 possible response combinations between two response vectors: (A,A), (A,B), (A,C), (A,D), (B,B), (B,C), (B,D), (C,C), (C,D), and (D,D). `similarity2` counts the (A,A) response combination as an "identical correct response", and any of the (B,B), (C,C), and (D,D) response combinations as an "identical incorrect response". Similar to `similarity1`, the (NA,NA) response combination is counted as an "identical incorrect response". All other response combinations (A,B), (A,C), (A,D), (B,C), (B,D), (C,D), (A,NA), (B,NA), (C,NA), and (D,NA) are counted as non-identical responses. When computing the number-correct/number-incorrect scores or estimating the IRT ability parameters, missing values (NA) in a response vector are counted as an incorrect response.
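The counting rules above can be sketched in base R as follows. The helpers `classify_pair()` and `number_correct()` and the response vectors are hypothetical illustrations written from the description, not the package's internal code.

```r
# Hypothetical helper: label each item for a pair of nominal response vectors.
classify_pair <- function(resp1, resp2, key) {
  stopifnot(length(resp1) == length(key), length(resp2) == length(key))
  both_na <- is.na(resp1) & is.na(resp2)
  same    <- !is.na(resp1) & !is.na(resp2) & resp1 == resp2
  out <- rep("non-identical", length(key))
  out[same & resp1 == key] <- "identical correct"
  out[same & resp1 != key] <- "identical incorrect"
  out[both_na]             <- "identical incorrect"   # the (NA,NA) rule
  out
}

# Hypothetical helper: number-correct score, with NA counted as incorrect.
number_correct <- function(resp, key) sum(!is.na(resp) & resp == key)

key    <- c("A", "A", "A", "A", "A")   # A is the key for every item here
copier <- c("A", "B", "C", NA,  NA)
src    <- c("A", "B", "D", NA,  "A")

combo <- classify_pair(copier, src, key)
table(combo)
number_correct(copier, key)
number_correct(src, key)
```

In this toy pair the five items fall into (A,A), (B,B), (C,D), (NA,NA), and (NA,A), so one item is an identical correct response, two are identical incorrect responses, and two are non-identical.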

If a single pair is requested, `similarity2()` returns a list containing the following components.

`data` |
original data file provided by user |

`scored.data` |
scored item response matrix based on the key response vector provided |

`W.index` |
statistics for the W index |

`GBT.index` |
statistics for the GBT index |

`K.index` |
statistics for the K index |

`K.variants` |
statistics for the K1, K2, S1, and S2 indices |

`M4.index` |
statistics for the M4 index |

`item.loc` |
columns in the dataset that store the item responses |

`item.par` |
estimated item parameter matrix for Bock's Nominal Response IRT Model |

`center.id` |
column label for the unique center IDs |

`person.id` |
column label for the unique person IDs |

`single.pair` |
unique IDs for the pair of examinees requested |

`single.pair2` |
corresponding row numbers for the pair of examinees requested |

`thetas` |
maximum likelihood estimates for IRT ability parameter for each individual |

If multiple pairs are requested in matrix form, `similarity2()` returns a list containing the following components.

`data` |
original data file provided by user |

`output.manypairs` |
a matrix including the IDs for each pair requested and the corresponding indices computed for these pairs |

`item.loc` |
columns in the dataset that store the item responses |

`item.par` |
estimated item parameter matrix for a chosen IRT model |

`center.id` |
column label for the unique center IDs |

`person.id` |
column label for the unique person IDs |

`thetas` |
maximum likelihood estimates for IRT ability parameter for each individual |

If all possible pairs in certain test centers are requested, `similarity2()` returns a list containing the following components.

`data` |
original data file provided by user |

`output.centers` |
a matrix including the IDs for all pairs requested and the corresponding indices computed for these pairs |

`item.loc` |
columns in the dataset that store the item responses |

`item.par` |
estimated item parameter matrix for a chosen IRT model |

`center.id` |
column label for the unique center IDs |

`person.id` |
column label for the unique person IDs |

`thetas` |
maximum likelihood estimates for IRT ability parameter for each individual |
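The number of pairs in a center grows quadratically with the number of examinees, so it can help to count the pairs before requesting a center-level run. The base R sketch below enumerates unordered within-center pairs with `combn()`. The helper `pairs_in_centers()` and the roster are hypothetical, and treating each pair once (rather than once per copier/source direction) is an assumption, since the text does not say how `similarity2()` orders pairs internally.

```r
# Made-up person and center ids, for illustration only.
roster <- data.frame(
  EID     = c("p01", "p02", "p03", "p04", "p05"),
  cent_id = c(42, 42, 42, 45, 45),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unordered within-center pairs for the chosen centers.
pairs_in_centers <- function(data, person.id, center.id, centers) {
  out <- lapply(centers, function(ctr) {
    ids <- data[[person.id]][data[[center.id]] == ctr]
    if (length(ids) < 2) return(NULL)   # a center with one person has no pairs
    t(combn(ids, 2))                    # one row per pair
  })
  do.call(rbind, out)
}

center.pairs <- pairs_in_centers(roster, "EID", "cent_id", c(42, 45))
nrow(center.pairs)   # choose(3, 2) + choose(2, 2) pairs
```

The result is a two-column character matrix, the same shape as the `many.pairs` argument, so a subset of its rows could be passed there if only some within-center pairs are of interest.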

Cengiz Zopluoglu

Sotaridona, L.S., & Meijer, R.R. (2002). Statistical properties of the K-index for detecting answer copying. *Journal of Educational Measurement, 39*, 115-132.

Sotaridona, L.S., & Meijer, R.R. (2003). Two new statistics to detect answer copying. *Journal of Educational Measurement, 40*, 53-69.

van der Linden, W.J., & Sotaridona, L.S. (2006). Detecting answer copying when the regular response process follows a known response model. *Journal of Educational and Behavioral Statistics, 31*, 283-304.

Wollack, J.A. (1996). Detection of answer copying using item response theory. *Dissertation Abstracts International, 57/05*, 2015.

Wollack, J.A. (2003). Comparison of answer copying indices with real data. *Journal of Educational Measurement, 40*, 189-205.

Wollack, J.A. (2006). Simultaneous use of multiple answer copying indexes to improve detection rates. *Applied Measurement in Education, 19*, 265-288.

Wollack, J.A., & Cohen, A.S. (1998). Detection of answer copying with unknown item and trait parameters. *Applied Psychological Measurement, 22*, 144-152.

    data(form2)
    dim(form2)
    head(form2)

    # the first column of this dataset is unique individual IDs
    # the second column of this dataset is unique center IDs
    # From Column 3 to Column 172, nominal item responses. 1, 2, 3, and 4 represent different
    # nominal response options.

    # For the sake of reducing the computational time,
    # I will analyze a subset of this dataset (first 10 items)

    subset <- form2[1:500,1:12]
    dim(subset)
    head(subset)

    # Computing similarity for a single pair

    key.resp <- c(2,3,1,4,1,2,2,1,1,1)

    a <- similarity2(data = subset,
                     resp.options = c(1,2,3,4),
                     key = key.resp,
                     person.id = "EID",
                     item.loc = 3:12,
                     single.pair = c("e200287","e200169"))
    print(a)

    # Computing for multiple pairs

    pairs <- matrix(as.character(sample(subset$EID,20)),nrow=10,ncol=2)

    a <- similarity2(data = subset,
                     resp.options = c(1,2,3,4),
                     key = key.resp,
                     person.id = "EID",
                     item.loc = 3:12,
                     many.pairs = pairs)
    print(a)

    # Computing all possible pairs in the requested centers

    a <- similarity2(data = subset,
                     resp.options = c(1,2,3,4),
                     key = key.resp,
                     person.id = "EID",
                     center.id = "cent_id",
                     item.loc = 3:12,
                     centers = c(42,45,4114))
    print(a)

    # Key response vector for all 170 items for future reference

    key.resp <- c(2,3,1,4,1,2,2,1,1,1,4,1,3,1,3,3,1,2,1,3,3,4,1,
                  3,3,2,3,2,2,3,1,4,1,2,3,3,2,3,4,1,2,1,1,4,3,3,
                  1,1,4,2,2,1,4,1,2,3,3,1,2,4,1,4,2,4,1,1,2,3,4,
                  4,1,4,2,1,2,2,2,2,4,4,3,2,1,3,2,3,2,2,1,2,4,3,
                  2,1,2,1,2,3,1,1,4,3,4,3,4,3,1,3,3,4,2,1,1,4,3,
                  2,4,4,1,1,1,2,2,1,3,1,2,3,3,3,4,4,1,4,4,3,4,2,
                  3,1,4,1,4,1,3,2,2,4,4,4,1,2,2,3,4,1,2,1,4,4,4,
                  1,3,1,2,1,2,3,2,2)

[Package *CopyDetect* version 1.3 Index]