R: WhichAnswerOriginal

WhichAnswerOriginal {TexExamRandomizer}

R Documentation

WhichAnswerOriginal

Description

Given a table of student answers and a full answer sheet covering all exam versions (including a "reference/original" version), it locates each answer in the original exam by copying the matching rows from the original version and binding them in order for every student. It then combines the results in a list and stores all the remaining student information in the attribute "StudentInfo".

It is intended as an internal function to generate the grades, and to identify in a very general way where the answers of the students are (relative to the reference/original version).

Usage

WhichAnswerOriginal(
  StudentAnswers,
  FullExamAnswerSheet,
  OriginalExamVersion = 0,
  names.FullExamVersion = "Version",
  names.FullExamOriginalCols,
  names.CorrectAndIncorrectCols,
  names.StudentAnswerQCols,
  names.StudentAnswerExamVersion
)

Arguments

`StudentAnswers`	DataFrame, each row is a student, each column is some information about said student. Any column not included in `names.StudentAnswerQCols` will be understood as information of the student and will be saved as part of the information table when we output the result.
`FullExamAnswerSheet`	Answer sheet of all the exam versions, following the conventions of the `FullAnswerSheet` outputted by `CreateRandomExams`
`OriginalExamVersion`	The version of the original exam, without randomization, as stored on the `FullExamAnswerSheet`. The default value is `0`, as that is the convention on `CreateRandomExams`
`names.FullExamVersion`	The name of the column in which the version of the exam is stored on the `FullExamAnswerSheet`. The default value is "`Version`", as that is the convention on `CreateRandomExams`
`names.FullExamOriginalCols`	The names of the columns that contain the information of the items relative to where they were positioned in the original ordering of the exam, before randomizing the exam. The convention from `CreateRandomExams` is to end all of them with "`_original`".
`names.CorrectAndIncorrectCols`	A character vector with the names of the columns in the `FullExamAnswerSheet` that contain the correct and incorrect answers, in that order. In each row, the correct column should hold an integer value if the choice is correct, the incorrect column should hold an integer value if the choice is incorrect, and the other column should be `NA`. (They should be "complementary".)
`names.StudentAnswerQCols`	The names of the columns in `StudentAnswers` that store the answers from every student to the exam, ordered. These columns should contain integer values, where 1 refers to the first answer and n refers to the nth answer in their exam.
`names.StudentAnswerExamVersion`	The name of the column in the `StudentAnswers` that identifies the version of the exam

Details

StudentAnswers should be a data frame in which every row holds the answers of one student. The answers of the student to the exam should be ordered.

It is important that the columns named in names.StudentAnswerQCols contain all their answers. If a student didn't answer a question, leave an NA or an invalid integer value as the answer, such as 0 or a number larger than the number of answers to that question, so that it is found to be out of bounds.
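As an illustration of the layout described above, here is a minimal sketch of a responses table. The column names (Nickname, Version, Q1 to Q3) and the values are hypothetical and are not taken from the package's example data.

```r
# Hypothetical responses table; names and values are illustrative only.
StudentAnswers <- data.frame(
    Nickname = c("Ana", "Ben", "Cho"),
    Version  = c(1L, 2L, 1L),
    Q1 = c(2L, 1L, NA),  # Cho left question 1 blank
    Q2 = c(4L, 0L, 3L),  # Ben's 0 is not a valid choice number
    Q3 = c(1L, 3L, 9L),  # Cho's 9 is assumed to exceed the number of choices
    stringsAsFactors = FALSE
)

# Answer columns, selected with the same pattern the Examples section uses.
answer_cols <- grep("^Q.*[[:digit:]]", names(StudentAnswers), value = TRUE)

# Every other column is treated as student information.
info_cols <- setdiff(names(StudentAnswers), answer_cols)
```

Here answer_cols would be passed as names.StudentAnswerQCols and "Version" as names.StudentAnswerExamVersion.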

Value

It returns a list. Each element of the list is a dataframe, and there is one dataframe for each student in the StudentAnswers table provided.

All the columns that are not listed in names.StudentAnswerQCols are regarded as "StudentInfo", and they are added to the attribute "StudentInfo" of the output as a data frame.

List elements:

They are outputted in order, that is to say, for StudentAnswers[i,] the list element that provides the information for that row will be outputlist[[i]].

outputlist[[i]] is a dataframe that identifies the rows that the student answered as they are found on the original/reference version. Therefore, if a student answers a certain value, and that value is not reflected on the original version, it gets ignored.

StudentInfo attribute

A dataframe containing all the student information that wasn't their answers.

Underlying algorithm

To identify the rows on the original exam it does the following:

It first finds their exam in the full answer sheet by their exam version.
After that, it removes from their exam the rows that identify the correct/incorrect choices.
By trying to match that row with a row on the reference exam, it can tell where that question is found on the original exam.
Then it identifies where that question is found on the original version, and it finds there which of the possible correct/incorrect choices is found.
If it didn't find any correct/incorrect choice matching the value given by the student, it marks it as out of bounds and replaces both correct and incorrect columns with NA.
If it still doesn't find the row, it simply ignores it, and the output will have one less row.
Now you can tell how many questions the student answered correctly by looking at how many values are not NA in the correct choice column of the output list.
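The matching idea in the steps above can be sketched in base R on a toy answer sheet. This is a simplified illustration, not the package's implementation: the sheet, its column names (question_original, choice_original, question, Correct, Incorrect) and the helper locate_in_original are all hypothetical. Unlike the behaviour described above, the sketch simply drops any answer it cannot match instead of keeping an out-of-bounds row with NA in both columns.

```r
# Hypothetical, simplified answer sheet. Version 0 is the reference; version 1
# swaps the two questions and the two choices of the first original question.
sheet <- data.frame(
    Version           = rep(c(0L, 1L), each = 4),
    question_original = c(1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L),
    choice_original   = c(1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L),
    question          = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L),
    Correct           = c(1L, NA, NA, 2L, NA, 2L, NA, 2L),
    Incorrect         = c(NA, 2L, 1L, NA, 1L, NA, 1L, NA)
)

# Hypothetical helper: map one student's answers back to reference rows.
locate_in_original <- function(answers, version, sheet, original_version = 0L) {
    own <- sheet[sheet$Version == version, ]
    ref <- sheet[sheet$Version == original_version, ]
    # Choice number within the student's version (the columns are complementary).
    own_position <- ifelse(is.na(own$Correct), own$Incorrect, own$Correct)
    rows <- lapply(seq_along(answers), function(q) {
        hit <- own[own$question == q & own_position %in% answers[q], , drop = FALSE]
        if (nrow(hit) != 1L) {
            return(ref[0, , drop = FALSE])  # unmatched answer: dropped in this sketch
        }
        # Look up the same item in the reference version via the _original columns.
        ref[ref$question_original == hit$question_original &
                ref$choice_original == hit$choice_original, , drop = FALSE]
    })
    do.call(rbind, rows)
}

# A student with version 1 who answered choice 2, then choice 1.
located <- locate_in_original(answers = c(2L, 1L), version = 1L, sheet = sheet)
score <- sum(!is.na(located$Correct))
```

In this toy case the first answer maps to a correct choice of original question 2 and the second to an incorrect choice of original question 1, so score counts one correct answer.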

Removing Questions from the exam

Note that if, after creating the exam, you find that a question is bugged and can't be used to grade the exam, all you have to do is tell the students to answer "something" and remove that question from the original/reference version in the Full Answer Sheet. When you apply the grading function, that question will then be ignored.

Notice how this creates output lists with different lengths in the case that two students didn't have that same question in their exam.

For example, suppose an exam has 15 questions drawn from a 50 question document. If student A has a bugged question and student B doesn't, the answer sheet produced for student A will have 14 rows while the one for student B will have 15 rows.
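A minimal sketch of that removal, assuming a hypothetical answer sheet in which the reference version is 0 and the original position of each question is stored in a column named question_original (the real sheet from CreateRandomExams has its own column names):

```r
# Hypothetical answer sheet with one row per question, two versions.
sheet <- data.frame(
    Version           = rep(c(0L, 1L), each = 3),
    question_original = c(1L, 2L, 3L, 3L, 1L, 2L),
    question          = c(1L, 2L, 3L, 1L, 2L, 3L)
)

bugged <- 2L  # original position of the question to discard (assumed)

# Remove the bugged question from the reference version only; the rows of the
# randomized versions stay untouched.
drop <- sheet$Version == 0L & sheet$question_original == bugged
sheet_fixed <- sheet[!drop, ]
```

The filtered sheet would then be passed as FullExamAnswerSheet.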

Notes

Note1: Remember that in the original answer sheet there are two columns, one with the correct choice and another one with the wrong choice. If the value is NA in one of those two columns it SHOULD NOT be NA in the other column.

Note2: The idea is that the data frames can be read to know the score of the student by counting the number of values that are not NAs on the correct choice column. (The numbers on the correct/incorrect columns themselves can be used for statistical purposes, to tell how many students answered each question).

Note3: The data frames can be used for many other statistical purposes very easily.
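As a sketch of the kind of summary these notes describe, the following uses a hand-made list shaped like the output (one data frame per student). The column names question_original, Correct and Incorrect are assumptions for illustration.

```r
# Hypothetical output list: one data frame per student.
compiled <- list(
    data.frame(question_original = c(1L, 2L),
               Correct = c(1L, NA), Incorrect = c(NA, 1L)),
    data.frame(question_original = c(1L, 2L),
               Correct = c(1L, 2L), Incorrect = c(NA_integer_, NA_integer_)),
    data.frame(question_original = 1L,
               Correct = NA_integer_, Incorrect = 2L)
)

# Score per student: number of non-NA values in the correct choice column.
scores <- vapply(compiled, function(d) sum(!is.na(d$Correct)), integer(1))

# Number of students who answered each original question correctly.
all_rows <- do.call(rbind, compiled)
correct_per_question <- tapply(
    !is.na(all_rows$Correct),
    all_rows$question_original,
    sum)
```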

Examples



asheet_file <-
    system.file(
        "extdata",
        "ExampleTables",
        "ExampleAnswerSheet.csv",
        package = "TexExamRandomizer")
responses_file <-
    system.file(
        "extdata",
        "ExampleTables",
        "ExampleResponses.csv",
        package = "TexExamRandomizer")
FullAnswerSheet <-
    read.csv(
        asheet_file,
        header = TRUE,
        stringsAsFactors = FALSE,
        na.strings = c("", "NA", "Na"),
        strip.white = TRUE)
Responses <- read.csv(
    responses_file,
    header = TRUE,
    stringsAsFactors = FALSE,
    na.strings = c("", "NA", "Na"),
    strip.white = TRUE)
compiledanswers <-
    WhichAnswerOriginal(
        StudentAnswers = Responses,
        FullExamAnswerSheet = FullAnswerSheet,
        names.StudentAnswerQCols = grep(
            names(Responses),
            pattern = "^Q.*[[:digit:]]",
            value = TRUE),
        names.StudentAnswerExamVersion = grep(
            names(Responses),
            pattern = "Version",
            value = TRUE),
        OriginalExamVersion = 0,
        names.FullExamVersion = "Version",
        names.FullExamOriginalCols = grep(
            names(FullAnswerSheet),
            pattern = "_original",
            value = TRUE),
        names.CorrectAndIncorrectCols = c(
            "choice",
            "CorrectChoice")
    )
nicknames <- attr(compiledanswers, "StudentInfo")$Nickname

# seq_along() is safe even if the list is empty, unlike 1:length().
for (i in seq_along(compiledanswers)) {
    cat("Student\t", nicknames[i], " got\t",
        sum(!is.na(compiledanswers[[i]]$CorrectChoice)),
        " questions correctly\n", sep = "")
}

[Package TexExamRandomizer version 1.2.7 Index]