R: Comparison of data sets

compare {klausuR}

R Documentation

Comparison of data sets

Description

The function compare takes two data.frames (or objects of class klausuR.answ-class) and compares them for equality. This is useful for catching typos before you calculate the results with klausur. Errors occur easily when the given answers have to be typed in by hand, so it is advisable to enter all data at least twice (perhaps by different persons) and check for differences with this function. Any differences can then be resolved by looking up the original answer in the test.

Usage

compare(
  set1,
  set2,
  select = NULL,
  ignore = NULL,
  new.set = FALSE,
  rename = c(),
  trim = FALSE,
  id = list(No = "No", Name = c("FirstName", "Name"))
)

Arguments

`set1`, `set2`	The data sets to be compared. Can be two data.frames or objects of class `klausuR.answ-class`. If the latter, their slots `id` and `items` will be compared.
`select`	A vector with the variables that should be compared; all others are omitted. At least all the variables given in `id` must be included, because they are needed for the output. If `NULL`, all variables are examined.
`ignore`	A vector with variables that should be dropped from both sets. See also `select`.
`new.set`	Logical. If `TRUE`, a data.frame of the compared sets is returned, with all unequal cells set to NA.
`rename`	A named vector defining which variables in `set1` and `set2` need to be renamed into the klausuR name scheme. Accepts elements named `No`, `Name`, `FirstName`, `MatrNo`, `Pseudonym` and `Form`. The values of these elements represent the variable names of the input data.
`trim`	Logical. Indicates whether whitespace in character variables should be trimmed.
`id`	A named list of character vectors to help identify differing cases in the input data. The element names of this list will become column names in the generated output table, their values define the respective column names of the input data. If a value has more than one element, they will be collapsed into one string for the output.
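
The direction of the `rename` mapping is easy to get backwards, so here is a minimal base R sketch of what the argument describes. The data.frame `raw` and its column names (`Nr`, `Vorname`, `Nachname`, `Item01`) are made up for illustration, and the renaming step only mimics the idea; it is not the package's internal code.

```r
# Hypothetical raw data whose column names do not follow the klausuR scheme
raw <- data.frame(
  Nr       = 1:2,
  Vorname  = c("Anna", "Ben"),
  Nachname = c("Alt", "Berg"),
  Item01   = c(3, 1),
  stringsAsFactors = FALSE
)

# Element names = klausuR names, values = column names found in the input data
rename <- c(No = "Nr", FirstName = "Vorname", Name = "Nachname")

# Base R illustration of the mapping only (assumption: not how klausuR does it)
renamed <- raw
names(renamed)[match(rename, names(renamed))] <- names(rename)

names(renamed)
```

With a vector like this, a call would presumably take the form `compare(set1, set2, rename = rename)`, assuming both sets use the same input column names.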

Details

If you don't want to compare all variables but only a subset, you can use the select option (see examples below). Be careful, though: at least all the variables given in id must be selected, because they are needed to produce the output table.
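
One way to guard against dropping an identifying variable is to check a `select` vector against `id` before calling compare. The helper `missing_id_vars` and the data.frame `answers` below are hypothetical and not part of klausuR; the sketch assumes `select` is either a vector of column names or of numeric column indices.

```r
# Default id specification, as shown in the Usage section
id <- list(No = "No", Name = c("FirstName", "Name"))

# Hypothetical helper (not part of klausuR): returns the id variables
# that a given select vector would leave out
missing_id_vars <- function(data, select, id) {
  chosen <- if (is.numeric(select)) names(data)[select] else select
  setdiff(unlist(id, use.names = FALSE), chosen)
}

# Made-up data for illustration
answers <- data.frame(
  No        = 1:2,
  Name      = c("Alt", "Berg"),
  FirstName = c("Anna", "Ben"),
  Item1     = c(1, 2),
  Item2     = c(2, 2),
  stringsAsFactors = FALSE
)

# "FirstName" is required by id but not selected
missing_id_vars(answers, c("No", "Name", "Item1"), id)

# Columns 1 to 4 cover all id variables, so nothing is missing
missing_id_vars(answers, 1:4, id)
```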

If new.set=TRUE, a new data.frame is returned that contains all values on which both sets agree, with every dubious (unequal) value replaced by NA.
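
The idea behind new.set=TRUE can be illustrated with a small base R sketch. It is not the package's implementation: it assumes both sets have the same rows and columns in the same order, and it treats a value that is NA in both sets as agreement, which may differ from what compare does. The data.frames `set1` and `set2` are made up.

```r
# Two hypothetical typings of the same data
set1 <- data.frame(
  No    = 1:3,
  Item1 = c(1, 2, 3),
  Item2 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
set2 <- set1
set2[2, "Item1"] <- 5     # simulated typo
set2[3, "Item2"] <- "x"   # simulated typo

# Logical matrix: TRUE where both sets agree
# (assumption: NA in both sets counts as agreement, NA vs. a value does not)
agree <- mapply(
  function(a, b) (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b),
  set1, set2
)

# Keep agreed values, blank out every cell where the sets differ
merged <- set1
merged[!agree] <- NA
merged
```

In this sketch only the two cells that were changed in `set2` end up as NA; everything else is carried over unchanged, so the remaining NA cells are exactly the ones that need to be looked up in the original test.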

Value

If new.set=FALSE, a data.frame of the differences found; if there are none, only a message is returned. If new.set=TRUE, a combined data.frame is returned (see Details).

Author(s)

m.eik michalke meik.michalke@uni-duesseldorf.de

Examples

## Not run: 
data(antworten)

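# create some differences:
# drop the third row and the seventh column
antworten2 <- antworten[-3, -7]
# set one value in the reduced set to missing
antworten2[4,6] <- NA
# and increase three values in its eighth row by one
antworten2[8,8:10] <- antworten2[8,8:10] + 1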

# default comparison
compare(antworten, antworten2)

# compare only variables 1 to 12
compare(antworten, antworten2, select=c(1:12))

# omit variables 3 to 8 and create a new set called "antworten.comp"
# from the results
antworten.comp <- compare(antworten, antworten2, select=-c(3:8), new.set=TRUE)

## End(Not run)

[Package klausuR version 0.12-14 Index]