R: Correct sign errors and value interchanges in data records

correctSigns {deducorrect}

R Documentation

Correct sign errors and value interchanges in data records

Description

Correct sign errors and value interchanges in data records.

Usage

correctSigns(E, dat, ...)

## S3 method for class 'editset'
correctSigns(E, dat, ...)

## S3 method for class 'editmatrix'
correctSigns(E, dat, flip = getVars(E), swap = list(),
  maxActions = length(flip) + length(swap), maxCombinations = 1e+05,
  eps = sqrt(.Machine$double.eps), weight = rep(1, length(flip) +
  length(swap)), fixate = NA, ...)

Arguments

`E`	An object of class `editmatrix` or `editset`
`dat`	`data.frame`, the records to correct.
`...`	arguments to be passed to other methods.
`flip`	A `character` vector of variable names whose values may be sign-flipped
`swap`	A `list` of `character` 2-vectors of variable combinations whose values may be swapped
`maxActions`	The maximum number of flips and swaps that may be performed
`maxCombinations`	The number of possible flip/swap combinations in each step of the algorithm is `choose(n,k)`, with `n` the number of allowed flips plus swaps, and `k` the number of actions taken in that step. If `choose(n,k)` exceeds `maxCombinations`, the algorithm returns the record uncorrected.
`eps`	Tolerance to check equalities against. Use this to account for sign errors masked by rounding errors.
`weight`	Weight vector. Weights can be assigned either to actions (flips and swaps) or to variables. If `length(weight)==length(flip)+length(swap)`, weights are assigned to actions; if `length(weight)==ncol(E)`, weights are assigned to variables. In the first case, the first `length(flip)` weights correspond to flips, the rest to swaps. A warning is issued in the second case when the weight vector is not named. See the examples for more details.
`fixate`	A `character` vector with names of variables whose values may not be changed
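
To give a feel for the `maxCombinations` limit, the following base R sketch tabulates `choose(n,k)` for a hypothetical case with `n = 20` allowed actions and marks the steps that would exceed the default limit of `1e+05`. The value of `n` and the variable names are illustrative assumptions, not taken from the package.

```r
# Number of candidate combinations per step, for n allowed actions
# (flips + swaps) and k actions applied in that step.
n <- 20                      # hypothetical number of allowed flips + swaps
k <- 1:n
combos <- choose(n, k)

# Mark the steps whose combination count exceeds the default limit.
max_combinations <- 1e5
tab <- data.frame(k = k, combinations = combos,
                  exceeds = combos > max_combinations)
print(tab)
```

Lowering `maxActions` is one way to keep `k`, and with it `choose(n,k)`, below the limit.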

Details

This algorithm tries to correct records violating linear equalities by sign flipping and/or value interchanges. Linear inequalities are taken into account when judging possible solutions: if one or more inequality restrictions are violated, the solution is rejected. Note that the status of a record has the following meaning:

`valid`	The record obeys all equality constraints on entry. No error correction is performed, so it may still contain inequality errors.
`corrected`	Equality errors were found, and all of them are solved without violating inequalities.
`partial`	Does not occur
`invalid`	The record contains equality violations which could not be solved with this algorithm
`NA`	The record could not be checked because it contains missing values.
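
The entry check behind the `valid` and `NA` statuses can be illustrated in base R for the single rule `z == x - y` used in the Examples section. This is only a sketch of the idea, not the package's code; the label `"needs correction"` and the variable names are made up for illustration.

```r
dat <- data.frame(
  x = c( 3, 14,  15,  1, 17, 12.3),
  y = c(13, -4,   5,  2,  7, -2.1),
  z = c(10, 10, -10, NA, 10, 10))

eps <- sqrt(.Machine$double.eps)
residual <- dat$x - dat$y - dat$z   # zero when z == x - y holds exactly

# Records with missing values cannot be checked; records within the
# tolerance are left alone; the rest are candidates for flips and swaps.
entry_check <- ifelse(is.na(residual), NA_character_,
               ifelse(abs(residual) <= eps, "valid", "needs correction"))
print(entry_check)
```

Only the records marked as candidates would be passed on to the flip/swap search; whether they end up `corrected` or `invalid` depends on the allowed actions.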

The algorithm applies all combinations of (user-allowed) flips and swaps to find a solution, and minimizes the number of actions (flips+swaps) that have to be taken to correct a record. When multiple solutions are found, the solution of minimal weight is chosen. The user may provide a weight vector with weights for every flip and every swap, or a named weight vector with a weight for every variable. If the weights do not single out a solution, the first one found is chosen.
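
The search described above can be sketched in base R for one record and the single rule `z == x - y`. This is a simplified illustration under stated assumptions, not the package's implementation: the function `correct_record` and its helpers are hypothetical, weights are per action only, inequalities are not checked, and actions within a combination are applied in the order in which they are listed.

```r
# Brute-force sketch for ONE record and the single rule z == x - y.
correct_record <- function(rec, flip, swap, weight,
                           eps = sqrt(.Machine$double.eps)) {
  ok <- function(r) abs(r[["x"]] - r[["y"]] - r[["z"]]) <= eps
  if (ok(rec)) return(list(record = rec, actions = character(0), weight = 0))

  # An action is either a sign flip of one variable or a swap of two.
  actions <- c(lapply(flip, function(v) list(type = "flip", vars = v)),
               lapply(swap, function(p) list(type = "swap", vars = p)))
  apply_action <- function(r, a) {
    if (a$type == "flip") {
      r[a$vars] <- -r[a$vars]
    } else {
      r[a$vars] <- r[rev(a$vars)]
    }
    r
  }
  label <- function(a) paste0(a$type, "(", paste(a$vars, collapse = ","), ")")

  # Try 1 action, then 2, ...; stop at the smallest k that yields a solution.
  for (k in seq_along(actions)) {
    best <- NULL
    for (idx in combn(length(actions), k, simplify = FALSE)) {
      r <- Reduce(apply_action, actions[idx], rec)
      w <- sum(weight[idx])
      # Strict "<" keeps the first solution found when weights tie.
      if (ok(r) && (is.null(best) || w < best$weight)) {
        best <- list(record = r,
                     actions = vapply(actions[idx], label, ""),
                     weight = w)
      }
    }
    if (!is.null(best)) return(best)
  }
  NULL  # no combination of allowed actions satisfies the rule
}

rec <- c(x = 3, y = 13, z = 10)   # first record of the example data
res <- correct_record(rec, flip = "z", swap = list(c("x", "y")),
                      weight = c(2, 1))
print(res$actions)
```

With `weight = c(2, 1)` the swap is chosen because it carries the lower weight; with equal weights the sketch keeps the flip of `z`, since it is the first solution found.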

If arguments `flip` or `swap` contain a variable not in `E`, that variable is ignored by the algorithm.

Value

A `deducorrect` object. The `status` slot has the following columns for every record in `dat`.

`status`	a `status` factor, showing the status of the treated record.
`degeneracy`	the number of solutions found, after applying the weight
`weight`	the weight of the chosen solution
`nflip`	the number of applied sign flips
`nswap`	the number of applied value interchanges
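
A short sketch of how the status table might be inspected after a run. It assumes that the returned object is list-like, so that the `status` slot can be reached with `$`, and that attaching `deducorrect` also makes `editmatrix` available; the snippet is guarded so that it does nothing when the package is not installed. The three-record data set is a shortened version of the one in the Examples section.

```r
# Guarded so the snippet is a no-op when the package is not installed.
ran <- FALSE
if (requireNamespace("deducorrect", quietly = TRUE)) {
  suppressPackageStartupMessages(library(deducorrect))

  dat <- data.frame(x = c(3, 14, 17), y = c(13, -4, 7), z = c(10, 10, 10))
  E   <- editmatrix("z == x - y")
  sol <- correctSigns(E, dat)

  status_tab <- sol$status          # assumed access path; one row per record
  print(status_tab)
  print(table(status_tab$status))   # number of records per status level
  ran <- TRUE
}
```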

References

Scholtus S (2008). Algorithms for correcting some obvious inconsistencies and rounding errors in business survey data. Technical Report 08015, Statistics Netherlands.

Examples


# some data 
dat <- data.frame(
    x = c( 3,14,15,  1, 17,12.3),
    y = c(13,-4, 5,  2,  7, -2.1),
    z = c(10,10,-10, NA,10,10 ))
# ... which has to obey
E <- editmatrix(c("z == x-y"))

# All signs may be flipped, no swaps.

correctSigns(E, dat)

# Allow for rounding errors
correctSigns(E, dat, eps=2)

# Limit the number of combinations that may be tested 
correctSigns(E, dat, maxCombinations=2)

# fix z, flip everything else
correctSigns(E, dat,fixate="z")

# the same result is achieved with
correctSigns(E, dat, flip=c("x","y"))

# make x and y swappable, allow no flips
correctSigns(E, dat, flip=c(), swap=list(c("x","y")))

# make x and y swappable; a swap counts the same as one flip
correctSigns(E, dat, flip="z", swap=list(c("x","y")))

# same, but now, swapping is preferred (has lower weight)
correctSigns(E, dat, flip="z", swap=list(c("x","y")), weight=c(2,1))

# same, but now because x and y carry lower weight. Also allow for rounding errors
correctSigns(E, dat, flip="z", swap=list(c("x","y")), eps=2, weight=c(x=1, y=1, z=3))

# demand that solution has y>0
E <- editmatrix(c("z==x-y", "y>0"))
correctSigns(E,dat)

# demand that solution has y>0, taking account of rounding in equalities
correctSigns(E,dat,eps=2)

# example with editset
E <- editset(expression(
    x + y == z,
    x >= 0,
    y > 0,
    y < 2,
    z > 1,
    z < 3,
    A %in% c('a','b'),
    B %in% c('c','d'),
    if ( A == 'a' ) B == 'b',
    if ( B == 'b' ) x < 1
))

x <- data.frame(
    x = -1,
    y = 1,
    z = 2,
    A = 'a',
    B = 'b'
)

correctSigns(E,x)

[Package deducorrect version 1.3.7 Index]