correct_typos {deductive} | R Documentation |
Correct typos in restricted numeric data
Description
Attempt to fix violations of linear (in)equality restrictions imposed on a record by replacing values with values that differ from the original values by typographical errors.
Usage
correct_typos(dat, x, ...)
## S4 method for signature 'data.frame,validator'
correct_typos(dat, x, fixate = NULL, eps = 1e-08, maxdist = 1, ...)
Arguments
dat | An R object holding numeric (integer) data. |
x | An R object holding linear data validation rules. |
... | Options to be passed to |
fixate | |
eps | |
maxdist | |
Value
dat, with values corrected.
Details
The algorithm works by proposing candidate replacement values and checking
whether they are likely to be the result of a typographical error. A value is
accepted as a solution when it resolves at least one equality violation. An
equality restriction a.x = b is considered satisfied when abs(a.x - b) < eps.
Setting eps to one or two units of measurement allows for robust typographical
error detection in the presence of roundoff errors.
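The tolerance check described above can be illustrated in base R. This is a minimal sketch, not the package's internal code: the helper name satisfies_equality and the non-integer input values are assumptions made for illustration only.

```r
# Hypothetical helper (not part of deductive): TRUE when the equality
# restriction a.x = b holds within tolerance eps, i.e. abs(a.x - b) < eps.
satisfies_equality <- function(a, x, b, eps = 1e-8) {
  abs(sum(a * x) - b) < eps
}

# The restriction x1 + x2 == x3, written as 1*x1 + 1*x2 - 1*x3 = 0
a <- c(1, 1, -1)

# Integer record that balances exactly
exact <- satisfies_equality(a, c(1452, 116, 1568), b = 0)

# Made-up record whose components carry rounding noise (sum is off by 0.7)
strict  <- satisfies_equality(a, c(1452.4, 116.3, 1568), b = 0)
lenient <- satisfies_equality(a, c(1452.4, 116.3, 1568), b = 0, eps = 2)

print(c(exact = exact, strict = strict, lenient = lenient))
```

With the default eps the noisy record counts as a violation, while an eps of two units of measurement treats it as satisfied, so small rounding differences are not mistaken for typing errors.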
The algorithm is meant to be used on numeric data representing integers.
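Because the data represent integers, a candidate can be compared with the original value digit by digit. The sketch below illustrates that idea under stated assumptions: typo_distance is a hypothetical helper, and it uses utils::adist(), which computes plain Levenshtein distance. The distance measure the package actually uses may differ, and the threshold of 1 simply mirrors the default maxdist = 1 shown in the usage.

```r
# Hypothetical helper (not part of deductive): edit distance between the
# digit strings of two integers. utils::adist() counts insertions, deletions
# and substitutions only (plain Levenshtein), which is an assumption here.
typo_distance <- function(original, candidate) {
  as.vector(utils::adist(
    format(original,  scientific = FALSE, trim = TRUE),
    format(candidate, scientific = FALSE, trim = TRUE)
  ))
}

# Third record of the example data: x3 + x8 == x9 is violated
x3 <- 1568; x8 <- 411; x9 <- 19979
candidate_x9 <- x3 + x8                      # value that would restore balance
d_insert <- typo_distance(x9, candidate_x9)  # "19979" vs "1979"

# Second record: x2 == x4 is violated by two swapped digits
d_swap <- typo_distance(161, 116)            # "161" vs "116"

# Accept a candidate only if it is within the chosen distance threshold
accept_x9 <- d_insert <= 1
print(c(d_insert = d_insert, d_swap = d_swap))
```

Note that plain Levenshtein distance scores a swap of two adjacent digits as two edits; a measure that treats a transposition as a single edit would rate 161 versus 116 as one typing error.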
References
The first version of the algorithm was described by S. Scholtus (2009). Automatic correction of simple typing errors in numerical data with balance edits. Statistics Netherlands, Discussion Paper 09046.
The generalized version of this algorithm that is implemented for this package is described in M. van der Loo, E. de Jonge and S. Scholtus (2011). Correction of rounding, typing and sign errors with the deducorrect package. Statistics Netherlands, Discussion Paper 2011019.
Examples
library(validate)
library(deductive)
# example from section 4 in Scholtus (2009)
# balance restrictions that every record should satisfy
v <- validate::validator(
    x1 + x2 == x3
  , x2 == x4
  , x5 + x6 + x7 == x8
  , x3 + x8 == x9
  , x9 - x10 == x11
)
dat <- read.csv(textConnection(
"x1, x2 , x3 , x4 , x5 , x6, x7, x8 , x9 , x10 , x11
1452, 116, 1568, 116, 323, 76, 12, 411, 1979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12, 411, 1979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12, 411, 19979, 1842, 137
1452, 116, 1568, 161, 0, 0, 0, 411, 19979, 1842, 137
1452, 116, 1568, 161, 323, 76, 12, 0, 19979, 1842, 137"
))
# attempt the corrections; nonzero cells in the difference were changed
cor <- correct_typos(dat, v)
dat - cor