errorLocalizer {editrules} | R Documentation |
Create a backtracker object for error localization
Description
Create a backtracker object for error localization
Usage
errorLocalizer(E, x, ...)
## S3 method for class 'editset'
errorLocalizer(E, x, ...)
## S3 method for class 'editmatrix'
errorLocalizer(
E,
x,
weight = rep(1, length(x)),
maxadapt = length(x),
maxweight = sum(weight),
maxduration = 600,
tol = sqrt(.Machine$double.eps),
...
)
## S3 method for class 'editarray'
errorLocalizer(
E,
x,
weight = rep(1, length(x)),
maxadapt = length(x),
maxweight = sum(weight),
maxduration = 600,
...
)
## S3 method for class 'editlist'
errorLocalizer(
E,
x,
weight = rep(1, length(x)),
maxadapt = length(x),
maxweight = sum(weight),
maxduration = 600,
...
)
Arguments
E |
an |
x |
a named numerical |
... |
Arguments to be passed to other methods (e.g. reliability weights) |
weight |
a |
maxadapt |
maximum number of variables to adapt |
maxweight |
maximum weight of a solution; if weights are not given, this equals the maximum number of variables to adapt. |
maxduration |
maximum time (in seconds), for |
tol |
tolerance passed to |
Value
An object of class backtracker. Each execution of $searchNext() yields a solution in the form of a list (see Details). Executing $searchBest() returns the lowest-weight solution. When multiple solutions with the same weight are found, $searchBest() picks one at random.
Details
Generate a backtracker object for error localization in numerical, categorical, or mixed data.
This function generates the workhorse program, called by localizeErrors with method="localizer".
The returned backtracker can be used to run a branch-and-bound (B&B) algorithm which finds
the least (weighted) number of variables in x that need to be adapted so that all restrictions
in E can be satisfied (the generalized principle of Fellegi and Holt (1976)).
The B&B tree is set up so that in one branch a variable is assumed correct and its value
substituted in E, while in the other branch a variable is assumed incorrect and eliminated from E.
See De Waal (2003), chapter 8, or De Waal, Pannekoek and Scholtus (2011) for
a concise description of the B&B algorithm.
Every call to <backtracker>$searchNext() returns one solution list, consisting of
w: The solution weight.
adapt: logical indicating whether a variable should be adapted (TRUE) or not (FALSE).
Every subsequent call either returns NULL or returns a new solution with a weight w
not higher than the weight of the last found solution. A NULL result means that either
all solutions have been found or maxduration was exceeded. The property
<backtracker>$maxdurationExceeded indicates whether the latter is the case.
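The search protocol described above can be sketched as a loop that collects solutions until $searchNext() returns NULL and then checks whether the time limit was the reason for stopping. This is a minimal illustration, not the package's own driver code. It assumes that adapt is a logical vector named by variable and that $maxdurationExceeded can be read as a logical flag. The names bt and solutions are chosen here for illustration only.

```r
library(editrules)

# Single balance edit and a record that violates it (taken from the Examples).
E  <- editmatrix("p + c == t")
bt <- errorLocalizer(E, x = c(p = 755, c = 125, t = 200))

# Collect every solution until the backtracker signals that it is done.
solutions <- list()
repeat {
  s <- bt$searchNext()
  if (is.null(s)) break            # search space exhausted or time limit hit
  solutions[[length(solutions) + 1]] <- s
}

# Distinguish "all solutions found" from "stopped early".
# Assumption: the property behaves as a logical flag.
if (isTRUE(bt$maxdurationExceeded)) {
  message("Search stopped early: maxduration was exceeded.")
}

# Report the weight and the variables flagged for adaptation.
# Assumption: adapt carries the variable names of x.
for (s in solutions) {
  cat("weight:", s$w,
      " adapt:", paste(names(s$adapt)[s$adapt], collapse = ", "), "\n")
}
```

Because each later solution has a weight not higher than the previous one, the last element of solutions is a lowest-weight solution when the search was not cut short.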
Alternatively, <backtracker>$searchBest() will return the best solution found within maxduration seconds.
If multiple equivalent solutions are found, a random one is returned.
The backtracker is prepared such that missing data in the input record x is already
set to adapt, and missing variables have already been eliminated.
The backtracker will crash when E is an editarray and one or more values are
not in the datamodel specified by E. The more user-friendly function localizeErrors
circumvents this. See also checkDatamodel.
Numerical stability issues
For records with a large numerical range (e.g. 1-1E9), the error locations represent solutions that
will allow repairing the record to within roundoff errors. We highly recommend that you round near-zero
values (for example, everything <= sqrt(.Machine$double.eps)) and scale a record with values larger
than or equal to 1E9 with a constant factor.
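The recommendation above could be applied with a small preprocessing step such as the following sketch. The helper prepare_record and its arguments tol, big, and scale_by are hypothetical and not part of editrules. The sketch applies the near-zero test to absolute values, which is an assumption about the intent. Note that edits containing constant terms would need the same scaling for the rules to remain equivalent.

```r
# Hypothetical helper (not part of editrules): tidy a numeric record before
# it is passed to errorLocalizer().
prepare_record <- function(x,
                           tol = sqrt(.Machine$double.eps),
                           big = 1e9,
                           scale_by = 1e-6) {
  stopifnot(is.numeric(x))
  # Set near-zero values to exactly zero; NA entries are left untouched.
  x[!is.na(x) & abs(x) <= tol] <- 0
  # Scale the whole record by one constant if any value reaches the threshold.
  scaled <- any(abs(x) >= big, na.rm = TRUE)
  if (scaled) x <- x * scale_by
  list(x = x, scaled = scaled, scale_by = if (scaled) scale_by else 1)
}

rec <- prepare_record(c(p = 1e-12, c = 2.5e9, t = 2.5e9))
rec$x         # record to pass on to errorLocalizer()
rec$scale_by  # factor to undo after the record has been repaired
```

Returning the factor alongside the record makes it straightforward to transform repaired values back to the original scale.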
Note
This method is potentially very slow for objects of class editset that contain
many conditional restrictions. Consider using localizeErrors with the option
method="mip" in such cases.
References
I.P. Fellegi and D. Holt (1976). A systematic approach to automatic edit and imputation. Journal of the American Statistical Association 71, pp. 17-25.
T. De Waal (2003). Processing of unsafe and erroneous data. PhD thesis, Erasmus Research Institute of Management, Erasmus University Rotterdam. http://www.cbs.nl/nl-NL/menu/methoden/onderzoek-methoden/onderzoeksrapporten/proefschriften/2008-proefschrift-de-waal.htm
T. De Waal, J. Pannekoek and S. Scholtus (2011). Handbook of Statistical Data Editing. Wiley Handbooks on Survey Methodology.
See Also
errorLocalizer_mip, localizeErrors, checkDatamodel, violatedEdits
Examples
#### examples with numerical edits
# example with a single editrule
# p = profit, c = cost, t = turnover
E <- editmatrix(c("p + c == t"))
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200))
# x obviously violates E. With all weights equal, changing any variable will do.
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# third solution:
cp$searchNext()
# there are no more solutions since changing more variables would increase the
# weight, so the result of the next statement is NULL:
cp$searchNext()
# Increasing the reliability weight of turnover yields 2 solutions:
cp <- errorLocalizer(E, x=c(p=755, c=125, t=200), weight=c(1,1,2))
# first solution:
cp$searchNext()
# second solution:
cp$searchNext()
# no more solutions available:
cp$searchNext()
# A case with two restrictions. The second restriction demands that
# c/t >= 0.6 (cost should be at least 60% of turnover)
E <- editmatrix(c(
    "p + c == t",
    "c - 0.6*t >= 0"))
cp <- errorLocalizer(E,x=c(p=755,c=125,t=200))
# Now there's only one lowest-weight solution, but we need two runs to find it
# (the 1st one has higher weight)
cp$searchNext()
cp$searchNext()
# With the searchBest() function, the lowest-weight solution is found at once:
errorLocalizer(E,x=c(p=755,c=125,t=200))$searchBest()
# An example with missing data.
E <- editmatrix(c(
    "p + c1 + c2 == t",
    "c1 - 0.3*t >= 0",
    "p > 0",
    "c1 > 0",
    "c2 > 0",
    "t > 0"))
cp <- errorLocalizer(E,x=c(p=755, c1=50, c2=NA,t=200))
# (Note that e2 is violated.)
# There are two solutions. Both demand that c2 is adapted:
cp$searchNext()
cp$searchNext()
##### Examples with categorical edits
#
# 3 variables, recording age class, position in household, and marital status:
# We define the datamodel and the rules
E <- editarray(expression(
    age %in% c('under aged','adult'),
    maritalStatus %in% c('unmarried','married','widowed','divorced'),
    positionInHousehold %in% c('marriage partner', 'child', 'other'),
    if( age == 'under aged' )
        maritalStatus == 'unmarried',
    if( maritalStatus %in% c('married','widowed','divorced'))
        !positionInHousehold %in% c('marriage partner','child')
    )
)
E
# Let's define a record with an obvious error:
r <- c(
    age = 'under aged',
    maritalStatus = 'married',
    positionInHousehold = 'child')
# The age class and position in household are consistent, while the marital
# status conflicts. Therefore, changing only the marital status (instead of
# both age class and position in household) seems reasonable.
el <- errorLocalizer(E,r)
el$searchNext()