replace_errors {errorlocate} | R Documentation |
Replace erroneous fields with NA or a suggested value
Description
Find erroneous fields using locate_errors() and replace these fields automatically with NA or a suggestion that is provided by the error detection algorithm.
Usage
replace_errors(
data,
x,
ref = NULL,
...,
cl = NULL,
Ncpus = getOption("Ncpus", 1),
value = c("NA", "suggestion")
)
## S4 method for signature 'data.frame,validator'
replace_errors(
data,
x,
ref = NULL,
...,
cl = NULL,
Ncpus = getOption("Ncpus", 1),
value = c("NA", "suggestion")
)
## S4 method for signature 'data.frame,ErrorLocalizer'
replace_errors(
data,
x,
ref = NULL,
...,
cl = NULL,
Ncpus = getOption("Ncpus", 1),
value = c("NA", "suggestion")
)
## S4 method for signature 'data.frame,errorlocation'
replace_errors(
data,
x,
ref = NULL,
...,
cl = NULL,
Ncpus = 1,
value = c("NA", "suggestion")
)
Arguments
data |
data to be checked |
x |
a validator, ErrorLocalizer or errorlocation object (see the method signatures under Usage) |
ref |
optional reference data set |
... |
these parameters are handed over to locate_errors() |
cl |
optional cluster for parallel execution (see details) |
Ncpus |
number of nodes to use (see details) |
value |
what to put in the erroneous fields: "NA" (the first choice listed under Usage) or "suggestion" |
Details
Note that you can also use the result of locate_errors() with replace_errors. When the procedure takes a long time and locate_errors was called previously, this is the preferred way, because otherwise locate_errors will be executed again.
The errors that were removed from the data.frame can be retrieved with the function errors_removed(). For more control over error localization see locate_errors().
replace_errors has the same parallelization options as locate_errors() (see there).
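The parallel option can be sketched as follows. This is a minimal illustration, not a recommended setup: the data set is made up, and it assumes that passing Ncpus = 2 makes replace_errors distribute the records over two worker processes in the way the locate_errors() documentation describes. The rows are chosen so that each faulty record has only one minimal correction, which keeps the sequential and parallel runs comparable.

```r
library(validate)     # provides validator()
library(errorlocate)  # provides replace_errors()

rules <- validator(
  profit + cost == turnover,
  cost - 0.6 * turnover >= 0,
  cost >= 0,
  turnover >= 0
)

# Made-up data: rows 1 and 3 violate the balance rule, rows 2 and 4 are valid.
data <- data.frame(
  profit   = c(755,  75, 900,  60),
  cost     = c(125, 125, 130, 140),
  turnover = c(200, 200, 210, 200)
)

# Sequential run (a single process).
res_seq <- replace_errors(data, rules, Ncpus = 1)

# Parallel run; Ncpus = 2 is an illustrative choice, adjust it to your machine.
res_par <- replace_errors(data, rules, Ncpus = 2)

# Both runs should flag the same fields.
is.na(res_seq)
is.na(res_par)
```

For a data set this small the parallel run is unlikely to be faster; the option is meant for data sets where locating the errors takes a long time.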
Value
data with the erroneous values replaced by NA or by the suggested value.
Note
In general it is better to replace the erroneous fields with NA and apply a proper imputation method. Suggested values from the error localization method may introduce an undesired bias.
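The two-step workflow recommended here can be sketched as follows. The imputation step is a hypothetical, hand-written fill-in that only suits this toy record: it derives the missing profit from the balance rule profit + cost == turnover. It stands in for whatever imputation method is appropriate for your data and is not part of errorlocate.

```r
library(validate)     # provides validator()
library(errorlocate)  # provides replace_errors()

rules <- validator(
  profit + cost == turnover,
  cost - 0.6 * turnover >= 0,
  cost >= 0,
  turnover >= 0
)
data <- data.frame(profit = 755, cost = 125, turnover = 200)

# Step 1: replace the fields flagged as erroneous with NA.
cleaned <- replace_errors(data, rules, value = "NA")

# Step 2 (hypothetical imputation, for illustration only):
# fill a missing profit from the balance rule, using the retained fields.
missing_profit <- is.na(cleaned$profit)
cleaned$profit[missing_profit] <-
  cleaned$turnover[missing_profit] - cleaned$cost[missing_profit]

cleaned
```

Keeping the two steps separate makes the imputation method an explicit choice that can be changed or evaluated on its own.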
See Also
Other error finding: errorlocation-class, errors_removed(), expand_weights(), locate_errors()
Examples
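library(validate)     # provides validator()
library(errorlocate)  # provides replace_errors(), locate_errors(), errors_removed()
rules <- validator( profit + cost == turnover
                  , cost - 0.6*turnover >= 0
                  , cost >= 0
                  , turnover >= 0
                  )
data <- data.frame(profit=755, cost=125, turnover=200)
data_no_error <- replace_errors(data, rules)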
# faulty data was replaced with NA
data_no_error
errors_removed(data_no_error)
# for more control, pass the result of locate_errors() to replace_errors();
# this avoids running the error localization a second time, because otherwise
# replace_errors calls locate_errors internally.
error_locations <- locate_errors(data, rules)
replace_errors(data, error_locations)