editrules_package {editrules} | R Documentation |
An overview of the functions of package editrules
Description
Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.
NOTE
This package is no longer under active development. The package is superseded by R packages validate for data validation and errorlocate for error localization. We urge new users to use those packages instead.
The editrules package aims to provide an environment to conveniently define, read and check recordwise data constraints including
Linear (in)equality constraints for numerical data,
Constraints on value combinations of categorical data
Conditional constraints on numerical and/or mixed data
In the literature these constraints, or restrictions, are referred to as “edits”.
editrules can perform common rule set manipulations like variable elimination and value substitution, and offers error localization functionality based on the (generalized) paradigm of Fellegi and Holt. Under this paradigm, one determines the smallest (weighted) number of variables to adapt such that no (additional or derived) rules are violated. The paradigm is based on the assumption that errors are distributed randomly over the variables and that there is no detectable cause of error. It also decouples the detection of corrupt variables from their correction. For some types of error, such as sign flips, typing errors or rounding errors, this assumption does not hold: such errors have a detectable cause, and detecting them is closely tied to resolving them. The reader is referred to the deducorrect package for treating such errors.
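To make the weighted minimization concrete, here is a minimal sketch using editmatrix and localizeErrors from this package. The rule, the records and the weights are invented for illustration, and the weights are assumed to be given in the column order of the data.

```r
library(editrules)

# Hypothetical balance rule; the first record violates it (2 + 2 != 5),
# the second record is valid (1 + 1 == 2)
E   <- editmatrix("x + y == z")
dat <- data.frame(x = c(2, 1), y = c(2, 1), z = c(5, 2))

# Equal weights: changing any one variable can repair the first record,
# so the solution has size one but is not unique
loc_equal <- localizeErrors(E, dat)

# Larger weights mark x and y as more reliable, which leaves z as the
# cheapest variable to adapt in the first record
loc_weighted <- localizeErrors(E, dat, weight = c(3, 3, 1))
print(loc_weighted$adapt)
```

The adapt element is a logical array with one row per record and one column per variable. It only flags which fields to change; choosing new values is a separate step, in line with the decoupling described above.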
I. Define edits
editrules provides several methods for creating edits from a character, expression, data.frame or a text file.
editfile | Read conditional numerical, numerical and categorical constraints from textfile |
editset | Create conditional numerical, numerical and categorical constraints |
editmatrix | Create a linear constraint matrix for numerical data |
editarray | Create value combination constraints for categorical data |
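As an illustration of the constructors listed above, the following sketch defines one rule set of each kind. The variable names (x, y, z, gender, pregnant) and the rules themselves are made up for this example.

```r
library(editrules)

# Linear (in)equality constraints for numerical data
E_num <- editmatrix(c(
  "x + y == z",
  "x >= 0",
  "y >= 0"
))

# Constraints on value combinations of categorical data:
# first declare the allowed categories, then the forbidden combination
E_cat <- editarray(expression(
  gender %in% c("male", "female"),
  pregnant %in% c("yes", "no"),
  if (gender == "male") pregnant == "no"
))

# A mixed set containing a conditional numerical constraint
E_mix <- editset(expression(
  x + y == z,
  x >= 0,
  if (x > 0) y > 0
))

print(E_num)
print(E_cat)
print(E_mix)
```

Rules kept in a text file can be read with editfile instead, which keeps the rule definitions out of the analysis script.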
II. Check and find errors in data
editrules provides several methods for checking data.frames against edits
violatedEdits | Find out which record violates which edit. |
localizeErrors | Localize erroneous fields using Fellegi and Holt's principle. |
errorLocalizer | Low-level error localization function using a branch-and-bound (B&B) algorithm |
Note that you can call plot, summary and print on results of these functions.
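A short sketch of checking a data.frame with violatedEdits and localizeErrors follows. The rules and the two records are hypothetical: the first record satisfies every rule, the second has a negative x and an unbalanced sum.

```r
library(editrules)

E <- editmatrix(c("x + y == z", "x >= 0", "y >= 0"))

dat <- data.frame(
  x = c(1, -1),
  y = c(2,  3),
  z = c(3,  4)
)

# Logical matrix with one row per record and one column per edit
v <- violatedEdits(E, dat)
print(v)

# Fields to adapt so that each record can satisfy all edits
loc <- localizeErrors(E, dat)
print(loc$adapt)
print(loc$status)
```

In the second record, setting x to 1 repairs both violated rules at once, so x is the only field flagged. Changing y or z alone would leave x >= 0 violated.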
IV. Manipulate and check edits
editrules provides several methods for manipulating edits
substValue | Substitute a value in a set of rules |
eliminate | Derive implied rules by variable elimination |
reduce | Remove unconstrained variables |
isFeasible | Check for contradictions |
duplicated | Find duplicated rules |
blocks | Decompose rules into independent blocks |
disjunct | Decouple conditional edits into disjunct edit sets |
separate | Decompose rules into blocks and decouple conditional edits |
generateEdits | Generate all nonredundant implicit edits (editarray only) |
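The sketch below applies a few of these manipulations to numerical rules. The rule sets are invented for illustration, and only the calls substValue, eliminate, isFeasible and blocks from the table above are used.

```r
library(editrules)

E <- editmatrix(c("x + y == z", "x >= 0", "y >= 0", "z <= 10"))

# Fix z at a known value; the remaining rules then constrain x and y
E_subst <- substValue(E, "z", 4)
print(E_subst)

# Derive the rules implied for the other variables once y is eliminated
E_elim <- eliminate(E, "y")
print(E_elim)

# A consistent rule set is feasible, a contradictory one is not
feasible_ok  <- isFeasible(E)
feasible_bad <- isFeasible(editmatrix(c("x >= 1", "x <= 0")))

# Rules that share no variables fall into separate blocks
E_two  <- editmatrix(c("x + y == z", "x >= 0", "u + v == w"))
n_blocks <- length(blocks(E_two))
```

Splitting a large rule set into blocks before localizing errors can help, because each block can be treated independently of the others.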
V. Plot and coerce edits
editrules provides several methods for plotting and coercion.
editrules.plotting | Plot edit-variable connectivity graph |
as.igraph | Coerce to edit-variable connectivity igraph object |
as.character | Coerce edits to character representation |
as.data.frame | Store character representation in data.frame |
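As a small sketch of the coercion methods, the example below converts a hypothetical rule set to character and data.frame form, then rebuilds it from the character vector. Plotting is left out because it depends on a graphics device.

```r
library(editrules)

E <- editmatrix(c("x + y == z", "x >= 0"))

# One string per rule
rules_chr <- as.character(E)
print(rules_chr)

# The same rules in a data.frame, convenient for storing alongside data
rules_df <- as.data.frame(E)
print(rules_df)

# The character form can be parsed back into an editmatrix
E_again <- editmatrix(rules_chr)
```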
See Also
Useful links:
Report bugs at https://github.com/data-cleaning/editrules/issues