syntax {validate} | R Documentation |
Syntax to define validation or indicator rules
Description
A concise overview of the validate syntax.
Basic syntax
The basic rule is that an R-statement that evaluates to a logical is a
validating statement. This is established by static code inspection when
validator reads a (set of) user-defined validation rule(s).
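As a minimal sketch of the basic rule, the example below defines two validating statements and confronts them with a small data frame. The data frame dat and its columns are invented for illustration only.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(age = c(25, -1, 40), height = c(1.80, 1.75, 1.62))

# Each argument is an R statement that evaluates to a logical.
rules <- validator(age >= 0, height < 3)

# Confront the data with the rules and summarise passes and fails per rule.
result <- confront(dat, rules)
summary(result)
```

Here the second record violates the first rule, so summary() should report one failing item for it.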
Comparisons
All basic comparisons, including >, >=, ==, !=, <=, <, %in% are validating
statements. When executing a validating statement, the %in% operator is
replaced with %vin%.
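The following sketch shows a set-membership comparison used as a rule. The column gender and its allowed values are assumptions made up for the example; the replacement of %in% by %vin% happens internally and needs no action from the user.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(gender = c("male", "female", "unknown"),
                  stringsAsFactors = FALSE)

# A comparison with %in% is a validating statement.
rules <- validator(gender %in% c("male", "female"))

result <- confront(dat, rules)
summary(result)
```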
Logical operations
Unary logical operators '!', all() and any() define validating statements.
Binary logical operations including &, &&, |, || are validating when P and Q
in e.g. P & Q are validating. (Note that the short-circuit operators && and
|| only use the first logical value when, in e.g. P && Q, P and/or Q are
vectors; since R 4.3.0 this raises an error.) Binary logical implication
P ⇒ Q (P implies Q) is implemented as if ( P ) Q. The latter is interpreted
as !(P) | Q.
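A short sketch of logical implication, using invented columns married and age: the rule reads "if a person is married, then the age must be at least 16" and is evaluated record by record as !(married) | age >= 16.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(age = c(12, 30, 15), married = c(FALSE, TRUE, TRUE))

# Implication P => Q written as if (P) Q.
rules <- validator(if (married) age >= 16)

result <- confront(dat, rules)
summary(result)
```

Only the third record (married, but younger than 16) should fail; records where the condition is FALSE satisfy the rule trivially.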
Type checking
Any function starting with is. (e.g. is.numeric) is a validating expression.
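For illustration, type checks can be written as rules on whole columns; each yields a single logical value. The column names below are assumptions for this sketch.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(age = c(25, 31), name = c("Ann", "Bob"),
                  stringsAsFactors = FALSE)

# Functions starting with "is." are validating expressions.
rules <- validator(is.numeric(age), is.character(name))

result <- confront(dat, rules)
summary(result)
```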
Text search
grepl is a validating expression.
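A minimal sketch of a text-search rule: the pattern and the column zip are hypothetical and only serve to show grepl inside validator.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(zip = c("1234", "12a4"), stringsAsFactors = FALSE)

# Require that zip consists of exactly four digits.
rules <- validator(grepl("^[0-9]{4}$", zip))

result <- confront(dat, rules)
summary(result)
```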
Functional dependencies
Armstrong's functional dependencies, of the form A + B → C + D, are
represented using the ~ operator, e.g. A + B ~ C + D. For example,
postcode ~ city means that when two records have the same value for
postcode, they must have the same value for city.
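The sketch below checks the dependency postcode ~ city on two invented data sets: one where equal postcodes always share a city, and one where they do not.

```r
library(validate)

rules <- validator(postcode ~ city)

# Toy data, invented for this example: consistent postcode/city pairs.
good <- data.frame(postcode = c("1011", "1011", "2511"),
                   city     = c("Amsterdam", "Amsterdam", "Den Haag"),
                   stringsAsFactors = FALSE)

# Same postcode, different cities: violates the dependency.
bad <- data.frame(postcode = c("1011", "1011", "2511"),
                  city     = c("Amsterdam", "Rotterdam", "Den Haag"),
                  stringsAsFactors = FALSE)

res_good <- confront(good, rules)
res_bad  <- confront(bad, rules)

summary(res_good)
summary(res_bad)
```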
Reference the dataset as a whole
Metadata such as number of rows, columns, column names and so on can be
tested by referencing the whole data set with the '.'. For example,
the rule nrow(.) == 15 checks whether there are 15 rows in the
dataset at hand.
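As a sketch, the dot can be combined with ordinary R functions that inspect a data frame. The data and the expected dimensions are invented for this example.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(age = c(25, 31, 40), name = c("Ann", "Bob", "Cy"),
                  stringsAsFactors = FALSE)

# '.' refers to the data set as a whole.
rules <- validator(
  nrow(.) == 3,
  ncol(.) == 2,
  "age" %in% names(.)
)

result <- confront(dat, rules)
summary(result)
```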
Uniqueness, completeness
These can be tested in principle with the 'dot' syntax. However, there are
some convenience functions: is_complete, all_complete, is_unique,
all_unique.
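The convenience functions can be used directly inside validator, as in this sketch. The columns id and x are assumptions; the is_ variants return one value per record, while the all_ variants return a single value for the whole data set.

```r
library(validate)

# Toy data, invented for this example: a duplicated id and a missing x.
dat <- data.frame(id = c(1, 2, 2), x = c(1, NA, 3))

rules <- validator(
  is_unique(id),        # per record: is the id unique?
  all_unique(id),       # single value: are all ids unique?
  is_complete(id, x),   # per record: are id and x both present?
  all_complete(id, x)   # single value: are all records complete?
)

result <- confront(dat, rules)
summary(result)
```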
Local, transient assignment
The operator ':=' can be used to set up local variables (during, for
example, validation) to save time (the rhs of an assignment is computed only
once) or to make your validation code more maintainable. Assignments work more
or less like common R assignments: they are only valid for statements coming
after the assignment and they may be overwritten. The result of computing the
rhs is not part of a confrontation with data.
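A small sketch of a transient assignment: the mean of an invented column height is computed once and then reused in a rule. The variable name m is arbitrary.

```r
library(validate)

# Toy data, invented for this example; the last value is an outlier.
dat <- data.frame(height = c(1.7, 1.8, 9.0))

rules <- validator(
  m := mean(height, na.rm = TRUE),  # local variable, computed once
  height < 2 * m                    # rule that uses the local variable
)

result <- confront(dat, rules)
summary(result)
```

Only the rule height < 2 * m is reported in the summary; the assignment itself produces no validation result.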
Groups
Often the same constraints/rules are valid for groups of variables.
validate allows for compact notation. Variable groups can be used
in-statement or by defining them with the := operator.
validator( var_group(a,b) > 0 )
is equivalent to
validator(G := var_group(a,b), G > 0)
which is equivalent to
validator(a>0,b>0).
Using two groups results in the cartesian product of checks. So the statement
validator( f := var_group(c,d), g := var_group(a,b), g > f)
is equivalent to
validator(a > c, b > c, a > d, b > d)
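To illustrate the single-group case, the sketch below writes the same check in both notations and confronts each with invented data. Both rule sets should expand to one rule per variable in the group.

```r
library(validate)

# Toy data, invented for this example.
dat <- data.frame(a = c(1, 2), b = c(3, -1))

# In-statement variable group.
rules_inline <- validator(var_group(a, b) > 0)

# Variable group defined with := and reused.
rules_named <- validator(G := var_group(a, b), G > 0)

res_inline <- confront(dat, rules_inline)
res_named  <- confront(dat, rules_named)

summary(res_inline)
summary(res_named)
```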
File parsing
Please see the cookbook on how to read rules from and write rules to file:
vignette("cookbook",package="validate")