check {dataReporter} | R Documentation |
Perform checks of potential errors in variable/dataset
Description
Run a set of validation checks to check a variable vector or a full dataset for potential errors. Which checks are performed depends on the class of the variable and on user inputs.
Usage
check(v, nMax = 10, checks = setChecks(), ...)
Arguments
v |
the vector or the dataset ( |
nMax |
If a check is supposed to identify problematic values,
this argument controls if all of these should be pasted onto the outputted
message, or if only the first |
checks |
A list of checks to use on each supported variable type. We recommend
using |
... |
Other arguments that are passed on to the checking functions.
These include general parameters controlling how the check results are
formatted (e.g. |
Details
The default options for each variable type are returned by calling e.g. defaultCharacterChecks(), defaultFactorChecks(), defaultNumericChecks(), etc. A complete overview of all default options can be obtained by calling setChecks().
Moreover, all available checkFunctions (including both locally defined functions and functions imported from dataReporter or other packages) can be viewed by calling allCheckFunctions().
Value
If v is a variable, a list of objects of class checkResult, each of which summarizes the result of a checkFunction call performed on v. See checkResult for more details. If v is a data.frame, a list of lists of the form above is returned instead, with one entry for each variable in v.
References
Petersen AH, Ekstrøm CT (2019). “dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R.” Journal of Statistical Software, 90(6), 1-38. doi: 10.18637/jss.v090.i06.
See Also
setChecks, allCheckFunctions, checkResult, checkFunction, defaultCharacterChecks, defaultFactorChecks, defaultLabelledChecks, defaultHavenlabelledChecks, defaultNumericChecks, defaultIntegerChecks, defaultLogicalChecks, defaultDateChecks
Examples
x <- 1:5
check(x)
#Annoyingly coded missing as 99
y <- c(rnorm(100), rep(99, 10))
check(y)
#Check y for outliers and print 4 decimals for problematic values
check(y, checks = setChecks(numeric = "identifyOutliers"), maxDecimals = 4)
#Change what checks are performed on a variable, now only identifyMissing is called
# for numeric variables
check(y, checks = setChecks(numeric = "identifyMissing"))
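#A minimal sketch of how to look up which checks can be requested, using
#only the functions named in Details. As in the other examples, this assumes
#that dataReporter is attached; the printed output depends on the installed
#package version and on any locally defined checkFunctions
defaultNumericChecks()
allCheckFunctions()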
#Check a full data.frame at once
data(cars)
check(cars)
#Check a full data.frame at once, while changing the standard settings for
#several data classes at once. Here, we omit the check of miscoded missing values for factors
#and we only do this check for numeric variables:
check(cars, checks = setChecks(factor = defaultFactorChecks(remove = "identifyMissing"),
                               numeric = "identifyMissing"))
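#A minimal sketch of how the returned object might be inspected, based on
#the structure described under Value: for a data.frame, check() returns one
#list of checkResult objects per variable. The name res is only illustrative,
#and positional indexing is used so that no assumption is made about how the
#list entries are named
res <- check(cars)
length(res)                   #one entry per variable in cars
class(res[[1]][[1]])          #class of the first result for the first variable
str(res[[1]], max.level = 1)  #overview of all results for the first variable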