checkDataIntegrity {ufs} | R Documentation |
Conveniently checking data integrity
Description
This function is designed to make it easy to perform some data integrity checks, specifically checking for values that are impossible or unrealistic. These values can then be replaced by another value (NA by default), or the offending cases can be deleted from the dataframe.
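The core idea can be illustrated in base R. This is a minimal sketch, not the ufs implementation: the helper name `checkSketch` is hypothetical, and it assumes that a condition such as '<30' is applied by appending it to each variable whose name matches the regular expression, and that missing values are left alone rather than flagged as invalid.

```r
# Hypothetical helper, NOT the ufs implementation: check every column whose
# name matches `pattern` against `condition` (e.g. "<30"), then either replace
# offending values with `newValue` or drop the offending cases.
checkSketch <- function(dat, pattern, condition,
                        newValue = NA, removeCases = FALSE) {
  vars <- grep(pattern, names(dat), value = TRUE)
  # Per-case flag: does any checked variable hold an invalid value?
  anyInvalid <- rep(FALSE, nrow(dat))
  for (v in vars) {
    # Assumption: build e.g. `.x<30` and evaluate it with .x bound to the column
    ok <- eval(parse(text = paste0(".x", condition))[[1]],
               list(.x = dat[[v]]))
    invalid <- !is.na(ok) & !ok   # assumption: missing values are not flagged
    anyInvalid <- anyInvalid | invalid
    if (!removeCases) dat[[v]][invalid] <- newValue
  }
  if (removeCases) dat <- dat[!anyInvalid, , drop = FALSE]
  dat
}

fixed <- checkSketch(mtcars, "mpg", "<30")
sum(is.na(fixed$mpg))   # 4 cars have mpg >= 30
kept <- checkSketch(mtcars, "mpg", "<30", removeCases = TRUE)
nrow(kept)              # 28
```

Replacing values keeps every case but loses the offending observations; removing cases keeps the remaining data untouched but shrinks the sample, which is why both options are offered.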
Usage
checkDataIntegrity(
x,
dat,
newValue = NA,
removeCases = FALSE,
validValueSuffix = "_validValue",
newValueSuffix = "_newValue",
totalVarName = "numberOfInvalidValues",
append = TRUE,
replace = TRUE,
silent = FALSE,
rmarkdownOutput = FALSE,
callingSelf = FALSE
)
Arguments
x |
This can be either a vector or a list. If it is a vector, it should
have two elements, the first one being a regular expression matching one or
more variables in the dataframe specified in |
dat |
The dataframe containing the variables of which we should check the integrity. |
newValue |
The new value to be assigned to cases not satisfying the specified conditions. |
removeCases |
Whether to delete cases that do not satisfy the criterion from the dataframe. |
validValueSuffix |
Suffix appended to the name of each checked variable to name the new logical variable that records, for each case, whether that variable's value satisfied the specified criterion (TRUE) or not (FALSE). |
newValueSuffix |
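Suffix appended to the name of each checked variable to name the new variable that contains the new values, that is, the values after invalid ones have been replaced by newValue. |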
totalVarName |
This is the name of a variable that contains, for each case, the total number of invalid values among all variables checked. |
append |
Whether to append the new columns to the dataframe, or only return the new columns. |
replace |
Whether to replace the offending values with the value specified in newValue. |
silent |
If TRUE, the log is not displayed and is only stored as an attribute of the returned dataframe. |
rmarkdownOutput |
Whether to format the log so that it's ready to be included in RMarkdown reports. |
callingSelf |
For internal use; whether the function calls itself. |
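To make the roles of validValueSuffix, newValueSuffix, totalVarName and append more concrete, here is a hypothetical base-R sketch (`flagSketch` is an illustrative name, not part of ufs) that builds such bookkeeping columns for a list of checks. The column order and the treatment of missing values as valid are assumptions; the columns ufs actually returns may differ.

```r
# Hypothetical sketch of the bookkeeping columns; the real ufs layout may differ.
flagSketch <- function(dat, checks, newValue = NA,
                       validValueSuffix = "_validValue",
                       newValueSuffix = "_newValue",
                       totalVarName = "numberOfInvalidValues",
                       append = TRUE) {
  # Accept a single c(pattern, condition) vector as well as a list of them
  if (!is.list(checks)) checks <- list(checks)
  out <- dat[, 0, drop = FALSE]     # zero columns, same rows and row names
  total <- integer(nrow(dat))       # running count of invalid values per case
  for (chk in checks) {
    for (v in grep(chk[1], names(dat), value = TRUE)) {
      ok <- eval(parse(text = paste0(".x", chk[2]))[[1]], list(.x = dat[[v]]))
      valid <- is.na(ok) | ok       # assumption: missing values count as valid
      newVals <- dat[[v]]
      newVals[!valid] <- newValue
      out[[paste0(v, validValueSuffix)]] <- valid
      out[[paste0(v, newValueSuffix)]] <- newVals
      total <- total + !valid
    }
  }
  out[[totalVarName]] <- total
  if (append) cbind(dat, out) else out
}

res <- flagSketch(mtcars, list(c("mpg", "<30"), c("gear", "<5")),
                  append = FALSE)
names(res)
res["Lotus Europa", "numberOfInvalidValues"]   # 2: fails both conditions
```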
Value
The dataframe with the corrections, with the log stored in the attribute checkDataIntegrity_log.
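The return convention (corrected data as the return value, log carried along as an attribute) can be mimicked with base R's attr(). In this sketch, `replaceWithLog` and the wording of its log message are hypothetical; only the attribute name checkDataIntegrity_log is taken from this page, and the format of the real ufs log may be quite different.

```r
# Hypothetical sketch of the return convention, not the ufs implementation:
# the corrected dataframe is returned, and the log is attached as an attribute.
replaceWithLog <- function(dat, var, condition, newValue = NA, silent = FALSE) {
  ok <- eval(parse(text = paste0(".x", condition))[[1]], list(.x = dat[[var]]))
  invalid <- !is.na(ok) & !ok
  dat[[var]][invalid] <- newValue
  log <- sprintf("%s: %d value(s) not satisfying '%s' replaced.",
                 var, sum(invalid), condition)
  if (!silent) cat(log, sep = "\n")   # display the log unless silent
  attr(dat, "checkDataIntegrity_log") <- log
  dat
}

res <- replaceWithLog(mtcars, "mpg", "<30", silent = TRUE)
attr(res, "checkDataIntegrity_log")
# [1] "mpg: 4 value(s) not satisfying '<30' replaced."
```

Storing the log as an attribute keeps the return value a plain dataframe that downstream code can use directly, while the log remains retrievable when needed.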
Author(s)
Gjalt-Jorn Peters
Maintainer: Gjalt-Jorn Peters gjalt-jorn@userfriendlyscience.com
Examples
library(ufs)

### Default behavior: return the dataframe with
### offending values replaced by NA
checkDataIntegrity(c('mpg', '<30'),
                   mtcars)
### Check two conditions, and instead of returning the
### dataframe with the results appended, only return the
### columns indicating which cases 'pass', what the new
### values would be, and how many invalid values were
### found for each case (to easily remove cases that
### provided many invalid values)
checkDataIntegrity(list(c('mpg', '<30'),
                        c('gear', '<5')),
                   mtcars,
                   append = FALSE)
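The per-case count of invalid values in the second example is meant to make it easy to drop cases that provided many invalid values. Assuming a value counts as invalid when it fails its condition, the same count can be computed in base R, either to cross-check the function's output or to filter cases directly; the variable names `invalidCount` and `cleaned` are illustrative only.

```r
# Base R count of invalid values per case for the two conditions above
# (assumption: a value is invalid when it does not satisfy its condition).
invalidCount <- rowSums(cbind(mpg  = !(mtcars$mpg < 30),
                              gear = !(mtcars$gear < 5)))
table(invalidCount)

# Keep only cases with at most one invalid value
cleaned <- mtcars[invalidCount <= 1, ]
nrow(cleaned)   # 31: only the Lotus Europa fails both conditions
```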