| identifyMissing {dataMaid} | R Documentation |
A checkFunction for identifying miscoded missing values.
Description
A checkFunction to be called from check that identifies values that
appear to be miscoded missing values.
Usage
identifyMissing(v, nMax = 10, ...)
Arguments
v |
A variable to check. |
nMax |
The maximum number of problematic values to report.
Default is 10. |
... |
Not in use. |
Details
identifyMissing tries to identify common codings of missing values other than the
R standard (NA). These include special words (NaN and Inf, regardless of case),
one or more -9/9's (e.g. 999, "99", -9, "-99"), one or more -8/8's (e.g. -8, 888, -8888),
Stata-style missing values (beginning with ".") and other character strings
("", " ", "-", and "NA" miscoded as character). If the variable is numeric/integer or a
character/factor variable consisting only of numbers and with more than 11 different values,
the numeric miscoded missing values (999, 888, -99, -8 etc.) are
only recognized as miscoded missing if they are the maximum or minimum, respectively, and the distance
between this maximum/minimum value and the second largest/smallest value is greater than one.
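The maximum/minimum rule described above can be illustrated with a minimal base R sketch. The helper flag_extreme_codes is hypothetical and is not part of dataMaid. It assumes one reading of the rule: positive codes made of 9's or 8's count only when they are the maximum, negative ones only when they are the minimum. It ignores the 11-value threshold and all character-based codes.

```r
# Hypothetical helper (not dataMaid's implementation): applies only the
# "extreme value with a gap greater than one" rule to a numeric vector.
flag_extreme_codes <- function(x) {
  u <- sort(unique(x[!is.na(x)]))
  if (length(u) < 2) return(numeric(0))

  # TRUE if the value is written entirely with 9's or entirely with 8's,
  # optionally preceded by a minus sign (e.g. 99, -999, 888).
  is_code <- function(val) {
    grepl("^-?(9+|8+)$", format(val, scientific = FALSE, trim = TRUE))
  }

  flagged <- numeric(0)
  hi <- u[length(u)]
  lo <- u[1]

  # Positive code: must be the maximum and more than one above the runner-up.
  if (hi > 0 && is_code(hi) && hi - u[length(u) - 1] > 1) {
    flagged <- c(flagged, hi)
  }
  # Negative code: must be the minimum and more than one below the runner-up.
  if (lo < 0 && is_code(lo) && u[2] - lo > 1) {
    flagged <- c(flagged, lo)
  }
  flagged
}

flag_extreme_codes(c(1:15, 99))              # 99 (gap to 15 is 84)
flag_extreme_codes(c(1:15, 99, 98))          # numeric(0): gap to 98 is only 1
flag_extreme_codes(c(-999, 1:15, 99, 98, 9999))  # 9999 and -999
```

Under this reading, 99 is not flagged once 98 is also present, because 98 and 99 then look like ordinary neighbouring values rather than a sentinel code. The actual function may differ in details.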
Value
A checkResult with three entries:
$problem (a logical indicating whether miscoded missing values were found),
$message (a message describing which values in v were suspected to be
miscoded missing values), and $problemValues (the problematic values
in their original format). Note that only unique problematic values
are listed and that they are presented in alphabetical order.
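As a small usage sketch, the three entries named above can be read directly from the returned object. The snippet assumes the dataMaid package is installed and runs only in that case. The variable name res is illustrative.

```r
# Runs only if dataMaid is available; res is an illustrative name.
if (requireNamespace("dataMaid", quietly = TRUE)) {
  res <- dataMaid::identifyMissing(c(1:15, 99))

  print(res$problem)        # logical: were miscoded missing values found?
  print(res$message)        # description of the suspected values
  print(res$problemValues)  # the suspected values in their original format
}
```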
See Also
check, allCheckFunctions,
checkFunction, checkResult
Examples
##data(testData)
##testData$miscodedMissingVar
##identifyMissing(testData$miscodedMissingVar)
#Identify miscoded numeric missing values
v1 <- c(1:15, 99)
v2 <- c(v1, 98)
v3 <- c(-999, v2, 9999)
identifyMissing(v1)
identifyMissing(v2)
identifyMissing(v3)
identifyMissing(factor(v3))