within_n_mads {assertr}    R Documentation
This function takes one required argument, n: the number of median
absolute deviations (MADs) from the median within which to accept a
particular data point. It is generally more useful than its sister
function within_n_sds because the median and MAD are far less affected
by outliers than the mean and standard deviation are. It is therefore
better suited to identifying potentially erroneous data points.
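To illustrate why the median and MAD resist outliers where the mean and standard deviation do not, the following base-R sketch (not part of assertr) adds a single gross error to a small vector and measures how far that error lies from the centre in each unit of spread. It assumes a within_n_sds-style check measures distance from mean() in units of sd(), and a within_n_mads-style check measures distance from median() in units of stats::mad() with its default 1.4826 scaling.

```r
# Five well-behaved values plus one erroneous entry
clean <- c(98, 99, 100, 101, 102)
dirty <- c(clean, 1000)

# Classical estimates are dragged toward the outlier ...
c(mean = mean(dirty), sd = sd(dirty))        # roughly 250 and 367
# ... while the robust ones barely move.
c(median = median(dirty), mad = mad(dirty))  # 100.5 and about 2.22

# Distance of the bad value from the centre, in units of each spread measure
z_classical <- abs(1000 - mean(dirty)) / sd(dirty)     # about 2.04
z_robust    <- abs(1000 - median(dirty)) / mad(dirty)  # about 404.5

z_classical <= 3  # TRUE: a 3-SD check would accept the outlier
z_robust    <= 3  # FALSE: a 3-MAD check rejects it
```

The outlier inflates the standard deviation so much that it masks itself; the MAD, computed from the median of absolute deviations, is barely moved by it.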
within_n_mads(n, ...)
n       The number of median absolute deviations from the median within
        which to accept a datum
...     Additional arguments to be passed to within_bounds
As an example, if 2 is passed to this function, it returns a function
that takes a vector and computes the bounds lying two median absolute
deviations (MADs) on either side of that vector's median. That function
in turn returns a within_bounds predicate that can be applied to a
single datum. The predicate returns TRUE if the datum is within two MADs
of the median of the vector it was built from, and FALSE if it is not
(NA values are accepted by default).
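As a worked illustration of these bounds, the base-R sketch below computes two-MAD bounds by hand for a small vector. It assumes the MAD is computed with stats::mad, which multiplies the raw median absolute deviation by 1.4826 by default; that assertr uses exactly this scaling is an assumption, not something stated on this page.

```r
x <- c(1, 2, 3, 4, 100)
n <- 2

centre <- median(x)  # 3
spread <- mad(x)     # median(|x - 3|) = median(2, 1, 0, 1, 97) = 1, times 1.4826
lower  <- centre - n * spread  # 3 - 2.9652 = 0.0348
upper  <- centre + n * spread  # 3 + 2.9652 = 5.9652

# A datum is accepted when it falls inside [lower, upper]
x >= lower & x <= upper  # TRUE TRUE TRUE TRUE FALSE
```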
This function isn't meant to be used on its own, although it can be.
Rather, it is meant to be used with the insist function to search for
potentially erroneous data points in a data set.
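A minimal sketch of that workflow, assuming assertr is installed: a small hypothetical data frame with one injected entry error is checked with insist(), using assertr's success_logical and error_logical so that the check returns TRUE or FALSE instead of passing the data through or stopping with an error. The data frame and the helper name has_no_outliers are illustrative only.

```r
library(assertr)

# Hypothetical readings; 500 stands in for a data-entry error
readings <- data.frame(temp = c(10, 11, 9, 10, 12, 11, 10, 500))

# Check every value of temp against median +/- 3 MADs of that column
has_no_outliers <- function(df) {
  insist(df, within_n_mads(3), temp,
         success_fun = success_logical,
         error_fun   = error_logical)
}

has_no_outliers(readings)                      # FALSE: 500 is far outside
has_no_outliers(readings[-8, , drop = FALSE])  # TRUE once that row is dropped
```

With the default error_fun, the first call would instead stop and report the offending row.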
A function that takes a vector and returns a within_bounds predicate
based on the median and MAD of that vector.
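To make this return value concrete, here is a minimal base-R sketch of a function factory with the same three-stage shape. It is hypothetical (named sketch_within_n_mads), does not call assertr, and omits the input validation and the other options that the real function forwards to within_bounds; only its default acceptance of NA mirrors behaviour described in the examples on this page.

```r
# Hypothetical re-creation of the shape: n -> (vector -> (datum -> logical))
sketch_within_n_mads <- function(n, allow.na = TRUE) {
  function(vec) {
    centre <- median(vec, na.rm = TRUE)
    spread <- mad(vec, na.rm = TRUE)  # stats::mad, scaled by 1.4826 by default
    lower <- centre - n * spread
    upper <- centre + n * spread
    # Predicate applied to data (vectorised here for convenience)
    function(x) {
      ok <- !is.na(x) & x >= lower & x <= upper
      if (allow.na) ok | is.na(x) else ok
    }
  }
}

check <- sketch_within_n_mads(2)(c(1, 2, 3, 4, 100))
check(c(2, 100, NA))  # TRUE FALSE TRUE
sketch_within_n_mads(2, allow.na = FALSE)(c(1, 2, 3, 4, 100))(NA)  # FALSE
```

Because the bounds are computed once when the vector is supplied, the innermost predicate only compares each datum against two stored numbers.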
library(assertr)
set.seed(1)  # make the random example reproducible
test.vector <- rnorm(100, mean=100, sd=20)
within.one.mad <- within_n_mads(1)
custom.bounds.checker <- within.one.mad(test.vector)
custom.bounds.checker(105) # returns TRUE
custom.bounds.checker(40) # returns FALSE
# same as
within_n_mads(1)(test.vector)(40) # returns FALSE
within_n_mads(2)(test.vector)(as.numeric(NA)) # returns TRUE
# because, by default, within_bounds() accepts
# NA values. To reject NAs, pass extra arguments
# (here allow.na=FALSE), which are forwarded to within_bounds()
within_n_mads(2, allow.na=FALSE)(test.vector)(as.numeric(NA)) # returns FALSE
# or, as intended, in a pipeline with insist()
library(magrittr)
iris %>%
insist(within_n_mads(5), Sepal.Length)