Outlier {DescTools}    R Documentation
Return outliers following Tukey's boxplot and Hampel's median/mad definition.
Usage

Outlier(x, method = c("boxplot", "hampel"), value = TRUE, na.rm = FALSE)
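Arguments

x        a (non-empty) numeric vector of data values.
method   the method to be used, one of "boxplot" (Tukey's boxplot rule) or "hampel" (Hampel's rule). So far only these two are implemented.
value    logical. If TRUE (the default), the outlying values are returned; if FALSE, their indices in x.
na.rm    logical. Should missing values be removed? Defaults to FALSE.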
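Details

Outlier detection is a tricky problem and should be handled with care. Tukey's boxplot rule is implemented as a rough way of spotting extreme values: the outliers are the points lying beyond the whiskers of a boxplot, i.e., more than 1.5 times the interquartile range (the box length) below the lower or above the upper edge of the box.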
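The boxplot rule can be sketched in base R as follows. This is a minimal illustration, not the DescTools source: tukey_outliers() is a hypothetical helper, and it assumes the default whisker rule of boxplot.stats() (box edges are the hinges from fivenum(), whiskers reach 1.5 hinge spreads, comparisons are strict).

```r
# Hypothetical helper (not part of DescTools): returns the values of x that
# lie beyond the whiskers of a default boxplot.
tukey_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]                # missing values are dropped
  hinges <- fivenum(x)[c(2, 4)]    # lower and upper hinge (box edges)
  spread <- diff(hinges)           # hinge spread, i.e. the box length
  lower <- hinges[1] - coef * spread
  upper <- hinges[2] + coef * spread
  x[x < lower | x > upper]         # strictly outside the whisker limits
}

x <- c(12, 14, 15, 15, 16, 17, 18, 45)
tukey_outliers(x)                  # 45
boxplot.stats(x)$out               # 45
```

Because the sketch mirrors boxplot.stats(), its result should agree with boxplot.stats(x)$out, in the same spirit as the boxplot comparison in the examples.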
Hampel's rule considers values lying outside median +/- 3 * MAD, where MAD is the median absolute deviation, to be outliers.
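A comparable sketch of Hampel's rule is given below. hampel_outliers() is hypothetical, and the scale is an assumption: it uses stats::mad(), whose default constant 1.4826 rescales the raw median absolute deviation so that it estimates the standard deviation for normal data. Whether Outlier() applies the same constant is not stated here.

```r
# Hypothetical helper (not part of DescTools): Hampel's rule, flagging
# values outside median +/- k * MAD. Assumes stats::mad() with its default
# constant = 1.4826 as the MAD.
hampel_outliers <- function(x, k = 3) {
  x <- x[!is.na(x)]                # missing values are dropped
  center <- median(x)
  scale <- mad(x)                  # 1.4826 * median(abs(x - center))
  x[abs(x - center) > k * scale]
}

x <- c(12, 14, 15, 15, 16, 17, 18, 45)
hampel_outliers(x)                 # 45

# For contrast, a mean +/- 3 * sd rule misses the same point,
# because the outlier itself inflates the standard deviation.
x[abs(x - mean(x)) > 3 * sd(x)]    # numeric(0)
```

The contrast line shows why median and MAD are used: both are barely affected by the extreme value, while the mean and standard deviation are pulled toward it.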
Value

Depending on value, either the values of x that are flagged as outliers (for the boxplot method, those lying outside the whiskers of a boxplot) or their indices in x.
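The relation between the two forms of the result can be illustrated in base R. The examples use Outlier(..., value = FALSE) to subset d.pizza, which suggests the indices refer to positions in the original vector, missing values included. The sketch below assumes that, uses a small made-up data frame instead of d.pizza, and takes the outliers from boxplot.stats() rather than from Outlier().

```r
# Made-up data (not d.pizza), with one missing temperature.
df <- data.frame(id = 1:6, temp = c(47.1, 45.3, NA, 12.0, 46.8, 44.9))

# Outlying values by the default boxplot rule; NAs are ignored.
vals <- boxplot.stats(df$temp)$out

# Indices into the original column: NA %in% vals is FALSE, so the
# missing value is never flagged and positions stay aligned with rows.
idx <- which(df$temp %in% vals)

vals            # 12: what value = TRUE corresponds to
idx             # 4:  what value = FALSE corresponds to
df[idx, ]       # the row holding the outlier
```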
Author(s)

Andri Signorell <andri@signorell.net>
References

Hampel, F. R. (1974) The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69, 382-393.
Examples

library(DescTools)  # provides Outlier() and the d.pizza data set

Outlier(d.pizza$temperature, na.rm=TRUE)
# the same values as the outliers returned by boxplot()
sort(d.pizza$temperature[Outlier(d.pizza$temperature, value=FALSE, na.rm=TRUE)])
b <- boxplot(d.pizza$temperature, plot=FALSE)
sort(b$out)
# the indices can be used to extract the corresponding rows
d.pizza[Outlier(d.pizza$temperature, value=FALSE, na.rm=TRUE), ]
# compare to Hampel's rule
Outlier(d.pizza$temperature, method="hampel", na.rm=TRUE)
# outliers for each driver
tapply(d.pizza$temperature, d.pizza$driver, Outlier, na.rm=TRUE)
# the same as:
boxplot(temperature ~ driver, d.pizza)$out