identify_outliers {rstatix}R Documentation

Identify Univariate Outliers Using Boxplot Methods

Description

Detect outliers using boxplot methods. Boxplots are a popular and an easy method for identifying outliers. There are two categories of outlier: (1) outliers and (2) extreme points.

Values above Q3 + 1.5xIQR or below Q1 - 1.5xIQR are considered as outliers. Values above Q3 + 3xIQR or below Q1 - 3xIQR are considered as extreme points (or extreme outliers).

Q1 and Q3 are the first and third quartile, respectively. IQR is the interquartile range (IQR = Q3 - Q1).

Generally speaking, data points that are labelled outliers in boxplots are not considered as troublesome as those considered extreme points and might even be ignored. Note that, any NA and NaN are automatically removed before the quantiles are computed.

Usage

identify_outliers(data, ..., variable = NULL)

is_outlier(x, coef = 1.5)

is_extreme(x)

Arguments

data

a data frame

...

One unquoted expressions (or variable name). Used to select a variable of interest. Alternative to the argument variable.

variable

variable name for detecting outliers

x

a numeric vector

coef

coefficient specifying how far the outlier should be from the edge of their box. Possible values are 1.5 (for outlier) and 3 (for extreme points only). Default is 1.5

Value

Functions

Examples

# Generate a demo data
set.seed(123)
demo.data <- data.frame(
  sample = 1:20,
  score = c(rnorm(19, mean = 5, sd = 2), 50),
  gender = rep(c("Male", "Female"), each = 10)
)

# Identify outliers according to the variable score
demo.data %>%
  identify_outliers(score)

# Identify outliers by groups
demo.data %>%
  group_by(gender) %>%
  identify_outliers("score")

[Package rstatix version 0.7.2 Index]