remove_percentile_outlier {dataPreparation} | R Documentation |
Percentile outlier filtering
Description
Remove outliers based on percentiles.
Only values within the nth and (100 - n)th percentiles are kept.
Usage
remove_percentile_outlier(
  data_set,
  cols = "auto",
  percentile = 1,
  verbose = TRUE
)
Arguments
data_set | Matrix, data.frame or data.table |
cols | Name(s) of the numeric column(s) of data_set to transform. To transform all numeric columns, set it to "auto". (character, default to "auto") |
percentile | Percentile n used as the cut-off: only values within the nth and (100 - n)th percentiles are kept (numeric, default to 1) |
verbose | Should the algorithm talk? (logical, default to TRUE) |
Details
Filtering is done column by column: extreme values from the first element of cols are removed, then extreme values from the second element of cols are removed, and so on.
So if filtering is performed on too many columns, there is a high risk that a lot of rows will be dropped.
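To illustrate how row loss accumulates when several columns are filtered in sequence, here is a minimal sketch in base R. The helper filter_by_percentile is hypothetical and is not the package's implementation; it assumes inclusive bounds, percentiles recomputed on the rows remaining after each pass, and rows with missing values dropped.

```r
# Hypothetical helper (not part of dataPreparation): for each column in turn,
# keep only the rows whose value lies within the [p, 100 - p] percentile range.
filter_by_percentile <- function(df, cols, p = 1) {
  for (col in cols) {
    x <- df[[col]]
    # Assumption: bounds are recomputed on the rows that survived earlier passes.
    bounds <- quantile(x, probs = c(p, 100 - p) / 100, na.rm = TRUE)
    # Assumption: bounds are inclusive and rows with NA in this column are dropped.
    keep <- !is.na(x) & x >= bounds[[1]] & x <= bounds[[2]]
    df <- df[keep, , drop = FALSE]
  }
  df
}

set.seed(42)
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000))

one_col <- filter_by_percentile(df, "a", p = 5)
three_cols <- filter_by_percentile(df, c("a", "b", "c"), p = 5)

# Filtering three columns drops noticeably more rows than filtering one.
nrow(one_col)
nrow(three_cols)
```

Under these assumptions each extra column removes roughly another 2 * p percent of the remaining rows, so restricting cols to the columns that matter keeps more data.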
Value
Same dataset with fewer rows, edited by reference.
If you don't want to edit by reference, please provide data_set = copy(data_set).
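As a short sketch of the copy() pattern, the call below passes a copy so the original table is left alone. It assumes both data.table and dataPreparation are installed; the variable names original and filtered are illustrative only.

```r
library(data.table)
library(dataPreparation)

original <- data.table(num_col = seq_len(100))

# Pass a deep copy so that the function works on the copy, not on `original`.
filtered <- remove_percentile_outlier(
  copy(original),
  cols = "auto",
  percentile = 1,
  verbose = FALSE
)

nrow(original)  # the original table keeps all of its rows
nrow(filtered)  # the filtered result has fewer rows
```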
Examples
# Given
library(data.table)
library(dataPreparation)
data_set <- data.table(num_col = seq_len(100))
# When
data_set <- remove_percentile_outlier(data_set, cols = "auto", percentile = 1, verbose = TRUE)
# Then extreme value is no longer in set
1 %in% data_set[["num_col"]] # Is false
2 %in% data_set[["num_col"]] # Is true