remove_sd_outlier {dataPreparation} | R Documentation |

## Standard deviation outlier filtering

### Description

Remove outliers based on standard deviation thresholds.

Only values between `mean - sd * n_sigmas` and `mean + sd * n_sigmas` are kept.
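The rule can be illustrated on a single numeric vector with a minimal base R sketch. `keep_within_sigmas` is a hypothetical helper written for this illustration, not a function of dataPreparation. It assumes no missing values and treats the bounds as inclusive, which the package may handle differently.

```r
# Hypothetical helper (not part of dataPreparation): keep only the values
# lying within mean +/- n_sigmas * sd of a numeric vector.
# Assumptions: x has no NA values; bounds are inclusive.
keep_within_sigmas <- function(x, n_sigmas = 3) {
  center <- mean(x)
  spread <- sd(x)
  lower <- center - n_sigmas * spread
  upper <- center + n_sigmas * spread
  x[x >= lower & x <= upper]
}

set.seed(42)
values <- c(runif(1000), 10)  # 10 lies far outside the [0, 1] bulk
filtered <- keep_within_sigmas(values, n_sigmas = 3)

length(values) - length(filtered)  # number of values dropped
10 %in% filtered                   # whether the extreme value survived
```

Note that the extreme value itself inflates both the mean and the standard deviation, so the thresholds are computed from data that still contains the outliers.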

### Usage

```
remove_sd_outlier(data_set, cols = "auto", n_sigmas = 3, verbose = TRUE)
```

### Arguments

`data_set` |
Matrix, data.frame or data.table |

`cols` |
Name(s) of the numeric column(s) of `data_set` to filter. To filter all numeric columns, set it to "auto". (character, default to "auto") |

`n_sigmas` |
Number of standard deviations from the mean beyond which a value is treated as an outlier (integer, default to 3) |

`verbose` |
Should the algorithm talk? (logical, default to TRUE) |

### Details

Filtering is performed column by column: extreme values from the first element of `cols` are removed, then extreme values from the second element of `cols`, and so on.

So if filtering is performed on too many columns, there is a high risk that a lot of rows will be dropped.
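To get a feel for how quickly rows can disappear, here is a rough base R sketch of sequential filtering. `filter_columns_sequentially` is a hypothetical helper, not the package's implementation. It assumes the mean and standard deviation of each column are recomputed on the rows that remain after the previous columns were filtered.

```r
# Hypothetical helper (not part of dataPreparation): filter a data.frame
# one column at a time. Assumption: mean and sd of each column are computed
# on the rows that survived the filters applied to the previous columns.
filter_columns_sequentially <- function(df, cols, n_sigmas = 3) {
  for (col in cols) {
    center <- mean(df[[col]])
    spread <- sd(df[[col]])
    keep <- abs(df[[col]] - center) <= n_sigmas * spread
    df <- df[keep, , drop = FALSE]
  }
  df
}

set.seed(1)
# 20 independent standard normal columns, 10000 rows
df <- as.data.frame(matrix(rnorm(10000 * 20), ncol = 20))
filtered <- filter_columns_sequentially(df, names(df), n_sigmas = 2)

nrow(filtered) / nrow(df)  # fraction of rows that survive all 20 filters
```

With independent normal columns and `n_sigmas = 2`, each filter removes roughly 5% of the remaining rows, so after 20 columns one would expect only about 0.95^20, or roughly 40%, of the rows to be left.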

### Value

Same dataset with fewer rows, edited by **reference**.

If you don't want to edit by reference, please pass `data_set = copy(data_set)`.

### Examples

```
# Given
library(data.table)
library(dataPreparation)
col_vals <- runif(1000)
col_mean <- mean(col_vals)
col_sd <- sd(col_vals)
extreme_val <- col_mean + 6 * col_sd
data_set <- data.table(num_col = c(col_vals, extreme_val))
# When
data_set <- remove_sd_outlier(data_set, cols = "auto", n_sigmas = 3, verbose = TRUE)
# Then extreme value is no longer in set
extreme_val %in% data_set[["num_col"]] # Should be FALSE
```

[Package *dataPreparation* version 1.1.1]