anomalies {xray} | R Documentation |
Analyze a dataset and search for anomalies
Description
If any anomalous columns are found, they are reported as a warning and returned in a data.frame. The output uses the following labels for each kind of anomaly:
NA values: NA
0 values: Zero
Blank strings: Blank
Infinite numbers: Inf
Usage
anomalies(data_analyze, anomaly_threshold = 0.8, distinct_threshold = 2)
Arguments
data_analyze |
a data frame or tibble to analyze |
anomaly_threshold |
the minimum share of anomalous rows, expressed as a fraction between 0 and 1, for a column to be flagged as problematic |
distinct_threshold |
the minimum number of distinct values a column must have to not be flagged as problematic; you usually want to keep this at its default value. |
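As a rough illustration of how the two thresholds might interact, the base R sketch below flags a column when its share of anomalous values reaches anomaly_threshold, or when it has fewer than distinct_threshold distinct values. The helper is_problematic and the toy data frame are hypothetical and not part of xray; the exact comparisons the package performs are an assumption based on the argument descriptions above.

```r
# Hypothetical helper, not part of xray: applies both thresholds to one column.
# Assumption: a column is flagged if its anomalous share is >= anomaly_threshold
# or its number of distinct values is < distinct_threshold.
is_problematic <- function(x, anomaly_threshold = 0.8, distinct_threshold = 2) {
  anomalous <- is.na(x)
  if (is.numeric(x)) {
    # Zeros and infinite values count as anomalies for numeric columns
    anomalous <- anomalous | (!is.na(x) & (x == 0 | is.infinite(x)))
  }
  if (is.character(x)) {
    # Blank strings count as anomalies for character columns
    anomalous <- anomalous | (!is.na(x) & x == "")
  }
  share <- mean(anomalous)
  share >= anomaly_threshold || length(unique(x)) < distinct_threshold
}

# Toy data: one clean column, one mostly-NA column, one constant column
toy <- data.frame(
  id        = 1:6,
  mostly_na = c(NA, NA, NA, NA, NA, 2.5),
  constant  = rep("x", 6),
  stringsAsFactors = FALSE
)

flags <- vapply(toy, is_problematic, logical(1))
print(flags)
```

In this sketch, mostly_na is flagged by the anomaly threshold and constant by the distinct-value threshold, while id passes both checks.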
Details
Each of these anomalies is reported in two columns: one prefixed by q (quantity), giving the number of rows with the anomaly, and one prefixed by p (percentage), giving the percentage of total rows with the anomaly.
Columns with only one distinct value are also reported, since such a column brings no information to the table (if all rows are equal, why bother having that column?). The number of distinct values is reported in qDistinct.
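To make the q and p columns concrete, here is a minimal base R sketch that counts each anomaly type per column by hand. The helper count_anomalies and the toy data are hypothetical; the column names follow the q/p prefix convention described above, but the precise layout of the data.frame that xray returns is an assumption.

```r
# Hypothetical helper, not part of xray: per-column anomaly counts (q*)
# and percentages of total rows (p*), plus the number of distinct values.
count_anomalies <- function(df) {
  n <- nrow(df)
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    q_na    <- sum(is.na(x))
    q_zero  <- if (is.numeric(x)) sum(x == 0, na.rm = TRUE) else 0L
    q_blank <- if (is.character(x)) sum(x == "", na.rm = TRUE) else 0L
    q_inf   <- if (is.numeric(x)) sum(is.infinite(x)) else 0L
    data.frame(
      Variable  = col,
      qNA       = q_na,    pNA    = 100 * q_na / n,
      qZero     = q_zero,  pZero  = 100 * q_zero / n,
      qBlank    = q_blank, pBlank = 100 * q_blank / n,
      qInf      = q_inf,   pInf   = 100 * q_inf / n,
      qDistinct = length(unique(x)),  # NA counts as a distinct value here
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data containing each kind of anomaly
toy <- data.frame(
  note     = c("", "ok", "", "fine", NA, "ok"),
  ratio    = c(1, Inf, 0, 3, -Inf, 2),
  constant = rep("x", 6),
  stringsAsFactors = FALSE
)

report <- count_anomalies(toy)
print(report)
```

A column such as constant ends up with qDistinct equal to 1, which is the "no information" case described above.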
Examples
library(xray)
anomalies(mtcars, anomaly_threshold=0.5)