ExpOutliers {SmartEDA} | R Documentation |
Univariate Outlier Analysis
Description
This function runs univariate outlier analysis based on the boxplot or standard deviation (SD) method. It returns a summary of outliers for the selected numeric features and adds new features to the data if any outliers are found.
Usage
ExpOutliers(
data,
varlist = NULL,
method = "boxplot",
treatment = NULL,
capping = c(0.05, 0.95),
outflag = FALSE
)
Arguments
data |
dataframe or matrix |
varlist |
list of numeric variables on which to perform the univariate outlier analysis |
method |
outlier detection method: boxplot or NxStDev, where N is 1, 2, or 3 standard deviations (that is, 1xStDev, 2xStDev, or 3xStDev) |
treatment |
treat outlier values by replacing them with the mean or median; default NULL |
capping |
percentile limits used to cap outlier values; default lower limit (LL) = 0.05 and upper limit (UL) = 0.95. With the defaults, observations below the lower limit are replaced with the 5th percentile value and observations above the upper limit with the 95th percentile value |
outflag |
add an extreme value flag variable to the output data; default FALSE |
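The treatment and capping arguments describe two ways of replacing extreme values. The base R sketch below illustrates both ideas under stated assumptions: the helper names `cap_values` and `impute_outliers` are hypothetical and are not part of SmartEDA, the mean or median is computed from all non-missing values (the package may compute it differently), and capping is applied to every value outside the percentile limits.

```r
# Hypothetical helpers written in base R; not the SmartEDA implementation.

# Cap values at the given lower and upper percentiles
cap_values <- function(x, capping = c(0.05, 0.95)) {
  limits <- quantile(x, probs = capping, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Replace flagged values with the mean or median of x
impute_outliers <- function(x, is_outlier, treatment = c("mean", "median")) {
  treatment <- match.arg(treatment)
  centre <- if (treatment == "mean") {
    mean(x, na.rm = TRUE)
  } else {
    median(x, na.rm = TRUE)
  }
  x[which(is_outlier)] <- centre
  x
}

# Usage on one mtcars column, flagging with the 1.5 x IQR rule
x <- mtcars$wt
q <- quantile(x, c(0.25, 0.75), names = FALSE)
is_out <- x < q[1] - 1.5 * diff(q) | x > q[2] + 1.5 * diff(q)
x_mean <- impute_outliers(x, is_out, "mean")
x_cap <- cap_values(x, c(0.05, 0.95))
```

Capping keeps the rank order of the observations, whereas mean or median imputation pulls the flagged values into the centre of the distribution.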
Details
This function provides both a summary of the outlier variables and the data.
Univariate outlier analysis methods
- boxplot: a data value is flagged as an outlier if it lies below Q1 minus 1.5x IQR (the boxplot lower whisker) or above Q3 plus 1.5x IQR (the boxplot upper whisker).
- Standard Deviation: if the data distribution is approximately normal, about 68 percent of the data values lie within one standard deviation of the mean, about 95 percent lie within two standard deviations, and about 99.7 percent lie within three standard deviations. Any data point that lies more than 3 standard deviations from the mean is flagged as an outlier.
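The two detection rules described above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: the helper names `boxplot_bounds`, `stdev_bounds`, and `flag_outliers` are hypothetical, and the quartiles use the default `quantile()` definition, which may differ slightly from the one SmartEDA uses.

```r
# Hypothetical helpers written in base R; not the SmartEDA implementation.

# Boxplot rule: Q1 - 1.5 x IQR and Q3 + 1.5 x IQR
boxplot_bounds <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
}

# Standard deviation rule: mean -/+ n x standard deviation
stdev_bounds <- function(x, n = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(lower = m - n * s, upper = m + n * s)
}

# TRUE where a value falls outside the bounds
flag_outliers <- function(x, bounds) {
  x < bounds[["lower"]] | x > bounds[["upper"]]
}

# Usage on one mtcars column
x <- mtcars$wt
n_boxplot <- sum(flag_outliers(x, boxplot_bounds(x)))
n_2sd <- sum(flag_outliers(x, stdev_bounds(x, n = 2)))
```

Setting `n` to 1, 2, or 3 in the sketch corresponds to the 1xStDev, 2xStDev, and 3xStDev choices of the method argument.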
Value
The outlier summary includes:
- Num of outliers: number of outliers in each variable
- Lower bound: Q1 minus 1.5x IQR for the boxplot method; Mean minus 3x StdDev for the Standard Deviation method
- Upper bound: Q3 plus 1.5x IQR for the boxplot method; Mean plus 3x StdDev for the Standard Deviation method
- Lower cap: lower percentile capping value
- Upper cap: upper percentile capping value
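To make the summary fields concrete, the base R sketch below builds a comparable table for the boxplot method. The function name `outlier_summary_sketch` and its column names are hypothetical, and the layout of the object actually returned by ExpOutliers may differ.

```r
# Hypothetical base R sketch of the summary fields; not the SmartEDA output format.
outlier_summary_sketch <- function(data, varlist, capping = c(0.05, 0.95)) {
  rows <- lapply(varlist, function(v) {
    x <- data[[v]]
    q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
    iqr <- q[2] - q[1]
    lower <- q[1] - 1.5 * iqr
    upper <- q[2] + 1.5 * iqr
    caps <- quantile(x, probs = capping, na.rm = TRUE, names = FALSE)
    data.frame(
      variable = v,
      num_outliers = sum(x < lower | x > upper, na.rm = TRUE),
      lower_bound = lower,
      upper_bound = upper,
      lower_cap = caps[1],
      upper_cap = caps[2]
    )
  })
  do.call(rbind, rows)
}

# One row per variable
summary_tbl <- outlier_summary_sketch(mtcars, c("mpg", "disp", "wt", "qsec"))
```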
Examples
library(SmartEDA)

# Boxplot method with 10th and 90th percentile capping and an outlier flag
ExpOutliers(mtcars, varlist = c("mpg","disp","wt", "qsec"), method = 'BoxPlot',
            capping = c(0.1, 0.9), outflag = TRUE)

# 2x standard deviation method with the same capping limits
ExpOutliers(mtcars, varlist = c("mpg","disp","wt", "qsec"), method = '2xStDev',
            capping = c(0.1, 0.9), outflag = TRUE)

# Mean imputation or 5th percentile or 95th percentile value capping
ExpOutliers(mtcars, varlist = c("mpg","disp","wt", "qsec"), method = 'BoxPlot',
            treatment = "mean", capping = c(0.05, 0.95), outflag = TRUE)