getHDoutliers {HDoutliers} | R Documentation |
Outlier Detection Stage of Wilkinson's hdoutliers Algorithm
Description
Detects outliers based on a probability model.
Usage
getHDoutliers(data, memberLists, alpha = 0.05, transform = TRUE)
Arguments
data |
A vector, matrix, or data frame consisting of numeric and/or categorical variables. |
memberLists |
A list following the structure of the output to getHDmembers. |
alpha |
Threshold for determining the cutoff for outliers.
Observations are considered outliers if they fall in the
tail of the fitted distribution (see Details). |
transform |
A logical variable indicating whether the data needs to be
transformed to conform to Wilkinson's specifications before outlier
detection. The default is to transform the data using function
dataTrans. |
Details
An exponential distribution is fitted to the upper tail of the
nearest-neighbor distances between exemplars (the observations
considered representatives of each component of memberLists).
Observations are considered outliers if they fall in the tail of the
fitted CDF.
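The idea in the paragraph above can be illustrated with a small base-R sketch. It is a simplification for intuition only and is not the code used by getHDoutliers: every point is treated as its own exemplar, the "upper tail" is assumed to be the distances above the median, the exponential rate is estimated by maximum likelihood, and the cutoff is assumed to be the (1 - alpha) quantile of the fitted tail. The names pts, nn, u, cutoff, and flagged are hypothetical.

```r
# Simplified illustration only; NOT the HDoutliers implementation.
set.seed(1)
pts <- rbind(matrix(rnorm(400), ncol = 2),  # 200 bulk points
             c(8, 8))                       # one planted far-away point (row 201)

# Nearest-neighbor distance for each point (base R; O(n^2) memory).
dmat <- as.matrix(dist(pts))
diag(dmat) <- Inf                           # ignore the distance of a point to itself
nn <- apply(dmat, 1, min)

# Assumption: the "upper tail" is everything above the median distance.
u <- median(nn)
excess <- nn[nn > u] - u
rate <- 1 / mean(excess)                    # maximum-likelihood rate of the exponential

# Assumption: the cutoff is the (1 - alpha) quantile of the fitted tail.
alpha <- 0.05
cutoff <- u + qexp(1 - alpha, rate = rate)

flagged <- which(nn > cutoff)               # indexes of points beyond the cutoff
```

Here flagged includes the planted point in row 201; a few bulk points in sparse regions may also be flagged, since this sketch omits the exemplar-selection stage performed by getHDmembers.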
Value
The indexes of the observations determined to be outliers.
References
Wilkinson, L. (2016). Visualizing Outliers. <https://www.cs.uic.edu/~wilkinson/Publications/outliers.pdf>.
Note
A call to getHDoutliers in which memberLists results from a call to
getHDmembers is equivalent to calling HDoutliers.
See Also
HDoutliers, getHDmembers, dataTrans
Examples
data(dots)
mem.W <- getHDmembers(dots$W)
out.W <- getHDoutliers(dots$W,mem.W)
## Not run:
plotHDoutliers(dots$W, out.W)
## End(Not run)
data(ex2D)
mem.ex2D <- getHDmembers(ex2D)
out.ex2D <- getHDoutliers( ex2D, mem.ex2D)
## Not run:
plotHDoutliers( ex2D, out.ex2D)
## End(Not run)
## Not run:
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n),n,2)
nout <- 10 # number of outliers
x[sample(1:n,size=nout),] <- 10*runif(2*nout,min=-1,max=1)
mem.x <- getHDmembers(x)
out.x <- getHDoutliers(x, mem.x)
## End(Not run)