HDoutliers {HDoutliers} | R Documentation
Leland Wilkinson's hdoutliers Algorithm for Outlier Detection
Description
Detects outliers based on a probability model.
Usage
HDoutliers(data, maxrows=10000, radius=NULL, alpha=0.05, transform=TRUE)
Arguments
data
A vector, matrix, or data frame consisting of numeric and/or categorical variables.
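maxrows
If the number of observations is greater than maxrows, the data is first
partitioned into lists associated with exemplars and their members, to
reduce the number of nearest-neighbor computations. The default is 10000.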
radius
Threshold for determining membership in the exemplars' lists
(used only when the number of observations is greater than maxrows).
alpha
Threshold for determining the cutoff for outliers.
Observations are considered outliers if they fall in the (1 - alpha)
tail of the fitted CDF. The default is 0.05.
transform
A logical value indicating whether the data should be
transformed to conform to Wilkinson's specifications before outlier
detection. The default is TRUE, which transforms the data using the
function dataTrans.
Details
Wilkinson replaces categorical variables with the leading component from
correspondence analysis, and maps the data to the unit square. This is
done as a preprocessing step if transform = TRUE (the default).
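To illustrate the numeric part of this preprocessing, the following is a minimal sketch of rescaling each column to the unit interval. The helper unit_scale is hypothetical and is not part of the package; it does not cover the correspondence-analysis step for categorical variables, and the package's own transformation may differ in detail.

```r
# Minimal sketch: rescale each numeric column to [0, 1] (min-max scaling).
# unit_scale is a hypothetical helper, not a function from the HDoutliers package.
unit_scale <- function(x) {
  x <- as.matrix(x)
  rng <- apply(x, 2, range)          # 2 x p matrix: row 1 = min, row 2 = max
  span <- rng[2, ] - rng[1, ]
  span[span == 0] <- 1               # avoid division by zero for constant columns
  sweep(sweep(x, 2, rng[1, ], "-"), 2, span, "/")
}

set.seed(1)
x <- cbind(a = rnorm(50, mean = 10, sd = 3), b = runif(50, -5, 5))
u <- unit_scale(x)
range(u)                             # overall minimum and maximum after scaling
```

Scaling all variables to a common range keeps any single variable from dominating the nearest-neighbor distances.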
If the number of observations exceeds maxrows, the data is first
partitioned into lists associated with exemplars and their members
within radius of each exemplar, to reduce the number of
nearest-neighbor computations required for outlier detection.
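The partitioning idea can be sketched as a single pass over the rows: each row joins the list of the nearest existing exemplar if it lies within radius of it, and otherwise becomes a new exemplar. The helper find_exemplars below is hypothetical and is offered only as an illustration under that assumption; the package's internal procedure may differ.

```r
# Sketch of a one-pass "leader"-style partition. find_exemplars is a
# hypothetical helper and may differ from what the package does internally.
find_exemplars <- function(x, radius) {
  x <- as.matrix(x)
  exemplars <- integer(0)            # row indexes chosen as exemplars
  members <- list()                  # members[[k]]: rows assigned to exemplar k
  for (i in seq_len(nrow(x))) {
    if (length(exemplars) > 0) {
      # Euclidean distance from row i to every current exemplar
      d <- sqrt(rowSums(sweep(x[exemplars, , drop = FALSE], 2, x[i, ], "-")^2))
      k <- which.min(d)
      if (d[k] < radius) {
        members[[k]] <- c(members[[k]], i)
        next
      }
    }
    # No exemplar is close enough: row i starts its own list
    exemplars <- c(exemplars, i)
    members[[length(exemplars)]] <- i
  }
  list(exemplars = exemplars, members = members)
}

set.seed(2)
x <- matrix(runif(400), ncol = 2)
part <- find_exemplars(x, radius = 0.1)
length(part$exemplars)               # number of exemplars (at most nrow(x))
```

Nearest-neighbor distances then only need to be computed among the exemplars, and a flagged exemplar can be expanded back to the rows in its member list.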
An exponential distribution is then fitted to the upper tail of the
nearest-neighbor distances between exemplars.
Observations are considered outliers if they fall in the (1 - alpha)
tail of the fitted CDF.
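As a rough illustration of the tail-fitting step, the sketch below fits an exponential distribution to the nearest-neighbor distances above a threshold and flags distances beyond the (1 - alpha) quantile of the fit. Treating the upper tail as everything above the median, and fitting the excesses by maximum likelihood, are assumptions made here for simplicity; this is not the package's exact procedure.

```r
# Sketch only: a simplified tail fit, not the package's exact procedure.
set.seed(3)
x <- rbind(matrix(rnorm(400), ncol = 2), c(8, 8))   # 200 points + 1 planted outlier

d <- as.matrix(dist(x))
diag(d) <- Inf                       # ignore each point's distance to itself
nn <- apply(d, 1, min)               # nearest-neighbor distance of each row

alpha <- 0.05
threshold <- median(nn)              # assumption: "upper tail" = above the median
excess <- nn[nn > threshold] - threshold
rate <- 1 / mean(excess)             # maximum-likelihood rate of an exponential
cutoff <- threshold + qexp(1 - alpha, rate = rate)

which(nn > cutoff)                   # rows flagged as candidate outliers
```

A smaller alpha pushes the cutoff further into the tail, so fewer observations are flagged.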
Value
The indexes of the observations determined to be outliers.
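Because the value is a vector of row indexes, it can be used directly to split the data into flagged and retained observations. The sketch below uses a hypothetical index vector in place of an actual call to HDoutliers, and assumes the result may be empty when nothing is flagged; in that case negative indexing would drop every row, so setdiff is used instead.

```r
# Hypothetical index vector standing in for the value returned by HDoutliers().
x <- data.frame(a = c(1, 2, 3, 100), b = c(2, 3, 4, -50))
out <- 4L

keep <- setdiff(seq_len(nrow(x)), out)   # safe even when out is empty
x_clean <- x[keep, , drop = FALSE]       # rows not flagged
x_flagged <- x[out, , drop = FALSE]      # rows flagged as outliers

# Pitfall: negative indexing with an empty vector selects zero rows.
none <- integer(0)
nrow(x[-none, , drop = FALSE])                              # 0, not 4
nrow(x[setdiff(seq_len(nrow(x)), none), , drop = FALSE])    # 4
```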
References
Wilkinson, L. (2016). Visualizing Outliers.
See Also
getHDmembers, getHDoutliers, dataTrans
Examples
data(dots)
out.W <- HDoutliers(dots$W)
## Not run:
plotHDoutliers(dots$W,out.W)
## End(Not run)
data(ex2D)
out.ex2D <- HDoutliers(ex2D)
## Not run:
plotHDoutliers(ex2D,out.ex2D)
## End(Not run)
## Not run:
n <- 100000 # number of observations
set.seed(3)
x <- matrix(rnorm(2*n),n,2)
nout <- 10 # number of outliers
x[sample(1:n,size=nout),] <- 10*runif(2*nout,min=-1,max=1)
out.x <- HDoutliers(x)
## End(Not run)