LOCI {DDoutlier}    R Documentation
Local Correlation Integral (LOCI) algorithm with constant nearest neighbor parameter
Description
Function to calculate the Local Correlation Integral (LOCI) as an outlier score for observations, as suggested by Papadimitriou, S., Gibbons, P. B., & Faloutsos, C. (2003). Uses a constant number of nearest neighbors (nn) instead of a constant radius
Usage
LOCI(dataset, alpha = 0.5, nn = 20, k = 3)
Arguments
dataset
The dataset whose observations are scored with LOCI
alpha
The parameter setting the size of the sampling neighborhood, within which an observation counts other observations, as a proportion of the counting neighborhood. An alpha of 1 gives a sampling neighborhood the same size as the counting neighborhood (a radius equal to the distance to the nn-th nearest neighbor). An alpha of 0.5 gives a sampling neighborhood with half the radius of the counting neighborhood. Default is 0.5
nn
The number of nearest neighbors that defines the counting neighborhood, against which the sampling neighborhood is compared. The original paper suggests a constant, user-given radius that includes at least 20 neighbors, in order to avoid introducing statistical errors in MDEF. Default is 20
k
The number of standard deviations by which the sampling neighborhood of an observation must differ from the sampling neighborhoods of neighboring observations for the observation to be classified as an outlier. Default is 3, as used in the original paper's experiments
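To make the roles of nn and alpha concrete, the following R sketch computes, for every observation, the counting radius (distance to its nn-th nearest neighbor) and the sampling radius (alpha times that distance). The helper loci_radii() is hypothetical and not part of DDoutlier; it assumes Euclidean distances computed with dist() and that nn is smaller than the number of observations.

```r
# Hypothetical helper, not part of DDoutlier: per-observation LOCI radii
loci_radii <- function(dataset, nn = 20, alpha = 0.5) {
  # Full pairwise distance matrix (assumption: Euclidean metric)
  D <- as.matrix(dist(dataset))
  # Counting radius: distance to the nn-th nearest neighbor.
  # Index nn + 1 accounts for the observation itself at distance 0.
  r <- apply(D, 1, function(d) sort(d)[nn + 1])
  # Sampling radius: alpha times the counting radius
  data.frame(counting_radius = r, sampling_radius = alpha * r)
}

# Usage on a small one-dimensional toy dataset
X <- matrix(c(0, 1, 2, 3, 10), ncol = 1)
rad <- loci_radii(X, nn = 2, alpha = 0.5)
rad
```

The isolated value 10 gets a counting radius of 8, against 1 or 2 for the tightly packed values, which is how a constant nn lets the radius adapt to local density.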
Details
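LOCI defines a counting neighborhood for each observation from its nn nearest observations, with a radius equal to the distance to the outermost (nn-th) of them. Within the counting neighborhood, each observation has a sampling neighborhood whose radius is determined by the alpha input parameter. LOCI returns an outlier score called the multi-granularity deviation factor (MDEF), which measures the relative deviation of the number of observations in an observation's sampling neighborhood from the average count over the sampling neighborhoods of its neighbors; the standard deviation of those counts, relative to their average, is used to normalize MDEF and to flag outliers. The LOCI function is useful for outlier detection in clustering and other multidimensional domains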
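The procedure above can be sketched in base R. This is a hypothetical re-implementation written from the definitions in Papadimitriou et al. (2003), not the DDoutlier source: loci_sketch() is an illustrative name, distances are assumed Euclidean, each observation's sampling radius alpha * r is applied to all of its neighbors, and the population standard deviation is used as in the paper. Results may therefore differ from LOCI().

```r
# Hypothetical sketch of LOCI with a constant nn, written from the paper's
# definitions; not the DDoutlier implementation.
loci_sketch <- function(dataset, alpha = 0.5, nn = 20, k = 3) {
  D <- as.matrix(dist(dataset))  # assumption: Euclidean distances
  n <- nrow(D)
  # Counting radius r_i: distance to the nn-th nearest neighbor (nn + 1 skips self)
  r <- apply(D, 1, function(d) sort(d)[nn + 1])
  npar_pi <- avg_npar <- sd_npar <- numeric(n)
  for (i in seq_len(n)) {
    s <- alpha * r[i]                 # sampling radius of observation i
    npar_pi[i] <- sum(D[i, ] <= s)    # own sampling-neighborhood count (includes i)
    nbrs <- which(D[i, ] <= r[i])     # counting neighborhood of i (includes i)
    # Sampling-neighborhood count of each neighbor, at the same radius s
    counts <- colSums(D[, nbrs, drop = FALSE] <= s)
    avg_npar[i] <- mean(counts)
    # Population standard deviation of the neighbors' counts
    sd_npar[i] <- sqrt(mean((counts - avg_npar[i])^2))
  }
  MDEF <- 1 - npar_pi / avg_npar      # relative deviation from the neighbors' average
  norm_MDEF <- sd_npar / avg_npar     # normalized deviation, sd_npar / avg_npar
  cls <- ifelse(MDEF > k * norm_MDEF, "Outlier", "Inlier")  # k-sigma rule
  list(npar_pi = npar_pi, avg_npar = avg_npar, sd_npar = sd_npar,
       MDEF = MDEF, norm_MDEF = norm_MDEF, class = cls)
}

# Usage: a 5 x 5 grid plus one distant point (row 26)
X <- rbind(as.matrix(expand.grid(x = 0:4, y = 0:4)), c(20, 20))
res <- loci_sketch(X, alpha = 0.5, nn = 5, k = 2)
which(res$class == "Outlier")
```

With k = 2 only the distant point is flagged. With k = 3 this sketch flags nothing on the same data, because the distant point's own count enters its neighbors' statistics and inflates sd_npar.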
Value
npar_pi
A vector of the number of observations within each observation's sampling neighborhood
avg_npar
A vector of the average number of observations within the sampling neighborhoods of each observation's neighbors
sd_npar
A vector of the standard deviations of the sampling-neighborhood counts of each observation's neighbors
MDEF
A vector of the multi-granularity deviation factor (MDEF) for observations. The greater the MDEF, the greater the outlierness
norm_MDEF
A vector of normalized MDEF values, computed as sd_npar/avg_npar
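class
A classification of each observation as inlier or outlier, following the k-sigma rule of the original paper: an observation is an outlier when its MDEF exceeds k times its norm_MDEF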
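As a worked illustration of how the returned vectors relate, the R lines below take made-up npar_pi, avg_npar, and sd_npar values for three observations and derive MDEF, norm_MDEF, and the classification. The formula MDEF = 1 - npar_pi/avg_npar is taken from the original paper and is an assumption about how LOCI() computes it.

```r
# Made-up summaries for three observations (illustrative values only)
npar_pi  <- c(2, 10, 12)   # own sampling-neighborhood counts
avg_npar <- c(10, 10, 10)  # average counts over each observation's neighbors
sd_npar  <- c(2, 2, 2)     # standard deviations of those counts
k <- 3

MDEF      <- 1 - npar_pi / avg_npar   # 0.8, 0.0, -0.2 (assumed paper definition)
norm_MDEF <- sd_npar / avg_npar       # 0.2, 0.2, 0.2
cls <- ifelse(MDEF > k * norm_MDEF, "Outlier", "Inlier")
cls  # "Outlier" "Inlier" "Inlier"
```

Only the first observation, whose count is far below its neighbors' average, exceeds the threshold 3 x 0.2 = 0.6; an observation denser than its neighbors gets a negative MDEF.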
Author(s)
Jacob H. Madsen
References
Papadimitriou, S., Gibbons, P. B., & Faloutsos, C. (2003). LOCI: Fast Outlier Detection Using the Local Correlation Integral. In International Conference on Data Engineering. pp. 315-326. DOI: 10.1109/ICDE.2003.1260802
Examples
library(DDoutlier)
# Create dataset from the four numeric iris columns
X <- iris[,1:4]
# Classify observations
cls_observations <- LOCI(dataset=X, alpha=0.5, nn=20, k=1)$class
# Remove outliers from dataset
X <- X[cls_observations=='Inlier',]