RDOS {DDoutlier}    R Documentation

Relative Density-based Outlier Factor (RDOS) algorithm with Gaussian kernel

Description

Function to calculate the Relative Density-based Outlier Factor (RDOS) as an outlier score for observations. Suggested by Tang, B. & He, H. (2017).

Usage

RDOS(dataset, k = 5, h = 1)

Arguments

dataset

The dataset whose observations are each assigned an RDOS score

k

The number of nearest neighbors (k) used to identify reverse and shared nearest neighbors

h

Bandwidth parameter for the Gaussian kernel. A small h puts more weight on outlying observations
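To give a feel for the bandwidth, the sketch below evaluates an unnormalised Gaussian kernel weight at a few distances for several values of h. It assumes the standard form exp(-dist^2 / (2 * h^2)); the helper `gauss_weight` is hypothetical and is not part of DDoutlier, whose internal kernel may differ in its constants.

```r
# Illustrative helper (not part of DDoutlier): unnormalised Gaussian
# kernel weight of a neighbor at distance `dist` for bandwidth `h`.
gauss_weight <- function(dist, h) exp(-dist^2 / (2 * h^2))

dists <- c(0.5, 1, 2)   # example distances to neighbors
hs    <- c(0.5, 1, 2)   # example bandwidths

# Rows are distances, columns are bandwidths
weights <- sapply(hs, function(h) gauss_weight(dists, h))
dimnames(weights) <- list(paste0("dist=", dists), paste0("h=", hs))
round(weights, 4)
```

With a small h the weight of a distant neighbor drops towards zero, so an isolated observation receives a very low density estimate relative to its neighbors, which is one way to read the statement that a small h emphasises outlying observations.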

Details

RDOS computes a kernel density estimate for each observation by combining its nearest, reverse nearest and shared nearest neighbors into one neighborhood. This density estimate is then compared to the density estimates of the observations in that neighborhood. A Gaussian kernel is used for density estimation, with a bandwidth chosen by the user. A kd-tree is used for kNN computation, using the kNN() function from the 'dbscan' package.
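The steps described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the DDoutlier implementation: it uses a brute-force distance matrix instead of the kd-tree from 'dbscan', it assumes that two observations are shared neighbors when their k-nearest-neighbor sets overlap, and it assumes the score is the mean neighborhood density divided by the observation's own density. The function name `rdos_sketch` is hypothetical, and its scores are not guaranteed to match those returned by RDOS().

```r
# Hypothetical sketch of the RDOS idea (not the package code)
rdos_sketch <- function(dataset, k = 5, h = 1) {
  X <- as.matrix(dataset)
  n <- nrow(X)
  d <- ncol(X)

  D0 <- as.matrix(dist(X))   # pairwise Euclidean distances
  D <- D0
  diag(D) <- Inf             # an observation is not its own neighbor

  # Brute-force k nearest neighbors: row i holds the indices of kNN(i)
  knn <- t(apply(D, 1, function(r) order(r)[seq_len(k)]))

  # A[i, j] = 1 if j is one of the k nearest neighbors of i
  A <- matrix(0, n, n)
  A[cbind(rep(seq_len(n), times = k), as.vector(knn))] <- 1

  R <- t(A)                  # reverse neighbors: i is in kNN(j)
  S <- tcrossprod(A) > 0     # shared neighbors: kNN(i) and kNN(j) overlap

  # Combined neighborhood, excluding the observation itself
  N <- (A > 0) | (R > 0) | S
  diag(N) <- FALSE

  # Gaussian kernel density over the neighborhood plus the observation
  M <- N
  diag(M) <- TRUE
  Kmat <- (2 * pi)^(-d / 2) / h^d * exp(-D0^2 / (2 * h^2))
  dens <- rowSums(Kmat * M) / rowSums(M)

  # Score: mean density of the neighborhood relative to own density
  nb_mean <- as.vector((N + 0) %*% dens) / rowSums(N)
  nb_mean / dens
}

# Usage on the same data as the Examples section
scores <- rdos_sketch(iris[, 1:4], k = 10, h = 2)
head(sort(scores, decreasing = TRUE))
```

Under these assumptions a score near 1 indicates that an observation has a density similar to that of its neighborhood, while larger scores indicate that it lies in a sparser region than its neighbors. The dense n-by-n matrices used here also make the quadratic memory and time cost visible.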

Identifying reverse and shared neighbors from the kd-tree is computationally heavy. RDOS therefore has high complexity and is not recommended for datasets with more than 5000 observations (n > 5000). The RDOS function is useful for outlier detection in clustering and other multidimensional domains.

Value

A vector of RDOS scores, one per observation. The greater the RDOS score, the greater the outlierness

Author(s)

Jacob H. Madsen

References

Tang, B. & He, H. (2017). A local density-based approach for outlier detection. Neurocomputing. pp. 171-180. DOI: 10.1016/j.neucom.2017.02.039

Examples

# Create dataset
X <- iris[,1:4]

# Compute outlier scores, overriding the default k and h
outlier_score <- RDOS(dataset=X, k=10, h=2)

# Sort and find index for most outlying observations
names(outlier_score) <- seq_len(nrow(X))
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)

[Package DDoutlier version 0.1.0 Index]