R: Influenced Outlierness (INFLO) algorithm

INFLO {DDoutlier}

R Documentation

Description

Computes the influenced outlierness (INFLO) score, an outlier score, for each observation in a dataset. The method was proposed by Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006).

Usage

INFLO(dataset, k = 5)

Arguments

`dataset`	The dataset whose observations each receive an INFLO score
`k`	The number of nearest neighbors used to form the k-nearest and reverse k-nearest neighborhoods whose densities are compared. k has to be smaller than the number of observations in dataset
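The constraint on k can be checked before scoring. The following is a minimal sketch in base R; `check_k` is a hypothetical helper introduced for illustration and is not part of DDoutlier, whose own argument checks may differ.

```r
# Hypothetical helper (not part of DDoutlier): validate k against the dataset
check_k <- function(dataset, k) {
  n <- nrow(dataset)
  if (!is.numeric(k) || length(k) != 1 || k < 1 || k != round(k)) {
    stop("k must be a single positive whole number")
  }
  if (k >= n) {
    stop("k must be smaller than the number of observations (", n, ")")
  }
  invisible(k)
}

X <- iris[, 1:4]

# Passes silently: 10 is smaller than the 150 rows of X
check_k(X, 10)

# Fails: k equals the number of observations, so ok becomes FALSE
ok <- tryCatch({
  check_k(X, 150)
  TRUE
}, error = function(e) FALSE)
```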

Details

INFLO computes the influenced outlierness score for each observation by comparing the density around that observation with the density in its reverse neighborhood, i.e. among the observations that count it as one of their k nearest neighbors. Nearest neighbors are found with a kd-tree, using the kNN() function from the 'dbscan' package. The INFLO function is useful for outlier detection in clustering and other multidimensional domains.
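The density comparison can be illustrated with a brute-force sketch in base R. It rests on assumptions taken from the cited paper rather than from the package source: density is the inverse of the distance to the k-th nearest neighbor, and the score is the mean density over the union of the nearest and reverse nearest neighbors divided by the observation's own density. `inflo_sketch` is a hypothetical name; the sketch uses a full distance matrix instead of a kd-tree and may not reproduce the values returned by INFLO().

```r
# Hypothetical helper for illustration; not the DDoutlier implementation
inflo_sketch <- function(dataset, k = 5) {
  x <- as.matrix(dataset)
  n <- nrow(x)
  stopifnot(is.numeric(x), k >= 1, k < n)

  # Pairwise Euclidean distances; an observation is not its own neighbor
  d <- as.matrix(dist(x))
  diag(d) <- Inf

  # Indices of the k nearest neighbors of each observation
  nn <- lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])

  # Assumed density: inverse of the distance to the k-th nearest neighbor
  # (duplicated observations can make this distance zero)
  kdist <- vapply(seq_len(n), function(i) d[i, nn[[i]][k]], numeric(1))
  dens <- 1 / kdist

  vapply(seq_len(n), function(i) {
    # Reverse neighbors: observations that list i among their k nearest
    rnn <- which(vapply(nn, function(v) i %in% v, logical(1)))
    influence <- union(nn[[i]], rnn)
    # Mean density of the influence set relative to the observation's own
    mean(dens[influence]) / dens[i]
  }, numeric(1))
}

# A 5 x 5 grid of points plus one isolated point in row 26
pts <- rbind(as.matrix(expand.grid(x = 0:4, y = 0:4)), c(50, 50))
scores <- inflo_sketch(pts, k = 4)
which.max(scores)  # 26, the isolated point
```

Grid points sit in regions of similar density and score low, while the isolated point has a far lower density than its neighbors and receives the largest score.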

Value

A vector of INFLO scores, one per observation. The greater the INFLO score, the greater the outlierness.
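Since the result is a plain vector, the most outlying observations can be picked by rank. A minimal sketch, using a made-up score vector (`scores` and its values are hypothetical and do not come from INFLO()):

```r
# Hypothetical scores for five observations, for illustration only
scores <- c(1.02, 0.97, 3.40, 1.10, 1.85)

# Row indices of the two observations with the largest scores
top <- order(scores, decreasing = TRUE)[seq_len(2)]
top  # 3 5
```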

Author(s)

Jacob H. Madsen

References

Jin, W., Tung, A. K. H., Han, J., & Wang, W. (2006). Ranking Outliers Using Symmetric Neighborhood Relationship. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Singapore, pp. 577-593. DOI: 10.1007/11731139_68

Examples

# Create dataset
X <- iris[,1:4]

# Find outliers by setting an optional k
outlier_score <- INFLO(dataset=X, k=10)

# Sort and find index for most outlying observations
names(outlier_score) <- seq_len(nrow(X))
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)
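Because the scores depend on the chosen k, it can help to check whether the same observations rank highest for several values. The sketch below assumes DDoutlier is installed and is skipped otherwise; the k values and the names `k_values` and `top_by_k` are arbitrary choices for illustration.

```r
X <- iris[, 1:4]
k_values <- c(5, 10, 20)  # arbitrary values chosen for illustration
top_by_k <- NULL

# Run only when the package is available
if (requireNamespace("DDoutlier", quietly = TRUE)) {
  # One column per k, holding the row indices of the five highest scores
  top_by_k <- sapply(k_values, function(k) {
    scores <- DDoutlier::INFLO(dataset = X, k = k)
    order(scores, decreasing = TRUE)[1:5]
  })
  colnames(top_by_k) <- paste0("k=", k_values)
  print(top_by_k)
}
```

Observations that appear in every column are flagged regardless of k, while those that appear in only one column are more sensitive to the neighborhood size.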

[Package DDoutlier version 0.1.0 Index]