NAN {DDoutlier} | R Documentation |
Natural Neighbor (NAN) algorithm to return the self-adaptive neighborhood
Description
Function to identify natural neighbors and the right k-parameter for kNN graphs as suggested by Zhu, Q., Feng, Ji. & Huang, J. (2016)
Usage
NAN(dataset, NaN_Edges = FALSE)
Arguments
dataset |
The dataset for which natural neighbors are identified along with a k-parameter |
NaN_Edges |
Choice for computing natural neighbors. Computational heavy to compute |
Details
NAN computes the natural neighbor eigenvalue and identifies natural neighbors in a dataset. The natural neighbor eigenvalue is powerful as k-parameter for computing a k-nearest neighborhood, being suitable for outlier detection, clustering or predictive modelling. Natural neighbors are defined as two observations being mutual k-nearest neighbors. A kd-tree is used for kNN computation, using the kNN() function from the 'dbscan' package
Value
NaN_Num |
The number of in-degrees for observations given r |
r |
Natural neighbor eigenvalue. Useful as k-parameter |
NaN_Edges |
Matrix of edges for natural neighbors |
n_NaN |
The number of natural neighbors |
Author(s)
Jacob H. Madsen
References
Zhu, Q., Feng, Ji. & Huang, J. (2016). Natural neighbor: A self-adaptive neighborhood method without parameter K. Pattern Recognition Letters. pp. 30-36. DOI: 10.1016/j.patrec.2016.05.007
Examples
# Select dataset
X <- iris[,1:4]
# Identify the right k-parameter
K <- NAN(X, NaN_Edges=FALSE)$r
# Use the k-setting in an abitrary outlier detection algorithm
outlier_score <- LOF(dataset=X, k=K)
# Sort and find index for most outlying observations
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)
# Inspect the distribution of outlier scores
hist(outlier_score)