DB {DDoutlier} | R Documentation |
Distance-based outlier detection based on user-given neighborhood size
Description
Computes, for each observation, the number of observations that lie within a neighborhood of a given radius, and returns that count as an outlier score. An observation is classified as an outlier when the proportion of observations inside its neighborhood falls below a user-given threshold. The method was suggested by Knorr and Ng (1997).
Usage
DB(dataset, d = 1, fraction = 0.05)
Arguments
dataset |
The dataset for which observations are classified as outliers/inliers |
d |
The radius of the neighborhood |
fraction |
The minimum proportion of observations that must lie within the neighborhood for an observation to be classified as an inlier. An observation is classified as an outlier if the proportion of observations within its neighborhood is less than this fraction |
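As a rough illustration of how 'fraction' translates into a neighbor count, the sketch below works through the arithmetic for the 150 rows of iris. It assumes the proportion is computed as the neighbor count divided by the total number of observations, which this page does not state explicitly; the counts 7 and 8 are hypothetical values chosen to sit on either side of the threshold.

```r
n <- nrow(iris)                  # 150 observations
fraction <- 0.05
min_neighbors <- n * fraction    # 7.5, assuming the denominator is n

# Hypothetical neighbor counts just below and just above the threshold
neighbor_counts <- c(7, 8)
labels <- ifelse(neighbor_counts / n < fraction, "Outlier", "Inlier")
labels
```

Under that assumption, an observation needs at least 8 neighbors within radius 'd' to count as an inlier with the default fraction.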
Details
DB computes a neighborhood of radius 'd' around each observation and returns the number of neighbors inside it. Each observation is then classified as an inlier or an outlier, depending on whether the proportion of observations within its neighborhood reaches the threshold given by argument 'fraction'.
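The procedure described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name db_sketch is hypothetical, and it assumes Euclidean distance, excludes each observation from its own neighborhood, and divides the neighbor count by the total number of observations. DB itself may differ on any of these points.

```r
# Hypothetical helper, not part of DDoutlier
db_sketch <- function(dataset, d = 1, fraction = 0.05) {
  x <- as.matrix(dataset)
  n <- nrow(x)
  # Full pairwise distance matrix (Euclidean metric is an assumption)
  dist_mat <- as.matrix(dist(x))
  # Count observations within radius d, excluding the observation itself
  neighbors <- rowSums(dist_mat <= d) - 1
  # Outlier if the proportion of neighbors is below the fraction
  classification <- ifelse(neighbors / n < fraction, "Outlier", "Inlier")
  list(neighbors = neighbors, classification = classification)
}

# Toy data: four points close together and one far away
toy <- data.frame(x = c(0, 0.1, 0.2, 0.3, 10),
                  y = c(0, 0.1, 0, 0.1, 10))
res <- db_sketch(toy, d = 1, fraction = 0.5)
res$neighbors
res$classification
```

The full distance matrix needs memory quadratic in the number of observations, so this sketch is only suitable for small datasets.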
Value
neighbors |
The number of neighbors within the neighborhood |
classification |
Binary classification of observations as inlier or outlier |
Author(s)
Jacob H. Madsen
References
Knorr, E. M., & Ng, R. T. (1997). A Unified Approach for Mining Outliers. In Conf. of the Centre for Advanced Studies on Collaborative Research (CASCON). Toronto, Canada. pp. 236-248. DOI: 10.1145/782010.782021
Examples
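# Load the package that provides DB()
library(DDoutlier)
# Create dataset from the four numeric columns of iris
X <- iris[, 1:4]
# Classify observations
cls_observations <- DB(dataset = X, d = 1, fraction = 0.05)$classification
# Remove outliers from dataset
X <- X[cls_observations == 'Inlier', ]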