
NNclean {prabclus}

R Documentation

Nearest neighbor based clutter/noise detection

Description

Detects whether data points are noise or part of a cluster, based on a Poisson process model.

Usage

NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1,
convergence = 0.001, plot=FALSE, quiet=TRUE)

## S3 method for class 'nnclean'
print(x, ...)

Arguments

`data`	numerical matrix or data frame.
`k`	integer. Number of nearest neighbors considered per point.
`distances`	distance matrix object of class `dist`. If specified, it is used instead of computing distances from the data.
`edge.correct`	logical. If `TRUE` and the data is two-dimensional, neighbors for points at the edges of the parent region of the noise Poisson process are determined after wrapping the region onto a toroid.
`wrap`	numerical. If `edge.correct=TRUE`, points in a strip of width `wrap*range` along the edge, for each variable, are candidates for being neighbors of points at the opposite edge.
`convergence`	numerical. Convergence criterion for the EM algorithm.
`plot`	logical. If `TRUE`, a histogram of the distances to the kth nearest neighbor is plotted together with the fitted mixture density.
`quiet`	logical. If `FALSE`, the likelihood is printed during the iterations.
`x`	object of class `nnclean`.
`...`	necessary for print methods.
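The `data`, `k` and `distances` arguments all feed into one quantity: each point's distance to its kth nearest neighbor. As a rough illustration, the base-R helper below (`kth_nn_dist` is a hypothetical name, not a prabclus function) computes that quantity from either raw data or a `dist` object, excluding each point from its own neighborhood. It does not implement the toroidal edge correction, and that prabclus handles ties and self-distances the same way is an assumption.

```r
# Hypothetical helper (not a prabclus function): distance from each point to
# its k-th nearest neighbour, from raw data or from a precomputed 'dist' object.
kth_nn_dist <- function(data = NULL, k, distances = NULL) {
  if (is.null(distances)) {
    distances <- dist(as.matrix(data))  # Euclidean, the default of dist()
  }
  dm <- as.matrix(distances)
  diag(dm) <- Inf                       # a point is not its own neighbour
  # k-th smallest remaining distance in each row
  apply(dm, 1, function(row) sort(row, partial = k)[k])
}

# Four points on a line at positions 0, 1, 3 and 6
x <- matrix(c(0, 1, 3, 6), ncol = 1)
unname(kth_nn_dist(x, k = 1))                     # [1] 1 1 2 3
unname(kth_nn_dist(k = 2, distances = dist(x)))   # [1] 3 2 3 5
```

Passing a precomputed `dist` object gives the same result as passing the data, which mirrors the role of the `distances` argument.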

Details

The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances of the points to their kth nearest neighbors are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM algorithm. Each point is assigned to the noise or the cluster component according to its estimated a posteriori probabilities.
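Under these assumptions the distribution of the kth nearest neighbor distance has a closed form. The block below restates it, together with the EM updates for the two-component mixture, in notation chosen here: d is the dimension, alpha_d the volume of the unit ball in R^d, lambda_1 and lambda_2 the cluster and noise intensities, and p the cluster proportion. It is a derivation sketch under the homogeneous Poisson assumption that ignores edge effects (the motivation for `edge.correct`), not a transcription of the package source.

```latex
% Volume of the unit ball in R^d
\alpha_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}

% The number of other points within radius x is Poisson(\lambda \alpha_d x^d), so
P(D_k > x) = \sum_{j=0}^{k-1} \frac{e^{-\lambda \alpha_d x^d} (\lambda \alpha_d x^d)^j}{j!},
\qquad
f(x;\lambda) = \frac{d\, (\lambda \alpha_d)^k\, x^{dk-1}\, e^{-\lambda \alpha_d x^d}}{(k-1)!}
% equivalently, D_k^d \sim \mathrm{Gamma}(k,\ \mathrm{rate} = \lambda \alpha_d)

% Two-component mixture (component 1 = cluster, component 2 = noise)
f(x) = p\, f(x;\lambda_1) + (1-p)\, f(x;\lambda_2)

% EM updates for distances x_1, ..., x_n with posterior weights w_i
w_i = \frac{p\, f(x_i;\lambda_1)}{p\, f(x_i;\lambda_1) + (1-p)\, f(x_i;\lambda_2)}, \quad
p = \frac{1}{n}\sum_{i=1}^{n} w_i, \quad
\lambda_1 = \frac{k \sum_i w_i}{\alpha_d \sum_i w_i x_i^d}, \quad
\lambda_2 = \frac{k \sum_i (1-w_i)}{\alpha_d \sum_i (1-w_i) x_i^d}
```

The lambda updates follow from maximizing the weighted log-likelihood, whose lambda-dependent part is k log(lambda) - lambda alpha_d x_i^d per point.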

Value

`NNclean` returns a list of class `nnclean` with components:

`z`	0-1 vector with one entry per data point; 1 means cluster, 0 means noise.
`probs`	vector of estimated probabilities for each point to belong to the cluster component.
`k`	see above.
`lambda1`	intensity parameter of cluster component.
`lambda2`	intensity parameter of noise component.
`p`	estimated mixing proportion (prior probability) of the cluster component.
`kthNND`	vector of distances of each point to its kth nearest neighbor.
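To show how these components relate to one another, here is a minimal base-R sketch of the EM fit described under Details. The function name `nn_mixture_em`, its `maxit` argument, the starting values and the stopping rule are assumptions made for this illustration. It is not the prabclus source and omits edge correction and plotting. It takes the kth nearest neighbor distances and the dimension `d` and returns a list with the same component names as `NNclean`.

```r
# Minimal sketch (NOT the prabclus source): EM for a two-component mixture of
# k-th nearest neighbour distance distributions under the homogeneous Poisson
# model. Component 1 = cluster (intensity lambda1), component 2 = noise.
nn_mixture_em <- function(kthNND, k, d, convergence = 0.001, maxit = 500) {
  alpha_d <- pi^(d / 2) / gamma(d / 2 + 1)  # volume of the unit ball in R^d
  y <- kthNND^d                             # D_k^d ~ Gamma(k, rate = lambda * alpha_d)
  # Hypothetical starting values: points in the lowest third of the distance
  # range start as "cluster", the rest as "noise".
  small <- kthNND <= min(kthNND) + diff(range(kthNND)) / 3
  p <- 0.5
  lambda1 <- k / (alpha_d * mean(y[small]))
  lambda2 <- k / (alpha_d * mean(y[!small]))
  # Weighted component densities of y for given parameters (n x 2 matrix)
  comp <- function(p, l1, l2) {
    cbind(p * dgamma(y, shape = k, rate = l1 * alpha_d),
          (1 - p) * dgamma(y, shape = k, rate = l2 * alpha_d))
  }
  loglik_old <- -Inf
  for (it in seq_len(maxit)) {
    # E-step: posterior probability of the cluster component
    f <- comp(p, lambda1, lambda2)
    w <- f[, 1] / rowSums(f)
    # M-step: weighted maximum likelihood updates
    p <- mean(w)
    lambda1 <- k * sum(w) / (alpha_d * sum(w * y))
    lambda2 <- k * sum(1 - w) / (alpha_d * sum((1 - w) * y))
    # Log-likelihood of y under the updated parameters; stop on small relative change
    loglik <- sum(log(rowSums(comp(p, lambda1, lambda2))))
    if (abs(loglik - loglik_old) <= convergence * (1 + abs(loglik))) break
    loglik_old <- loglik
  }
  # Final E-step so that the posteriors match the returned parameters
  f <- comp(p, lambda1, lambda2)
  w <- f[, 1] / rowSums(f)
  list(z = as.integer(w > 0.5), probs = w, k = k,
       lambda1 = lambda1, lambda2 = lambda2, p = p, kthNND = kthNND)
}

# Usage on simulated data: uniform noise on [0, 10]^2 plus a dense square cluster
set.seed(1)
x <- rbind(matrix(runif(200, 0, 10), ncol = 2),   # 100 noise points
           matrix(runif(200, 4, 5), ncol = 2))    # 100 cluster points
k <- 10
dm <- as.matrix(dist(x))
diag(dm) <- Inf                                   # exclude each point itself
kthNND <- apply(dm, 1, function(r) sort(r, partial = k)[k])
res <- nn_mixture_em(kthNND, k = k, d = ncol(x))
table(truth = rep(c("noise", "cluster"), each = 100), z = res$z)
```

Because the Jacobian factor d x^(d-1) is shared by both components, the sketch works with D_k^d and `dgamma`. This changes the log-likelihood only by an additive constant, so neither the posterior weights nor the convergence check are affected.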

Note

The software can be freely used for non-commercial purposes, and can be freely distributed for non-commercial purposes only.

Author(s)

R-port by Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en,
original Splus package by S. Byers and A. E. Raftery.

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Examples

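library(prabclus)
library(mclust)  # provides the chevron data set
data(chevron)
nnc <- NNclean(chevron[, 2:3], k = 15, plot = TRUE)
# colour 1 = noise (z = 0), colour 2 = cluster (z = 1)
plot(chevron[, 2:3], col = 1 + nnc$z)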

[Package prabclus version 2.3-3 Index]