InitClust {otrimle} | R Documentation |
Robust Initialization for Model-based Clustering Methods
Description
Computes the initial cluster assignment by combining nearest-neighbor-based noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.
Usage
InitClust(data, G, k = 3, knnd.trim = 0.5, modelName = 'VVV')
Arguments
data |
A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
variables and |
G |
An integer specifying the number of clusters. |
k |
An integer specifying the number of considered nearest neighbors per point used for the denoising step (see Details). |
knnd.trim |
A number in [0,1) which defines the proportion of points
initialized as noise. Typically |
modelName |
A character string indicating the covariance model to be used. Possible models are: |
Details
The initialization is based on Coretto and Hennig (2017). First, two
steps are performed:
Step 1 (denoising step): for each data point compute its k-th nearest neighbor distance (k-NND). All points with k-NND larger than the (1-knnd.trim)-quantile of the k-NND are initialized as noise. The interpretation of k is that (k-1), but not k, points close together may still be interpreted as noise or outliers.
Step 2 (clustering step): perform the model-based hierarchical clustering (MBHC) proposed in Fraley (1998). This step is performed using hc. The input argument modelName is passed to hc. See Details of hc for more details.
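The two steps above can be sketched as follows. This is only an illustration, not the package's internal code. It assumes that hc is the function of that name in the mclust package, that Euclidean distances are used for the k-NND, and that only the points not flagged as noise are passed to the clustering step. The simulated data and the names knnd, is.noise and tree are made up for the example.

```r
## Illustrative sketch of Steps 1 and 2 (not the internal code of InitClust).
set.seed(1)
## Simulated data (assumption): two Gaussian groups plus 10 scattered points.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2),
           matrix(runif(20, min = -10, max = 16), ncol = 2))
G <- 2
k <- 3
knnd.trim <- 0.5

## Step 1: k-th nearest neighbor distance (k-NND) of every point.
## Euclidean distance is an assumption of this sketch.
d <- as.matrix(dist(x))
## Each sorted row starts with the zero self-distance,
## so entry k + 1 is the distance to the k-th nearest neighbor.
knnd <- apply(d, 1, function(row) sort(row)[k + 1])
## Points whose k-NND exceeds the (1 - knnd.trim)-quantile start as noise.
is.noise <- knnd > quantile(knnd, probs = 1 - knnd.trim)

## Step 2: model-based hierarchical clustering of the remaining points.
## 0 denotes noise; clusters are labeled 1, ..., G.
init <- integer(nrow(x))
if (requireNamespace("mclust", quietly = TRUE)) {
  tree <- mclust::hc(x[!is.noise, , drop = FALSE], modelName = "VVV")
  init[!is.noise] <- as.integer(mclust::hclass(tree, G)[, 1])
}
table(init)
```

With knnd.trim = 0.5, about half of the points start as noise. If mclust is not installed, the sketch leaves every label at 0.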
If Step 2 fails to provide G clusters each containing at least 2 distinct data points, it is replaced with classical hierarchical clustering implemented in hclust. Finally, if hclust fails to provide a valid partition, up to ten random partitions are tried.
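The fallback logic can be sketched in base R as below. The helper is.valid.partition is hypothetical and written only for this illustration. The linkage method and the way random partitions are drawn are also assumptions, since this page does not specify them.

```r
## Hypothetical helper: TRUE if `cl` defines exactly G clusters,
## each containing at least 2 distinct rows of `x`.
is.valid.partition <- function(x, cl, G) {
  labels <- unique(cl)
  if (length(labels) != G) return(FALSE)
  all(vapply(labels, function(g) {
    nrow(unique(x[cl == g, , drop = FALSE])) >= 2
  }, logical(1)))
}

set.seed(2)
## Simulated data (assumption): two well-separated Gaussian groups.
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
G <- 2

## Classical hierarchical clustering cut into G groups.
## Complete linkage is an assumption of this sketch.
cl <- cutree(hclust(dist(x), method = "complete"), k = G)

## Last resort: try up to ten random partitions until one is valid.
tries <- 0
while (!is.valid.partition(x, cl, G) && tries < 10) {
  cl <- sample(rep_len(seq_len(G), nrow(x)))
  tries <- tries + 1
}
is.valid.partition(x, cl, G)
```

The validity check mirrors the condition stated above: G clusters, each with at least 2 distinct data points.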
Value
An integer vector specifying the initial cluster assignment with 0 denoting noise/outliers.
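As a small illustration of how such a vector can be read, the sketch below uses a made-up assignment (not output of InitClust) and separates the noise indices from the cluster members. The names noise.idx and clusters are hypothetical.

```r
## Made-up assignment vector in the format described above:
## 0 = noise/outliers, 1 and 2 = cluster labels.
init <- c(1L, 1L, 0L, 2L, 2L, 0L, 1L, 2L)

## Number of points initialized as noise and in each cluster.
table(init)

## Indices of the points initialized as noise.
noise.idx <- which(init == 0L)

## Indices of the points in each cluster, as a list keyed by label.
clusters <- split(seq_along(init)[init > 0L], init[init > 0L])
```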
References
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.
Coretto, P. and Hennig, C. (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research 18(142):1-39. https://jmlr.org/papers/v18/16-382.html
Author(s)
Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io
See Also
Examples
## Load the package and the Swiss banknotes data
library(otrimle)
data(banknote)
x <- banknote[,-1]
## Initial clusters with default arguments
init <- InitClust(data = x, G = 2)
print(init)
## Perform otrimle
a <- otrimle(data = x, G = 2, initial = init,
             logicd = c(-Inf, -50, -10), ncores = 1)
plot(a, what = "clustering", data = x)