| InitClust {otrimle} | R Documentation | 
Robust Initialization for Model-based Clustering Methods
Description
Computes the initial cluster assignment by combining nearest-neighbor-based noise detection with agglomerative hierarchical clustering based on maximum likelihood criteria for Gaussian mixture models.
Usage
 InitClust(data, G, k = 3, knnd.trim = 0.5, modelName = 'VVV')
 Arguments
| data | A numeric vector, matrix, or data frame of observations. Rows correspond
to observations and columns correspond to variables. Categorical
variables and  | 
| G | An integer specifying the number of clusters. | 
| k | An integer specifying the number of considered nearest neighbors per point used for the denoising step (see Details). | 
| knnd.trim | A number in [0,1) which defines the proportion of points
initialized as noise. Typically  | 
| modelName | A character string indicating the covariance model to be used. Possible models are:  | 
Details
The initialization is based on Coretto and Hennig (2017). First, two
steps are performed:
Step 1 (denoising step): for each data point, compute its
kth-nearest-neighbor
distance (k-NND). All points with k-NND larger
than the (1-knnd.trim)-quantile of the k-NND
are initialized as noise. The interpretation of
k is that (k-1), but not k, points close
together may still be interpreted as noise or outliers.
Step 2 (clustering step): perform the model-based hierarchical
clustering (MBHC) proposed in Fraley (1998). This step is performed using
hc. The input argument modelName is passed
to hc. See the Details section of
hc for more information.
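The denoising rule of Step 1 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's internal code: the helper name knnd_noise is hypothetical, distances are assumed Euclidean, and a point is assumed not to count as its own neighbor. The package may compute the k-NND differently.

```r
# Hypothetical helper (not part of otrimle): flag noise points by their
# k-th nearest neighbor distance (k-NND).
knnd_noise <- function(x, k = 3, knnd.trim = 0.5) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))     # pairwise Euclidean distances (assumption)
  diag(d) <- Inf              # a point is not its own neighbor (assumption)
  # k-th smallest entry of each row = distance to the k-th nearest neighbor
  knnd <- apply(d, 1, function(r) sort(r, partial = k)[k])
  cutoff <- quantile(knnd, probs = 1 - knnd.trim)
  knnd > cutoff               # TRUE = initialized as noise
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),  # 50 tightly clustered points
           matrix(runif(20, -5, 5), ncol = 2))      # 10 scattered points
is_noise <- knnd_noise(x, k = 3, knnd.trim = 0.2)
table(is_noise)
```

The full distance matrix keeps the sketch short but needs memory quadratic in the number of rows, so it is only suitable for small data sets.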
If the previous Step 2 fails to provide G clusters each
containing at least 2 distinct data points, it is replaced with
classical hierarchical clustering implemented in
hclust. Finally, if
hclust fails to provide a valid partition, up
to ten random partitions are tried.
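The validity condition and the fallback chain described above can be sketched in base R. This is a rough illustration under stated assumptions: the helper valid_partition is hypothetical, the Ward linkage passed to hclust is an arbitrary choice (the linkage used internally is not stated here), and the way random partitions are drawn is likewise assumed.

```r
# Hypothetical helper (not part of otrimle): TRUE if 'cl' defines exactly G
# clusters, each containing at least 2 distinct data points.
valid_partition <- function(x, cl, G) {
  x <- as.matrix(x)
  labs <- unique(cl)
  if (length(labs) != G) return(FALSE)
  all(vapply(labs, function(g) {
    nrow(unique(x[cl == g, , drop = FALSE])) >= 2
  }, logical(1)))
}

set.seed(2)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 6), ncol = 2))
G <- 2

# Classical hierarchical clustering; the linkage method is an assumption.
cl <- cutree(hclust(dist(x), method = "ward.D2"), k = G)

# If that partition is not valid, try up to ten random partitions.
if (!valid_partition(x, cl, G)) {
  for (i in 1:10) {
    cl <- sample(rep_len(1:G, nrow(x)))
    if (valid_partition(x, cl, G)) break
  }
}
table(cl)
```

Counting distinct rows rather than raw cluster sizes matters when the data contain duplicated observations, since a cluster made of identical points cannot support a covariance estimate.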
Value
An integer vector specifying the initial cluster
assignment, with 0 denoting noise/outliers.
References
Fraley, C. (1998). Algorithms for model-based Gaussian hierarchical clustering. SIAM Journal on Scientific Computing 20:270-281.
Coretto, P. and Hennig, C. (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research 18(142):1-39. https://jmlr.org/papers/v18/16-382.html
Author(s)
Pietro Coretto pcoretto@unisa.it https://pietro-coretto.github.io
See Also
Examples
 ## Load Swiss banknotes data
 data(banknote)
 x <- banknote[,-1]
 ## Initial clusters with default arguments
 init <- InitClust(data = x, G = 2)
 print(init)
 ## Perform otrimle
 a <- otrimle(data = x, G = 2, initial = init,
              logicd = c(-Inf, -50, -10), ncores = 1)
 plot(a, what="clustering", data=x)