otrimleg {otrimle}    R Documentation

OTRIMLE for a range of numbers of clusters with density-based cluster quality statistic

Description

Computes Optimally Tuned Robust Improper Maximum Likelihood Clustering (OTRIMLE), see otrimle, together with the density-based cluster quality statistic Q (Hennig and Coretto 2021), for a range of numbers of clusters.

Usage

otrimleg(dataset, G=1:6, multicore=TRUE, ncores=detectCores(logical=FALSE)-1,
   erc=20, beta0=0, fixlogicd=NULL, monitor=1, dmaxq=qnorm(0.9995))

Arguments

dataset

The dataset: something that can be coerced into a matrix with observations as rows and variables as columns.

G

vector of integers (normally starting from 1). Numbers of clusters to be considered.

multicore

logical. If TRUE, parallel computing is used through the function mclapply from package parallel. Read the warnings in its documentation if you intend to use this; it won't work on Windows.

ncores

integer. Number of cores for parallelisation.

erc

A number greater than or equal to one specifying the maximum allowed ratio between within-cluster covariance matrix eigenvalues. See otrimle.

beta0

A non-negative constant, penalty term for noise, to be passed as beta to otrimle, see documentation there.

fixlogicd

numeric or NULL. Value for the logarithm of the improper constant density, logicd, see rimle. If this is not NULL, rimle is run instead of otrimle. NULL means that otrimle determines logicd from the data.

monitor

0 or 1. If 1, progress messages are printed on screen.

dmaxq

numeric. Passed as maxq to kerndensmeasure. The interval considered for the one-dimensional density estimator is (-maxq,maxq).
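
The defaults for ncores and dmaxq in the Usage section can be evaluated in base R before calling otrimleg. The following is a minimal sketch, not part of the package: the names ncores_safe and use_multicore are hypothetical helpers introduced here, and the guard against NA and zero cores is an assumption about what a cautious caller may want.

```r
library(parallel)  # ships with base R

# Default ncores: physical cores minus one. detectCores() can return NA,
# and on a single-core machine the default evaluates to 0, so guard both.
physical <- detectCores(logical = FALSE)
ncores_safe <- if (is.na(physical)) 1L else max(1L, physical - 1L)

# mclapply relies on forking, which is not available on Windows.
use_multicore <- .Platform$OS.type != "windows" && ncores_safe > 1L

# Default dmaxq: the 0.9995 quantile of the standard normal distribution.
dmaxq <- qnorm(0.9995)
interval <- c(-dmaxq, dmaxq)

# Standard normal probability mass inside the interval (0.999).
coverage <- pnorm(dmaxq) - pnorm(-dmaxq)
```

These values could then be passed on explicitly, for instance as multicore=use_multicore and ncores=ncores_safe.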

Details

For estimating the number of clusters this is meant to be called by otrimlesimg. The output of otrimleg is not meant to be used directly for estimating the number of clusters, see Hennig and Coretto (2021).

Value

otrimleg returns a list containing the components solution, iloglik, ibic, criterion, logicd, noiseprob, denscrit, ddpm. All of these are lists or vectors in which the component index is the number of clusters.

solution

list of output objects of otrimle or rimle.

iloglik

vector of improper likelihood values from otrimle.

ibic

vector of improper BIC-values (small is good) computed from iloglik and the numbers of parameters. Note that the behaviour of the improper likelihood is not compatible with the standard use of the BIC, so this is experimental and should not be trusted for choosing the number of clusters.

criterion

vector of values of OTRIMLE criterion, see otrimle.

logicd

vector of values of the logarithm of the improper constant density, either determined by otrimle or fixed through fixlogicd.

noiseprob

vector of estimated noise proportions, exproportion[1] from otrimle.

denscrit

vector of density-based cluster quality statistics Q (Hennig and Coretto 2021) as provided by the measure-component of kerndensmeasure.

ddpm

list of vectors of cluster-wise density-based cluster quality measures as provided by the ddpm-component of kerndensmeasure.
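
The components described above can be inspected as in the following sketch. It assumes that package otrimle is installed and reuses the banknote subset from the Examples section; the object name fit is introduced here for illustration only, and indexing solution by the number of clusters follows the description of the return value.

```r
# Runs only if package otrimle is available (an assumption of this sketch).
if (requireNamespace("otrimle", quietly = TRUE)) {
  library(otrimle)
  data(banknote)
  x <- banknote[c(1:30, 101:110, 117:136, 160:161), 5:7]

  # Fit for 1 and 2 clusters, without parallel computing or progress messages.
  fit <- otrimleg(x, G = 1:2, multicore = FALSE, monitor = 0)

  print(names(fit))      # component names of the returned list
  print(fit$denscrit)    # density-based quality statistic Q per number of clusters
  print(fit$noiseprob)   # estimated noise proportions per number of clusters

  # Output object of otrimle for the two-cluster solution.
  sol2 <- fit$solution[[2]]
}
```

As stated in Details, these values are not meant to be used directly for estimating the number of clusters; otrimlesimg is intended for that purpose.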

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Coretto, P. and C. Hennig (2016). Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. Journal of the American Statistical Association, Vol. 111(516), pp. 1648-1659. doi: 10.1080/01621459.2015.1100996

Coretto, P. and C. Hennig (2017). Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. Journal of Machine Learning Research, Vol. 18(142), pp. 1-39. https://jmlr.org/papers/v18/16-382.html

Hennig, C. and P. Coretto (2021). An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture based clustering. To appear in Australian and New Zealand Journal of Statistics, https://arxiv.org/abs/2009.00921.

See Also

otrimle, rimle, otrimlesimg, kerndensmeasure

Examples

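   library(otrimle)
   data(banknote)
   # Use a subset of the observations and columns 5 to 7 only
   selectdata <- c(1:30,101:110,117:136,160:161)
   x <- banknote[selectdata,5:7]
   # Fit OTRIMLE for 1 and 2 clusters without parallel computing
   obanknote <- otrimleg(x,G=1:2,multicore=FALSE)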

[Package otrimle version 2.0 Index]