ctlcurves {tclust} | R Documentation |
Classification Trimmed Likelihood Curves
Description
The function applies tclust several times on a given dataset while the
parameters alpha and k are altered. The resulting object gives an idea of the
optimal trimming level and number of clusters for that particular dataset.
Usage
ctlcurves(
x,
k = 1:4,
alpha = seq(0, 0.2, len = 6),
restr.fact = 50,
parallel = FALSE,
trace = 1,
...
)
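As a small illustration of the defaults in the signature above, the sketch below expands the default k and alpha grids using base R only. The variable names are hypothetical, and the count at the end assumes one tclust fit per (k, alpha) combination, which the description suggests but does not state explicitly.

```r
# Default grids copied from the Usage signature.
k_default <- 1:4
alpha_default <- seq(0, 0.2, len = 6)  # `len` partially matches `length.out`

print(alpha_default)
# 0.00 0.04 0.08 0.12 0.16 0.20

# All (k, alpha) pairs, assuming one fit per combination.
grid <- expand.grid(k = k_default, alpha = alpha_default)
n_fits <- nrow(grid)
print(n_fits)
```

A finer alpha grid or a wider range of k increases the number of combinations multiplicatively, which is worth keeping in mind before a long run.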
Arguments
x |
A matrix or data frame of dimension n x p, containing the observations (row-wise). |
k |
A vector of cluster numbers to be checked. By default cluster numbers from 1 to 4 are examined. |
alpha |
A vector containing the alpha levels to be checked. By default six equally spaced levels from 0 to 0.2 are checked (seq(0, 0.2, len = 6)). |
restr.fact |
The restriction factor passed to tclust. |
parallel |
A logical value, to be passed further to |
trace |
Defines the tracing level, which is set to 1 by default. |
... |
Further arguments (as e.g. |
Details
These curves show the values of the trimmed classification (log-)likelihoods
when altering the trimming proportion alpha and the number of clusters k.
Careful examination of these curves provides valuable information for choosing
these parameters in a clustering problem. For instance, an appropriate k is one
for which the trimmed classification likelihood curve for k+1 shows no clear
increase over the curve for k across almost the whole range of alpha values.
Moreover, an appropriate alpha may be derived by determining where the initial
fast increase of the trimmed classification likelihood curve for the finally
chosen k stops. A more detailed explanation can be found in
García-Escudero et al. (2011).
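The rule for k described above is meant to be applied by inspecting the plotted curves. As a rough numerical analogue only, the sketch below applies it to a made-up matrix of objective values. The numbers are invented for illustration and are not output of ctlcurves; the layout (one row per k, one column per alpha) and the thresholds `tol` and `max_frac` are assumptions, not part of the package.

```r
# Hypothetical objective values: rows are k = 1..4, columns are alpha levels.
# Invented numbers for illustration only, not produced by ctlcurves().
ks <- 1:4
alpha <- seq(0, 0.2, length.out = 6)
obj <- rbind(
  c(-1500, -1380, -1300, -1240, -1190, -1150),
  c(-1350, -1200, -1120, -1075, -1035, -1000),
  c(-1340, -1195, -1116, -1072, -1033,  -998),
  c(-1335, -1192, -1114, -1070, -1031,  -997)
)

# Row i of `gain` holds the increase from the k = i curve to the k = i + 1 curve.
gain <- diff(obj)

# Assumed thresholds: an increase above `tol` counts as "clear", and a step is
# accepted as flat if fewer than `max_frac` of the alpha levels show one.
tol <- 20
max_frac <- 0.2
frac_clear <- rowMeans(gain > tol)

# Smallest k whose step to k + 1 shows no clear increase for almost all alpha.
k_choice <- ks[which(frac_clear < max_frac)[1]]
print(k_choice)
```

With these toy values the step from k = 1 to k = 2 is large at every alpha, while later steps are small, so the sketch settles on k = 2. In practice the thresholds depend on the scale of the likelihood, so the plot remains the primary tool.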
Value
The function returns an S3 object of type ctlcurves containing the following components:
- par: A list containing all the parameters passed to this function
- obj: An array containing the objective function values of each computed cluster solution
- min.weights: An array containing the minimum cluster weight of each computed cluster solution
References
García-Escudero, L.A.; Gordaliza, A.; Matrán, C. and Mayo-Iscar, A. (2011), "Exploring the number of groups in robust model-based clustering." Statistics and Computing, 21, pp. 585-599, <doi:10.1007/s11222-010-9194-z>
Examples
## Not run:
#--- EXAMPLE 1 ------------------------------------------
sig <- diag (2)
cen <- rep (1, 2)
x <- rbind(MASS::mvrnorm(108, cen * 0, sig),
           MASS::mvrnorm(162, cen * 5, sig * 6 - 2),
           MASS::mvrnorm(30, cen * 2.5, sig * 50))
ctl <- ctlcurves(x, k = 1:4)
ctl
## ctl-curves
plot(ctl) ## --> selecting k = 2, alpha = 0.08
## the selected model
plot(tclust(x, k = 2, alpha = 0.08, restr.fact = 7))
#--- EXAMPLE 2 ------------------------------------------
data(geyser2)
ctl <- ctlcurves(geyser2, k = 1:5)
ctl
## ctl-curves
plot(ctl) ## --> selecting k = 3, alpha = 0.08
## the selected model
plot(tclust(geyser2, k = 3, alpha = 0.08, restr.fact = 5))
#--- EXAMPLE 3 ------------------------------------------
data(swissbank)
ctl <- ctlcurves(swissbank, k = 1:5, alpha = seq (0, 0.3, by = 0.025))
ctl
## ctl-curves
plot(ctl) ## --> selecting k = 2, alpha = 0.1
## the selected model
plot(tclust(swissbank, k = 2, alpha = 0.1, restr.fact = 50))
## End(Not run)