select_nb_cluster {MagmaClustR} | R Documentation |
Select the optimal number of clusters
Description
In MagmaClust, as for any clustering method, the number K of clusters has to be provided as a hypothesis of the model. This function implements a model selection procedure that maximises a variational BIC (V-BIC) criterion, computed for different values of K. A heuristic providing a fast approximation of the procedure is available as well, although the corresponding models are then not properly trained.
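The selection rule described above amounts to evaluating a criterion on a grid of candidate values of K and keeping the maximiser. The following is a minimal sketch of that rule only, in base R. The names `toy_vbic` and `select_by_max` are hypothetical, and the toy criterion is an arbitrary stand-in, not the V-BIC computed by MagmaClustR.

```r
# Hypothetical stand-in for a model selection criterion. This is NOT the
# V-BIC used by MagmaClustR; it simply peaks at k = 3 so that the selection
# logic can be checked by hand.
toy_vbic <- function(k) -(k - 3)^2

# Evaluate the criterion for every candidate number of clusters in `grid`
# and return the candidate with the largest value, along with all scores.
select_by_max <- function(grid, criterion) {
  scores <- vapply(grid, criterion, numeric(1))
  names(scores) <- as.character(grid)
  list(best_k = grid[which.max(scores)], seq_vbic = scores)
}

res <- select_by_max(1:10, toy_vbic)
res$best_k
```

Note that `which.max()` returns the first maximiser, so ties are resolved in favour of the smallest K in the grid under this sketch.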
Usage
select_nb_cluster(
data,
fast_approx = TRUE,
grid_nb_cluster = 1:10,
ini_hp_k = NULL,
ini_hp_i = NULL,
kern_k = "SE",
kern_i = "SE",
plot = TRUE,
...
)
Arguments
data |
A tibble or data frame. Columns required: |
fast_approx |
A boolean, indicating whether a fast approximation should
be used for selecting the number of clusters. If TRUE, each Magma or
MagmaClust model will perform only one E-step of the training, using
the same fixed values for the hyper-parameters ( |
grid_nb_cluster |
A vector of integers, corresponding to the grid of values that will be tested for the number of clusters. |
ini_hp_k |
A tibble or data frame of hyper-parameters associated with
|
ini_hp_i |
A tibble or data frame of hyper-parameters associated with
|
kern_k |
A kernel function associated with the mean processes. |
kern_i |
A kernel function associated with the individuals/tasks. |
plot |
A boolean indicating whether the plot of V-BIC values for all numbers of clusters should be displayed. |
... |
Any additional argument that could be passed to
|
Value
A list, containing the results of the model selection procedure, which selects the optimal number of clusters by maximising a V-BIC criterion. The elements of the list are:
best_k: An integer, indicating the resulting optimal number of clusters.
seq_vbic: A vector, corresponding to the sequence of the V-BIC values associated with the models trained for each number of clusters provided in grid_nb_cluster.
trained_models: A list, named by associated number of clusters, of Magma or MagmaClust models that have been trained (or approximated if fast_approx = TRUE) during the model selection procedure.
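As an illustration of how the returned list might be inspected, here is a hedged sketch. It assumes that MagmaClustR is installed and that its `simu_db()` simulator accepts the `M`, `N` and `K` arguments as used below. The argument values and the reduced grid are arbitrary choices made to keep the run short, not recommended settings.

```r
# The block runs only if MagmaClustR is available; otherwise it is skipped.
if (requireNamespace("MagmaClustR", quietly = TRUE)) {
  set.seed(42)

  # Simulated dataset (assumption: simu_db() takes M, N and K as shown).
  db <- MagmaClustR::simu_db(M = 10, N = 10, K = 2)

  # Fast approximation over a small grid, with plotting disabled.
  sel <- MagmaClustR::select_nb_cluster(
    data = db,
    fast_approx = TRUE,
    grid_nb_cluster = 1:3,
    plot = FALSE
  )

  sel$best_k                 # selected number of clusters
  sel$seq_vbic               # V-BIC value for each tested number of clusters
  names(sel$trained_models)  # one model per tested number of clusters
}
```

With fast_approx = TRUE the stored models are approximations only, so it would be reasonable to retrain a model with the selected number of clusters before using it for prediction.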
Examples
TRUE