WH_adaptive.kmeans {HistDAWass} | R Documentation |
K-means of a dataset of histogram-valued data using adaptive Wasserstein distances
Description
The function implements the k-means algorithm with adaptive Wasserstein distances for a set of histogram-valued data.
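To give an intuition for the distance involved, the following base R sketch approximates the squared L2 Wasserstein distance between two samples through their empirical quantile functions, and then combines per-variable distances with one weight per variable. It is only an illustration and not the package's implementation: the helpers `wass2` and `adaptive_wass2`, the weights `lambda`, and the simulated data are all hypothetical.

```r
# Approximate squared L2 Wasserstein distance between two samples,
# using their empirical quantile functions on a grid of probabilities.
wass2 <- function(a, b, n_grid = 1000) {
  p <- (seq_len(n_grid) - 0.5) / n_grid
  qa <- quantile(a, probs = p, names = FALSE)
  qb <- quantile(b, probs = p, names = FALSE)
  mean((qa - qb)^2)
}

# Adaptive distance between two objects described by several variables:
# a weighted sum of the per-variable distances (assumed form, one weight
# per variable).
adaptive_wass2 <- function(obj1, obj2, lambda) {
  d <- mapply(wass2, obj1, obj2)
  sum(lambda * d)
}

set.seed(1)
x1 <- rnorm(500)
x2 <- runif(500)
obj_a <- list(v1 = x1, v2 = x2)
obj_b <- list(v1 = x1 + 2, v2 = x2 + 1)  # pure location shifts

d_plain <- adaptive_wass2(obj_a, obj_b, lambda = c(1, 1))
d_adapt <- adaptive_wass2(obj_a, obj_b, lambda = c(0.5, 2))
```

Because the second object is a pure shift of the first, each per-variable distance equals the squared shift, so the unweighted total is 2^2 + 1^2 = 5, while the weights (0.5, 2) change the relative importance of the two variables.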
Usage
WH_adaptive.kmeans(
x,
k,
schema = 1,
init,
rep,
simplify = FALSE,
qua = 10,
standardize = FALSE,
weight.sys = "PROD",
theta = 2,
init.weights = "EQUAL",
verbose = FALSE
)
Arguments
x |
A MatH object (a matrix of distributionH). |
k |
An integer, the number of groups. |
schema |
a number from 1 to 4 |
init |
(Optional, do not use.) Initialization for partitioning the data. Default is 'RPART'; other strategies should be implemented. |
rep |
An integer, maximum number of repetitions of the algorithm (default |
simplify |
A logical value (default is FALSE). If TRUE, histograms are recomputed in order to speed up the algorithm. |
qua |
An integer, if |
standardize |
A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use it when each variable should have a standard deviation equal to one. |
weight.sys |
A string. Weights may add to one ('SUM') or have a product equal to one ('PROD', default). |
theta |
a number. A parameter if |
init.weights |
A string specifying how to initialize the weights: 'EQUAL' (default), all weights are the same; 'RANDOM', weights are initialized at random. |
verbose |
A logical value (default is FALSE). If TRUE, details on computations are shown. |
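The two constraints named by weight.sys can be illustrated with a small base R sketch. It only shows what "weights add to one" and "product equal to one" mean for a vector of positive weights; it is not how the package updates its weights, and the helper `normalize_weights` and the raw values are hypothetical.

```r
# Rescale positive raw weights so that they satisfy one of the two constraints.
normalize_weights <- function(w, weight.sys = c("PROD", "SUM")) {
  weight.sys <- match.arg(weight.sys)
  stopifnot(all(w > 0))
  if (weight.sys == "SUM") {
    w / sum(w)                    # weights add up to one
  } else {
    w / prod(w)^(1 / length(w))   # divide by the geometric mean: product is one
  }
}

raw <- c(0.5, 2, 4)
w_sum  <- normalize_weights(raw, "SUM")
w_prod <- normalize_weights(raw, "PROD")

# Equal weights that satisfy each constraint for p = 3 variables
# (what an 'EQUAL' start would have to look like under these constraints).
p <- 3
eq_sum  <- rep(1 / p, p)
eq_prod <- rep(1, p)
```

Under the product constraint, equal weights are all 1, so an unweighted distance is recovered as a special case; under the sum constraint, equal weights are all 1/p.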
Value
A list with the results of the k-means of the set of histogram-valued data x into k clusters.
Slots
solution
A list. Returns the best solution among the rep repetitions, i.e. the one having the minimum sum of squares criterion.
solution$IDX
A vector. The clusters to which the objects are assigned.
solution$cardinality
A vector. The cardinality of each final cluster.
solution$centers
A MatH object with the description of centers.
solution$Crit
A number. The criterion (sum of squared deviations from the centers) value at the end of the run.
quality
A number. The percentage of the sum of squared deviations explained by the model (the higher, the better).
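The idea of keeping the best of several repetitions and reporting an explained share of the sum of squares can be sketched with ordinary k-means on numeric data. This swaps in `stats::kmeans` with Euclidean distances instead of histogram data with Wasserstein distances, and it assumes that the quality index has the form 1 - within / total; the simulated data and variable names are hypothetical.

```r
set.seed(42)
# Two well-separated groups of 50 points in two dimensions
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

rep_n <- 10
# Run the algorithm rep_n times from different random starts
runs <- lapply(seq_len(rep_n), function(i) kmeans(X, centers = 2, nstart = 1))

# Keep the run with the minimum within-cluster sum of squares
crit <- vapply(runs, function(r) r$tot.withinss, numeric(1))
best <- runs[[which.min(crit)]]

# Share of the total sum of squares explained by the partition
quality <- 1 - best$tot.withinss / best$totss
```

Here `best$cluster`, `best$size` and `best$centers` play roles analogous to the IDX, cardinality and centers entries described above.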
References
Irpino A., Verde R., De Carvalho F.A.T. (2014). Dynamic clustering of histogram data based on adaptive squared Wasserstein distances. Expert Systems with Applications, vol. 41, pp. 3351-3366, ISSN: 0957-4174, doi: http://dx.doi.org/10.1016/j.eswa.2013.12.001
Examples
library(HistDAWass)
# Cluster the BLOOD dataset into 2 groups, keeping the best of 10 repetitions
results <- WH_adaptive.kmeans(x = BLOOD, k = 2, rep = 10,
  simplify = TRUE, qua = 10, standardize = TRUE)
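A possible way to inspect the returned list, assuming its elements are named as in the Value section (solution with IDX, cardinality, centers and Crit, plus quality). The sketch repeats the call so that it runs on its own, and it is guarded so that it is skipped when the HistDAWass package is not installed.

```r
if (requireNamespace("HistDAWass", quietly = TRUE)) {
  library(HistDAWass)
  results <- WH_adaptive.kmeans(x = BLOOD, k = 2, rep = 10,
    simplify = TRUE, qua = 10, standardize = TRUE)

  # Names actually present in the result (check these first)
  print(names(results))
  print(names(results$solution))

  # Cluster membership and cluster sizes
  print(table(results$solution$IDX))
  print(results$solution$cardinality)

  # Final criterion value and share of explained sum of squares
  print(results$solution$Crit)
  print(results$quality)

  # Class of the object describing the cluster centers
  print(class(results$solution$centers))
}
```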