cl_pclust {clue}    R Documentation

Prototype-Based Partitions of Clusterings

Description

Compute prototype-based partitions of a cluster ensemble by minimizing $\sum w_b u_{bj}^m d(x_b, p_j)^e$, the sum of the case-weighted and membership-weighted $e$-th powers of the dissimilarities between the elements $x_b$ of the ensemble and the prototypes $p_j$, for suitable dissimilarities $d$ and exponents $e$.

Usage

cl_pclust(x, k, method = NULL, m = 1, weights = 1,
          control = list())

Arguments

x

an ensemble of partitions or hierarchies, or something coercible to that (see cl_ensemble).

k

an integer giving the number of classes to be used in the partition.

method

the consensus method to be employed, see cl_consensus.

m

a number not less than 1 controlling the softness of the partition (as the “fuzzification parameter” of the fuzzy $c$-means algorithm). The default value of 1 corresponds to hard partitions obtained from a generalized $k$-means problem; values greater than one give partitions of increasing softness obtained from a generalized fuzzy $c$-means problem.

weights

a numeric vector of non-negative case weights. Recycled to the number of elements in the ensemble given by x if necessary.

control

a list of control parameters. See Details.

Details

Partitioning is performed using pclust via a family constructed from method. The dissimilarities $d$ and exponent $e$ are implied by the consensus method employed, and inferred via a registration mechanism currently only made available to built-in consensus methods. The default methods compute Least Squares Euclidean consensus clusterings, i.e., use Euclidean dissimilarity $d$ and $e = 2$.

For $m = 1$, the partitioning procedure was introduced by Gaul and Schader (1988) for “Clusterwise Aggregation of Relations” (with the same domains), containing equivalence relations, i.e., hard partitions, as a special case.

Available control parameters are as for pclust.

The fixed point approach employed is a heuristic which cannot be guaranteed to find the global minimum (this is already true for the computation of consensus clusterings). Standard practice is to use the best solution found in “sufficiently many” replications of the base algorithm.

Value

An object of class "cl_partition" representing the obtained “secondary” partition by an object of class "cl_pclust", which is a list containing at least the following components.

prototypes

a cluster ensemble with the $k$ prototypes.

membership

an object of class "cl_membership" with the membership values $u_{bj}$.

cluster

the class ids of the nearest hard partition.

silhouette

silhouette information for the partition, see silhouette.

validity

precomputed validity measures for the partition.

m

the softness control argument.

call

the matched call.

d

the dissimilarity function $d = d(x, p)$ employed.

e

the exponent $e$ employed.

References

J. C. Bezdek (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum.

W. Gaul and M. Schader (1988). Clusterwise aggregation of relations. Applied Stochastic Models and Data Analysis, 4:273–282. doi:10.1002/asm.3150040406.

Examples

## Use a precomputed ensemble of 50 k-means partitions of the
## Cassini data.
data("CKME")
CKME <- CKME[1 : 30]		# for saving precious time ...
diss <- cl_dissimilarity(CKME)
hc <- hclust(diss)
plot(hc)
## This suggests using a partition with three classes, which can be
## obtained using cutree(hc, 3).  Could use cl_consensus() to compute
## prototypes as the least squares consensus clusterings of the classes,
## or alternatively:
set.seed(123)
x1 <- cl_pclust(CKME, 3, m = 1)
x2 <- cl_pclust(CKME, 3, m = 2)
## Agreement of solutions.
cl_dissimilarity(x1, x2)
table(cl_class_ids(x1), cl_class_ids(x2))
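
## A minimal sketch of the "sufficiently many replications" advice in
## Details.  Assumption (not stated on this page): the pclust control
## parameter 'nruns' sets the number of runs and is passed through by
## cl_pclust(), as suggested by "Available control parameters are as
## for pclust".  The names 'ens' and 'x3' are illustrative only.
library("clue")
data("CKME")
ens <- CKME[1 : 30]
set.seed(123)
x3 <- cl_pclust(ens, 3, m = 1, control = list(nruns = 5))
## Class ids of the secondary partition: one id per ensemble element.
table(cl_class_ids(x3))
## Membership values u_bj: one row per ensemble element.
dim(cl_membership(x3))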

[Package clue version 0.3-65 Index]