| piv_sel {pivmet} | R Documentation |
Pivotal Selection via Co-Association Matrix
Description
Finding pivotal units from a data partition and a co-association matrix C according to three different methods.
Usage
piv_sel(C, clusters)
Arguments
C |
A |
clusters |
A vector of integers from |
Details
Given a set of N observations (y_{1},y_{2},...,y_{N})
(y_i may be a d-dimensional vector, d \ge 1),
consider clustering methods to obtain H distinct partitions
into k groups.
The matrix C is the co-association matrix,
where c_{i,p}=n_{i,p}/H, with n_{i,p} the number of times
the pair (y_{i},y_{p}) is assigned to the same
cluster among the H partitions.
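As a minimal sketch of this definition, the following base R snippet builds a co-association matrix from a small, made-up set of H = 4 partitions of N = 5 units. The names parts, n and C_toy are illustrative assumptions and are not part of pivmet.

```r
# Toy input (made up for illustration): each row is one partition,
# each column one unit, entries are group labels in {1, 2}.
parts <- rbind(
  c(1, 1, 2, 2, 2),
  c(1, 1, 1, 2, 2),
  c(2, 2, 1, 1, 1),  # same grouping as row 1 with switched labels
  c(1, 1, 2, 2, 1)
)
H <- nrow(parts)
N <- ncol(parts)

# n[i, p] counts the partitions in which units i and p share a label.
n <- matrix(0, N, N)
for (h in seq_len(H)) {
  n <- n + outer(parts[h, ], parts[h, ], "==")
}

# Co-association matrix: c_{i,p} = n_{i,p} / H.
# Each unit always co-occurs with itself, so the diagonal equals 1.
C_toy <- n / H
print(C_toy)
```

Because only co-membership is counted, the result does not depend on how the groups are labelled within each partition.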
Let \mathcal J_j denote the set of units belonging to group j.
The user may choose the unit {i^*}\in\mathcal J_j that
maximizes one of the quantities:
\sum_{p\in\mathcal J_j} c_{{i^*}p}
or
\sum_{p\in\mathcal J_j} c_{{i^*}p} - \sum_{p\not\in\mathcal J_j} c_{{i^*}p}.
These methods give the unit that maximizes the total
within-group similarity ("maxsumint") and the unit that
maximizes the difference between the total within-group and
between-group similarities ("maxsumdiff"), respectively.
Alternatively, one may choose the unit i^{*} \in\mathcal J_j that minimizes:
\sum_{p\not\in\mathcal J_j} c_{i^{*}p},
obtaining the member of the group that is most distant
from the units of all the other groups, i.e. the one with the
smallest total similarity to them ("minsumnoint").
See the vignette for further details.
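The three criteria above can be sketched in base R on a small, hypothetical co-association matrix. This is an illustration of the formulas under stated assumptions, not the implementation used by piv_sel; the helper toy_pivots, the matrix C_toy and the partition grp are invented for this example.

```r
# Hypothetical co-association matrix for N = 5 units
# (symmetric, with unit diagonal).
C_toy <- matrix(c(
  1.0, 0.9, 0.8, 0.3, 0.2,
  0.9, 1.0, 0.7, 0.1, 0.1,
  0.8, 0.7, 1.0, 0.4, 0.3,
  0.3, 0.1, 0.4, 1.0, 0.6,
  0.2, 0.1, 0.3, 0.6, 1.0
), nrow = 5, byrow = TRUE)

# Assumed partition of the 5 units into k = 2 groups.
grp <- c(1, 1, 1, 2, 2)

# Hypothetical helper (not part of pivmet): index of the pivot of
# group j under each of the three criteria.
toy_pivots <- function(C, grp, j) {
  inside  <- which(grp == j)
  # Sum of c_{ip} over p inside the group, for each i in the group.
  within  <- rowSums(C[inside, inside, drop = FALSE])
  # Sum of c_{ip} over p outside the group, for each i in the group.
  between <- rowSums(C[inside, grp != j, drop = FALSE])
  c(maxsumint   = inside[which.max(within)],
    maxsumdiff  = inside[which.max(within - between)],
    minsumnoint = inside[which.min(between)])
}

p1 <- toy_pivots(C_toy, grp, 1)
print(p1)
```

For group 1 the criteria disagree in this toy example: "maxsumint" selects unit 1, which has the largest within-group sum (2.7), whereas "maxsumdiff" and "minsumnoint" select unit 2, whose similarity to the other group is smallest (0.2). Ties are resolved here by which.max and which.min, which return the first matching index.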
Value
pivots |
A matrix with |
Author(s)
Leonardo Egidi legidi@units.it
References
Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.
Examples
# Iris data
data(iris)
# select the columns of variables
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)
# Perform H k-means partitions
for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}
# Build the co-association matrix
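C <- matrix(NA, N, N)
# Fill the upper triangle and mirror it; the diagonal stays NA
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}
# Reference partition whose groups the pivots are selected from
km <- kmeans(x, centers = 3)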
# Apply three pivotal criteria to the co-association matrix
ris <- piv_sel(C, clusters = km$cluster)
graphics::plot(iris[, 1], iris[, 2], xlab = "Sepal.Length", ylab = "Sepal.Width",
               col = km$cluster)
# Add the pivots chosen by the maxsumdiff criterion
points(x[ris$pivots[, 3], 1:2], col = 1:3,
       cex = 2, pch = 8)