piv_sel {pivmet} | R Documentation |
Pivotal Selection via Co-Association Matrix
Description
Finding pivotal units from a data partition and a co-association matrix C according to three different methods.
Usage
piv_sel(C, clusters)
Arguments
C |
A |
clusters |
A vector of integers from |
Details
Given a set of N observations (y_{1},y_{2},...,y_{N}) (y_i may be a d-dimensional vector, d \ge 1), consider clustering methods to obtain H distinct partitions into k groups.
The matrix C is the co-association matrix, where c_{i,p}=n_{i,p}/H, with n_{i,p} the number of times the pair (y_{i},y_{p}) is assigned to the same cluster among the H partitions.
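As a rough illustration of the definition c_{i,p}=n_{i,p}/H, the following base R sketch builds a co-association matrix from a small hand-made set of partitions. The toy label matrix a and the name C_toy are assumptions made up for this illustration; they are not part of pivmet.

```r
# Toy data (hypothetical): H = 4 partitions of N = 5 units into k = 2 groups.
# Each row of `a` is one partition; labels are arbitrary within a row.
a <- rbind(
  c(1, 1, 2, 2, 2),
  c(2, 2, 1, 1, 1),
  c(1, 1, 1, 2, 2),
  c(1, 2, 2, 2, 2)
)
H <- nrow(a)
N <- ncol(a)

# n[i, p] counts the partitions in which units i and p share a label
n <- matrix(0, N, N)
for (h in seq_len(H)) {
  n <- n + outer(a[h, ], a[h, ], "==")
}

# Co-association matrix: proportion of partitions with i and p together
C_toy <- n / H
print(C_toy)
```

Built this way, the matrix is symmetric with ones on the diagonal, since every unit always shares a cluster with itself.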
Let \mathcal J_j be the set of units belonging to group j. The user may choose the unit i^{*} \in \mathcal J_j that maximizes one of the quantities:
\sum_{p\in\mathcal J_j} c_{i^{*},p}
or
\sum_{p\in\mathcal J_j} c_{i^{*},p} - \sum_{p\not\in\mathcal J_j} c_{i^{*},p}.
These criteria select the unit that maximizes the global within-group similarity ("maxsumint") and the unit that maximizes the difference between the global within-group and between-group similarities ("maxsumdiff"), respectively.
Alternatively, we may choose the unit i^{*} \in \mathcal J_j that minimizes:
\sum_{p\not\in\mathcal J_j} c_{i^{*},p},
obtaining the member of the group that is most distant from all the other groups, i.e. the one with the lowest global similarity to the units outside its group ("minsumnoint").
See the vignette for further details.
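The three criteria can be sketched directly from their definitions in base R. The helper pivot_criteria below is hypothetical and is not the pivmet implementation; it assumes C is a complete symmetric matrix with ones on the diagonal, and its column order is chosen for this illustration only, so it may differ from the output of piv_sel.

```r
# Hypothetical helper, not part of pivmet: for each group, return the unit
# selected by each of the three criteria described in Details.
pivot_criteria <- function(C, clusters) {
  groups <- sort(unique(clusters))
  out <- matrix(NA_integer_, nrow = length(groups), ncol = 3,
                dimnames = list(groups,
                                c("maxsumint", "maxsumdiff", "minsumnoint")))
  for (g in seq_along(groups)) {
    members <- which(clusters == groups[g])
    # Sum of co-associations with units inside / outside the group
    inside  <- rowSums(C[members, members, drop = FALSE])
    outside <- rowSums(C[members, -members, drop = FALSE])
    # Ties are resolved by taking the first unit (which.max / which.min)
    out[g, ] <- c(members[which.max(inside)],
                  members[which.max(inside - outside)],
                  members[which.min(outside)])
  }
  out
}

# Small made-up co-association matrix for 5 units in 2 groups
C <- matrix(c(
  1.0, 0.8, 0.1, 0.0, 0.2,
  0.8, 1.0, 0.3, 0.1, 0.1,
  0.1, 0.3, 1.0, 0.6, 0.7,
  0.0, 0.1, 0.6, 1.0, 0.9,
  0.2, 0.1, 0.7, 0.9, 1.0
), nrow = 5, byrow = TRUE)
clusters <- c(1, 1, 2, 2, 2)

res <- pivot_criteria(C, clusters)
print(res)
```

In this toy matrix the criteria disagree for the second group: "maxsumint" picks unit 5, which has the largest within-group sum, whereas "maxsumdiff" and "minsumnoint" pick unit 4, which is less associated with the units of the first group.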
Value
pivots |
A matrix with |
Author(s)
Leonardo Egidi legidi@units.it
References
Egidi, L., Pappadà, R., Pauli, F. and Torelli, N. (2018). Relabelling in Bayesian Mixture Models by Pivotal Units. Statistics and Computing, 28(4), 957-969.
Examples
library(pivmet)

# Iris data
data(iris)
# Select the four numeric measurement columns
x <- iris[, 1:4]
N <- nrow(x)
H <- 1000
a <- matrix(NA, H, N)

# Perform H k-means partitions (one partition per row of a)
for (h in 1:H) {
  a[h, ] <- kmeans(x, centers = 3)$cluster
}

# Build the co-association matrix
C <- matrix(NA, N, N)
for (i in 1:(N - 1)) {
  for (j in (i + 1):N) {
    C[i, j] <- sum(a[, i] == a[, j]) / H
    C[j, i] <- C[i, j]
  }
}

# Reference partition whose groups the pivots are selected from
km <- kmeans(x, centers = 3)

# Apply three pivotal criteria to the co-association matrix
ris <- piv_sel(C, clusters = km$cluster)

graphics::plot(iris[, 1], iris[, 2], xlab = "Sepal.Length", ylab = "Sepal.Width",
               col = km$cluster)
# Add the pivots chosen by the maxsumdiff criterion
points(x[ris$pivots[, 3], 1:2], col = 1:3,
       cex = 2, pch = 8)