spkmeans {T4cluster}    R Documentation

Spherical K-Means Clustering

Description

The spherical k-means algorithm clusters data residing on the unit hypersphere using cosine similarity. If the rows of the data are not already of unit norm, they are normalized before clustering proceeds.
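The idea can be sketched in base R as follows. This is a minimal illustration of the scheme described by Dhillon and Modha, not the implementation used by this package: the helper names normalize_rows and sketch_spkmeans are hypothetical, random initialization is an assumption made for brevity, and the loop runs a fixed number of iterations instead of checking a tolerance.

```r
# Hypothetical helper (not part of T4cluster): scale each row to unit L2 norm.
# Rows that are exactly zero would produce NaN and are not handled here.
normalize_rows <- function(X) X / sqrt(rowSums(X^2))

# Minimal spherical k-means sketch; random initialization is an assumption.
sketch_spkmeans <- function(X, k = 2, maxiter = 10) {
  U <- normalize_rows(X)
  centers <- U[sample(nrow(U), k), , drop = FALSE]
  for (it in seq_len(maxiter)) {
    # For unit vectors, cosine similarity is the inner product (n x k matrix).
    sim <- U %*% t(centers)
    # Assign each observation to the most similar center.
    labels <- max.col(sim, ties.method = "first")
    # Update each center as the sum of its members; empty clusters keep
    # their previous center.
    for (j in seq_len(k)) {
      members <- U[labels == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colSums(members)
    }
    # Project the centers back onto the unit hypersphere.
    centers <- normalize_rows(centers)
  }
  list(cluster = labels, means = centers)
}

# Synthetic data: two groups pointing in roughly opposite directions.
set.seed(42)
X <- rbind(matrix(rnorm(30, mean = 3), ncol = 3),
           matrix(rnorm(30, mean = -3), ncol = 3))
fit <- sketch_spkmeans(X, k = 2)
rowSums(fit$means^2)  # squared norms of the class means
```

Because every center is renormalized after each update, the class means stay on the unit hypersphere, which matches the unit-norm means that spkmeans documents in its return value.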

Usage

spkmeans(data, k = 2, ...)

Arguments

data

an (n × p) matrix of row-stacked observations. If the rows do not already have unit norm, each row is normalized to unit norm.

k

the number of clusters (default: 2).

...

extra parameters including

init

initialization method; either "kmeans" or "gmm" (default: "kmeans").

maxiter

the maximum number of iterations (default: 10).

abstol

tolerance for the stopping criterion (default: 10^{-8}).

verbose

a logical; TRUE to show the iteration history or FALSE to suppress it.
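As an illustration of how the extra parameters listed above might be passed through the ... argument, the call below sets each of them explicitly. It assumes the T4cluster package is installed; the synthetic matrix X and the particular values chosen for maxiter and abstol are arbitrary and only for demonstration.

```r
# Runs only if T4cluster is available; the data below is synthetic.
if (requireNamespace("T4cluster", quietly = TRUE)) {
  set.seed(10)
  X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

  # Extra parameters are supplied by name after 'data' and 'k'.
  fit <- T4cluster::spkmeans(X, k = 3,
                             init    = "kmeans",  # or "gmm"
                             maxiter = 50,        # allow more than the default 10
                             abstol  = 1e-6,      # looser than the default 10^{-8}
                             verbose = FALSE)     # no iteration history
}
```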

Value

a named list of S3 class T4cluster containing

cluster

a length-n vector of class labels (from 1:k).

cost

the value of the cost function.

means

a (k × p) matrix where each row is a unit-norm class mean.

algorithm

name of the algorithm.
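A short sketch of inspecting the returned list, using only the element names documented above. It assumes the T4cluster package is installed; the input matrix is synthetic and the object name out is arbitrary.

```r
# Runs only if T4cluster is available; the data below is synthetic.
if (requireNamespace("T4cluster", quietly = TRUE)) {
  set.seed(10)
  X   <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
  out <- T4cluster::spkmeans(X, k = 3)

  print(table(out$cluster))    # cluster sizes; labels are taken from 1:k
  print(dim(out$means))        # documented as a (k x p) matrix
  print(rowSums(out$means^2))  # squared row norms of the class means
  print(out$cost)              # value of the cost function
  print(out$algorithm)         # name of the algorithm
}
```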

References

I. S. Dhillon and D. S. Modha (2001). "Concept decompositions for large sparse text data using clustering." Machine Learning, 42:143–175.

Examples


# -------------------------------------------------------------
#            clustering with 'household' dataset
# -------------------------------------------------------------
## PREPARE
data(household, package="T4cluster")
X   = household$data
lab = as.integer(household$gender)

## EXECUTE SPKMEANS WITH VARYING K's
vec.rand = rep(0, 9)
for (i in 1:9){
  clust_i = spkmeans(X, k=(i+1))$cluster
  vec.rand[i] = compare.rand(clust_i, lab)
}

## VISUALIZE THE RAND INDEX
opar <- par(no.readonly=TRUE)
plot(2:10, vec.rand, type="b", pch=19, ylim=c(0.5, 1),
     ylab="Rand index",xlab="number of clusters",
     main="clustering quality index over varying k's.")
par(opar)



[Package T4cluster version 0.1.2 Index]