ksharp {ksharp}    R Documentation

Sharpen a clustering

Description

Each data point in a clustering is assigned to a cluster, but some points may lie in ambiguous zones between two or more clusters, or far from other points. Cluster sharpening reassigns these border points to a separate noise group, thereby creating starker distinctions between groups.

Usage

ksharp(
  x,
  threshold = 0.1,
  data = NULL,
  method = c("silhouette", "neighbor", "medoid"),
  threshold.abs = NULL
)

Arguments

x

clustering object; several types of inputs are acceptable, including objects of class kmeans, pam, and self-made lists with a component "cluster".

threshold

numeric; the fraction of points to place in the noise group

data

matrix; raw data corresponding to clustering x. Must be provided when sharpening for the first time, or if the data are not stored within x.

method

character; determines the method used for sharpening. One of "silhouette", "neighbor", or "medoid".

threshold.abs

numeric; absolute value of the threshold for sharpening. When non-NULL, this value overrides the value in argument 'threshold'.
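
The following is a minimal sketch of how the method argument might be explored, assuming the ksharp package is installed. It sharpens the same kmeans clustering once per method and counts the points sent to the noise group. The names methods and noise.counts are illustrative and not part of the package.

```r
library(ksharp)

# fix the seed because kmeans starts from random centers
set.seed(1)
iris.data = iris[, 1:4]
rownames(iris.data) = paste0("iris_", seq_len(nrow(iris.data)))
iris.clustered = kmeans(iris.data, centers=3)

# sharpen once per method and count the points labeled as noise (index 0)
methods = c("silhouette", "neighbor", "medoid")
noise.counts = sapply(methods, function(m) {
  sharp = ksharp(iris.clustered, threshold=0.1, data=iris.data, method=m)
  sum(sharp$cluster == 0)
})
noise.counts
```

The methods score points differently, so the sets of excluded points need not coincide even when the counts are similar.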

Details

Noise points are assigned to a group with cluster index 0. This behavior is analogous to the output produced by dbscan.
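
Because noise points carry cluster index 0, they can be separated from the retained points with ordinary subsetting. The sketch below assumes the ksharp package is installed; is.noise, iris.core, and iris.noise are illustrative names.

```r
library(ksharp)

# fix the seed because kmeans starts from random centers
set.seed(1)
iris.data = iris[, 1:4]
rownames(iris.data) = paste0("iris_", seq_len(nrow(iris.data)))
iris.clustered = kmeans(iris.data, centers=3)
iris.sharp = ksharp(iris.clustered, threshold=0.1, data=iris.data)

# points with cluster index 0 form the noise group
is.noise = iris.sharp$cluster == 0
iris.core = iris.data[!is.noise, ]
iris.noise = iris.data[is.noise, ]

# number of retained and excluded points
c(core=nrow(iris.core), noise=nrow(iris.noise))
```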

Value

clustering object based on input x, with adjusted cluster assignments and additional list components with sharpness measures. Cluster assignments are placed in $cluster and excised data points are given a cluster index of 0. Original cluster assignments are saved in $cluster.original. Sharpness measures are stored in components $silinfo, $medinfo, and $neiinfo, although these details may change in future versions of the package.
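
As a sketch of how the returned components relate, the original and sharpened assignments can be cross-tabulated. This assumes the ksharp package is installed and relies only on the $cluster and $cluster.original components described above; the name moved is illustrative.

```r
library(ksharp)

# fix the seed because kmeans starts from random centers
set.seed(1)
iris.data = iris[, 1:4]
rownames(iris.data) = paste0("iris_", seq_len(nrow(iris.data)))
iris.clustered = kmeans(iris.data, centers=3)
iris.sharp = ksharp(iris.clustered, threshold=0.1, data=iris.data)

# rows: assignments before sharpening; columns: assignments after sharpening
# column "0" counts the points moved from each cluster into the noise group
table(original=iris.sharp$cluster.original, sharpened=iris.sharp$cluster)

# points whose assignment changed are exactly those now in the noise group
moved = iris.sharp$cluster != iris.sharp$cluster.original
sum(moved)
```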

Examples


# load the package (needed when running this example outside the package)
library(ksharp)

# prepare iris dataset for analysis
iris.data = iris[, 1:4]
rownames(iris.data) = paste0("iris_", seq_len(nrow(iris.data)))

# cluster the dataset into three groups
# (kmeans starts from random centers; set a seed for repeatable results)
set.seed(1)
iris.clustered = kmeans(iris.data, centers=3)
table(iris.clustered$cluster)

# sharpen the clustering by excluding 10% of the data points
iris.sharp = ksharp(iris.clustered, threshold=0.1, data=iris.data)
table(iris.sharp$cluster)

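# visualize cluster assignments on the first two principal components
# colors show species; open circles mark noise points (cluster index 0)
iris.pca = prcomp(iris.data)$x[, 1:2]
plot(iris.pca, col=iris$Species, pch=ifelse(iris.sharp$cluster==0, 1, 19))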


[Package ksharp version 0.1.0.1 Index]