CPC {CPC}    R Documentation

Cluster-Polarization Coefficient

Description

Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids, density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.

Usage

CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)

Arguments

data

a numeric vector or n x k matrix or data frame. If type = "manual", data must be a matrix that also contains a column identifying cluster membership for each observation; that column is identified with the clusters argument.

type

a character string giving the type of clustering method to be used. See Details.

k

the desired number of clusters. Required if type = "hclust", type = "kmeans", or type = "pam".

epsilon

radius of the epsilon neighborhood. Required if type = "dbscan".

model

a logical indicating whether clustering model output should be returned. Defaults to FALSE.

adjust

a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both CPC and adjusted CPC are automatically calculated and returned if model = TRUE.

cols

columns of data to be used in CPC calculation. Only used if type = "manual".

clusters

column of data indicating cluster membership for each observation. Only used if type = "manual".

...

additional arguments passed on to the clustering function selected by type. See Details.

Details

type must take one of five values:

"hclust" performs agglomerative hierarchical clustering via hclust().

"kmeans" performs k-means clustering via kmeans().

"pam" performs k-medoids clustering via pam().

"dbscan" performs density-based clustering via dbscan().

"manual" indicates that no clustering is necessary and that the researcher has specified cluster assignments.
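As a rough illustration of how the type argument selects the clustering method, the sketch below computes the CPC for the same simulated data under two of the five options. It assumes the CPC package is installed; the object names x, cpc_hclust, and cpc_pam are made up for this example, and the simulated data are not from any real study.

```r
library(CPC)  # assumption: the CPC package is installed

set.seed(1)
# Illustrative data: two well-separated groups of 25 points in two dimensions
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# Same data, two clustering back ends; k is required for both
cpc_hclust <- CPC(x, type = "hclust", k = 2)
cpc_pam    <- CPC(x, type = "pam", k = 2)

# Each call returns a single number when model = FALSE (the default)
c(hclust = cpc_hclust, pam = cpc_pam)
```

For type = "dbscan", k is replaced by epsilon; for type = "manual", cols and clusters are supplied instead, as in the Examples section.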

For all clustering methods, additional arguments that fine-tune clustering performance, such as the specific algorithm to be used, should be passed to CPC(); they are forwarded through ... to the specified clustering function. In particular, if type = "kmeans", a large number of random starts is recommended. Set this with the nstart argument of kmeans(), passed directly to CPC().
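The following sketch shows one way the nstart recommendation might look in practice. It assumes the CPC package is installed; the value 25 is an arbitrary illustrative choice, not a recommendation from this page, and x and cpc_km are hypothetical names.

```r
library(CPC)  # assumption: the CPC package is installed

set.seed(42)
# Illustrative data: two groups of 25 points in two dimensions
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# nstart is not an argument of CPC(); it travels through ... to kmeans(),
# which then tries 25 random initializations and keeps the best solution
cpc_km <- CPC(x, type = "kmeans", k = 2, nstart = 25)
cpc_km
```

Because k-means depends on its random starting centers, calling set.seed() beforehand keeps the result reproducible.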

If type = "manual", data must contain a column identifying cluster membership for each observation, and both cols and clusters must be specified.

Value

If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, CPC, and adjusted CPC. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
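To make the two return shapes concrete, here is a minimal sketch, again assuming the CPC package is installed. The names of the list components are not documented on this page, so the sketch inspects them with names() rather than assuming any; x, cpc, cpc_adj, and fit are hypothetical names.

```r
library(CPC)  # assumption: the CPC package is installed

set.seed(7)
# Illustrative data: two groups of 25 points in two dimensions
x <- rbind(matrix(rnorm(50, 0, 1), ncol = 2),
           matrix(rnorm(50, 5, 1), ncol = 2))

# model = FALSE (default): a single number, CPC or adjusted CPC
cpc     <- CPC(x, type = "kmeans", k = 2, nstart = 25)
cpc_adj <- CPC(x, type = "kmeans", k = 2, nstart = 25, adjust = TRUE)

# model = TRUE: a list holding the clustering output, the sums of squares,
# and both coefficients; inspect the component names before indexing
fit <- CPC(x, type = "kmeans", k = 2, nstart = 25, model = TRUE)
names(fit)
```

Requesting model = TRUE is convenient when both coefficients or the underlying cluster assignments are needed from a single run.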

Examples

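set.seed(1)
# 50 x 2 matrix: rows 1-25 are drawn from N(0, 1), rows 26-50 from N(5, 1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
# known cluster membership, appended as column 3
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

# CPC from k-means clustering on the two data columns
CPC(data[, 1:2], "kmeans", k = 2)
# CPC from the manually supplied cluster assignments
CPC(data, "manual", cols = 1:2, clusters = 3)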


[Package CPC version 2.3.0 Index]