CPC {CPC}  R Documentation

Cluster-Polarization Coefficient

Description

Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids (PAM), density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.

Usage

CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)

Arguments

data

a numeric vector, or an n x k matrix or data frame. If type = "manual", data must be a matrix that includes a column identifying cluster membership for each observation; that column is identified through the clusters argument.

type

a character string giving the type of clustering method to be used. See Details.

k

the desired number of clusters. Required if type is one of "hclust", "diana", "kmeans", or "pam".

epsilon

radius of the epsilon neighborhood. Required if type = "dbscan".

model

a logical indicating whether clustering model output should be returned. Defaults to FALSE.

adjust

a logical indicating whether the adjusted CPC should be calculated. Defaults to FALSE. Note that both CPC and adjusted CPC are automatically calculated and returned if model = TRUE.

cols

columns of data to be used in CPC calculation. Only used if type = "manual".

clusters

column of data indicating cluster membership for each observation. Only used if type = "manual".

...

additional arguments passed on to the specified clustering function. See Details.

Details

type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is performed; the researcher supplies the cluster assignments.

For all clustering methods, additional arguments that fine-tune clustering performance, such as the specific algorithm to be used, can be passed to CPC() and are forwarded to the specified clustering function. In particular, if type = "kmeans", a large number of random starts is recommended. Set this with the nstart argument of kmeans(), passed directly to CPC().
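As a rough illustration of why multiple random starts matter, the sketch below runs base R's kmeans() on simulated data with one start and with 25 starts, then compares the total within-cluster sum of squares. The simulated data, the seeds, and the value nstart = 25 are illustrative assumptions, not recommendations from the package. The final call runs only if the CPC package is installed.

```r
# Simulated data: three separated 2-D groups (illustrative only).
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2),
  matrix(rnorm(100, mean = 8), ncol = 2)
)

# One random start versus many; kmeans() keeps the best of its nstart runs.
set.seed(1)
fit_one  <- kmeans(x, centers = 3, nstart = 1)
set.seed(1)
fit_many <- kmeans(x, centers = 3, nstart = 25)

# A lower total within-cluster sum of squares indicates a tighter solution.
print(c(one_start = fit_one$tot.withinss, many_starts = fit_many$tot.withinss))

# Per the Details text, nstart is passed through CPC() to kmeans().
if (requireNamespace("CPC", quietly = TRUE)) {
  print(CPC::CPC(x, "kmeans", k = 3, nstart = 25))
}
```

On well-separated data like this the two fits will often coincide; the benefit of extra starts tends to show up on noisier or higher-dimensional data.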

If type = "manual", data must contain a vector identifying cluster membership for each observation, and cols and clusters must be defined.
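One way to prepare input for type = "manual" is sketched below, assuming the starting point is a data frame with a text group label. The data frame df, its column names, and the integer coding of the labels are hypothetical choices made for illustration. The CPC() call runs only if the package is installed.

```r
# Hypothetical data frame with two numeric variables and a text group label.
set.seed(7)
df <- data.frame(
  x     = c(rnorm(20, mean = 0), rnorm(20, mean = 5)),
  y     = c(rnorm(20, mean = 0), rnorm(20, mean = 5)),
  group = rep(c("left", "right"), each = 20)
)

# Encode the labels as integer codes and bind them on as a third column,
# so the result is a purely numeric matrix.
m <- cbind(as.matrix(df[, c("x", "y")]),
           cluster = as.integer(factor(df$group)))

# Columns 1-2 hold the variables; column 3 holds cluster membership.
if (requireNamespace("CPC", quietly = TRUE)) {
  print(CPC::CPC(m, "manual", cols = 1:2, clusters = 3))
}
```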

Value

If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
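The two return shapes described above can be checked with a short sketch like the following. The simulated data and seed are assumptions for illustration. Because the component names of the returned list are not given here, the sketch inspects the list with str() rather than indexing it by name. The CPC() calls run only if the package is installed.

```r
# Simulated data: two separated 2-D groups (illustrative only).
set.seed(3)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 5), ncol = 2))

if (requireNamespace("CPC", quietly = TRUE)) {
  # model = FALSE (the default): a single number, here the adjusted CPC.
  cpc_adj <- CPC::CPC(x, "kmeans", k = 2, adjust = TRUE)
  stopifnot(is.numeric(cpc_adj), length(cpc_adj) == 1)

  # model = TRUE: a list; inspect it rather than assuming component names.
  res <- CPC::CPC(x, "kmeans", k = 2, model = TRUE)
  str(res, max.level = 1)
}
```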

Examples

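# Simulate 50 two-dimensional observations: rows 1-25 from N(0, 1), rows 26-50 from N(5, 1)
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
# Known cluster membership, appended as column 3
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

# CPC from k-means clustering on the two data columns
CPC(data[, 1:2], "kmeans", k = 2)
# CPC from the manually specified membership in column 3
CPC(data, "manual", cols = 1:2, clusters = 3)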


[Package CPC version 2.6.0 Index]