R: Cluster-Polarization Coefficient

CPC {CPC}

R Documentation

Cluster-Polarization Coefficient

Description

Implements clustering algorithms and calculates the cluster-polarization coefficient (CPC). Supports hierarchical clustering, k-means clustering, partitioning around medoids (PAM), density-based spatial clustering of applications with noise (DBSCAN), and manual assignment of cluster membership.
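For intuition, the sketch below uses only base R. This page does not give the formula, so the sketch assumes that the coefficient is the share of total variation lying between clusters (between-cluster sum of squares divided by total sum of squares). The simulated data and variable names are illustrative, and the result is not guaranteed to match CPC() output.

```r
# Two well-separated groups of 25 observations in two dimensions
set.seed(42)
x <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

# Base R k-means; 25 random starts is an illustrative choice
km <- kmeans(x, centers = 2, nstart = 25)

# ASSUMPTION: coefficient = between-cluster SS / total SS (not stated on this page)
ratio <- km$betweenss / km$totss
ratio
```

Under this assumption the value lies between 0 and 1, and it grows as the clusters move further apart relative to their internal spread.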

Usage

CPC(
  data,
  type,
  k = NULL,
  epsilon = NULL,
  model = FALSE,
  adjust = FALSE,
  cols = NULL,
  clusters = NULL,
  ...
)

Arguments

`data`	a numeric vector, or an `n x k` matrix or data frame. If `type = "manual"`, `data` must be a matrix that includes a column identifying cluster membership for each observation; that column is specified with the `clusters` argument.
`type`	a character string giving the type of clustering method to be used. See Details.
`k`	the desired number of clusters. Required if `type` is one of `"hclust"`, `"diana"`, `"kmeans"`, or `"pam"`.
`epsilon`	radius of epsilon neighborhood. Required if `type = "dbscan"`.
`model`	a logical indicating whether clustering model output should be returned. Defaults to `FALSE`.
`adjust`	a logical indicating whether the adjusted CPC should be calculated. Defaults to `FALSE`. Note that both CPC and adjusted CPC are automatically calculated and returned if `model = TRUE`.
`cols`	columns of `data` to be used in CPC calculation. Only used if `type = "manual"`.
`clusters`	column of `data` indicating cluster membership for each observation. Only used if `type = "manual"`.
`...`	additional arguments passed to the clustering function selected by `type`. See Details.

Details

type must take one of six values:
"hclust": agglomerative hierarchical clustering with hclust(),
"diana": divisive hierarchical clustering with diana(),
"kmeans": k-means clustering with kmeans(),
"pam": k-medoids clustering with pam(),
"dbscan": density-based clustering with dbscan(),
"manual": no clustering is performed; the researcher supplies the cluster assignments.

For all clustering methods, additional arguments that fine-tune clustering performance, such as the specific algorithm to be used, can be passed to CPC() and are forwarded to the specified clustering function. In particular, if type = "kmeans", a large number of random starts is recommended. Set this with the nstart argument of kmeans(), passed directly to CPC().
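A minimal sketch of forwarding nstart through CPC(), assuming the CPC package is installed. The simulated data, the seed, and the value 25 are illustrative choices rather than recommendations from this page.

```r
library(CPC)

# Illustrative data: two groups of 25 observations in two dimensions
set.seed(42)
x <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

# nstart is not an argument of CPC() itself; it travels through ... to kmeans()
cpc_kmeans <- CPC(x, "kmeans", k = 2, nstart = 25)
cpc_kmeans
```

Multiple random starts make the k-means solution, and therefore the resulting coefficient, less sensitive to an unlucky initialization.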

If type = "manual", data must contain a column identifying cluster membership for each observation, and both cols and clusters must be specified.

Value

If model = TRUE, CPC() returns a list with components containing output from the specified clustering function, all sums of squares, the CPC, the adjusted CPC, and associated standard errors. If model = FALSE, CPC() returns a numeric vector of length 1 giving the CPC (if adjust = FALSE) or adjusted CPC (if adjust = TRUE).
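The two return shapes can be compared with a short sketch like the one below, assuming the CPC package is installed. The simulated data and object names are illustrative. Because this page does not list the component names of the returned list, the sketch inspects them instead of assuming them.

```r
library(CPC)

# Illustrative data: two groups of 25 observations in two dimensions
set.seed(42)
x <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)

# model = FALSE (the default): a single number
cpc     <- CPC(x, "kmeans", k = 2)                 # CPC
cpc_adj <- CPC(x, "kmeans", k = 2, adjust = TRUE)  # adjusted CPC

# model = TRUE: a list holding the clustering output and both coefficients
fit <- CPC(x, "kmeans", k = 2, model = TRUE)
names(fit)  # look up the component names before indexing into the list
```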

Examples

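# Simulate 50 observations in two dimensions: rows 1-25 centered at 0, rows 26-50 centered at 5
data <- matrix(c(rnorm(50, 0, 1), rnorm(50, 5, 1)), ncol = 2, byrow = TRUE)
# Known cluster membership, appended as a third column
clusters <- matrix(c(rep(1, 25), rep(2, 25)), ncol = 1)
data <- cbind(data, clusters)

# CPC from k-means clustering on the two data columns
CPC(data[, 1:2], "kmeans", k = 2)
# CPC from the manually specified cluster assignments in column 3
CPC(data, "manual", cols = 1:2, clusters = 3)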

[Package CPC version 2.6.0 Index]