R: Cluster the Columns of Y into K Groups Using the NCut Graph...

ncut {NCutYX}

R Documentation

Cluster the Columns of Y into K Groups Using the NCut Graph Measure.

Description

Builds a similarity matrix for the columns of Y and clusters them into K groups based on the NCut graph measure. Correlation, Euclidean and Gaussian distances can be used to construct the similarity matrix.

Usage

ncut(Y, K = 2, B = 30, N = 500, dist = "correlation", scale = TRUE,
  q = 0.1, sigma = 1)

Arguments

`Y`	is a n x p matrix of p variables and n observations. The p columns of Y will be clustered into K groups using NCut.
`K`	is the number of clusters.
`B`	is the number of iterations.
`N`	is the number of samples per iterations.
`dist`	is the type of distance metric for the construction of the similarity matrix. Options are 'gaussian', 'euclidean' and 'correlation', the latter being the default.
`scale`	equals TRUE if data Y is to be scaled with mean 0 and variance 1.
`q`	is the quantile used for the top results at each iterations.
`sigma`	is the bandwidth parameter when the dist metric chosen is 'gaussian' (default=0.1).

Details

The algorithm minimizes the NCut through the cross entropy method. The edges of the graph correspond to the entries of a similarity matrix constructed based on a correlation, euclidean or gaussian distance metric. The clusters correspond to partitions that minimize this NCut objective function.

Value

A list with the following components:

quantile: a vector of length N which contains the quantiles q at each iteration of the optimization algorithm.
cluster: a matrix representing the clustering result of dimension p times K, where p is the number of columns of Y.
ncut: the NCut measure for the cluster result.

Author(s)

Sebastian Jose Teran Hidalgo. Maintainer: Sebastian Jose Teran Hidalgo sebastianteranhidalgo@gmail.com.

References

Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4 (2007): 395-416.

Kroese, D. P., Rubinstein, R. Y., Cohen, I., Porotsky, S., & Taimre, T. (2013). "Cross-entropy method." In Encyclopedia of Operations Research and Management Science (pp. 326-333). Springer US.

Examples

# This sets up the initial parameters for the simulation.
library(MASS)
n=100 # Sample size
B=30 # Number of iterations in the simulated annealing algorithm.
p=50 # Number of columns of Y.

S=matrix(0.2,p,p)
S[1:(p/2),(p/2+1):p]=0
S[(p/2+1):p,1:(p/2)]=0
S=S-diag(diag(S))+diag(p)
mu=rep(0,p)

W0=matrix(1,p,p)
W0[1:(p/2),1:(p/2)]=0
W0[(p/2+1):p,(p/2+1):p]=0
Denum=sum(W0)

Y=mvrnorm(n, mu, S)
# NCut
Res=ncut(Y,
K=2,
B=30,
N=1000,
dist='correlation',
scale=TRUE,
q=0.2,
sigma=0.1)
Cx=Res[[2]]
f11=matrix(Cx[,1],p,1)
f12=matrix(Cx[,2],p,1)

errorL=sum((f11%*%t(f11))*W0)/Denum+sum((f12%*%t(f12))*W0)/Denum
# This is the true error of the clustering solution.
errorL

[Package NCutYX version 0.1.0 Index]