ancut {NCutYX}R Documentation

Cluster the Columns of Y into K Groups with the Help of External Features X.

Description

This function will output K clusters of the columns of Y using the help of X.

Usage

ancut(Y, X, K = 2, B = 3000, L = 1000, alpha = 0.5, nlambdas = 100,
  sampling = "equal", ncv = 5, dist = "correlation", sigma = 0.1)

Arguments

Y

is a n x p matrix of p variables and n observations. The columns of Y will be clustered into K groups.

X

is a n x q matrix of q variables and n observations.

K

is the number of clusters.

B

is the number of iterations in the simulated annealing algorithm.

L

is the temperature coefficient in the simulated annealing algorithm.

alpha

is the coefficient of the elastic net penalty.

nlambdas

is the number of tuning parameters in the elastic net.

sampling

if 'equal' then the sampling probabilities is the same during the simulated annealing algorithm, if 'size' the probabilites are proportional the the sizes of the clusters in the current iterations.

ncv

is the number of cross-validations in the elastic net.

dist

is the type of distance metric for the construction of the similarity matrix. Options are 'gaussian', 'euclidean' and 'correlation', the latter being the default.

sigma

is the parameter for the gaussian kernel distance which is ignored if 'gaussian' is not chosen as distance measure.

Details

The algorithm minimizes a modified version of NCut through simulated annealing. The modified NCut uses in the numerator the similarity matrix of the original data Y and the denominator uses the similarity matrix of the prediction of Y using X. The clusters correspond to partitions that minimize this objective function. The external information of X is incorporated by using elastic net to predict Y.

Value

A list with the final value of the objective function, the clusters and the lambda penalty chosen through cross-validation.

A list with the following components:

loss

a vector of length N which contains the loss at each iteration of the simulated annealing algorithm.

cluster

a matrix representing the clustering result of dimension p times K, where p is the number of columns of Y.

lambda.min

is the optimal lambda chosen through cross-validation for the elastic net for predicting Y with Y.

Author(s)

Sebastian Jose Teran Hidalgo and Shuangge Ma. Maintainer: Sebastian Jose Teran Hidalgo. sebastianteranhidalgo@gmail.com.

References

Hidalgo, Sebastian J. Teran, Mengyun Wu, and Shuangge Ma. Assisted clustering of gene expression data using ANCut. BMC genomics 18.1 (2017): 623.

Examples

#This sets up the initial parameters for the simulation.
library(MASS)#for mvrnorm
library(fields)
n=30 #Sample size
B=50 #Number of iterations in the simulated annealing algorithm.
L=10000 #Temperature coefficient.
p=50 #Number of columns of Y.
q=p #Number of columns of X.
h1=0.15
h2=0.25

S=matrix(0.2,q,q)
S[1:(q/2),(q/2+1):q]=0
S[(q/2+1):q,1:(q/2)]=0
S=S-diag(diag(S))+diag(q)

mu=rep(0,q)

W0=matrix(1,p,p)
W0[1:(p/2),1:(p/2)]=0
W0[(p/2+1):p,(p/2+1):p]=0
Denum=sum(W0)

B2=matrix(0,q,p)
for (i in 1:(p/2)){
   B2[1:(q/2),i]=runif(q/2,h1,h2)
   in1=sample.int(q/2,6)
   B2[-in1,i]=0
}

for (i in (p/2+1):p){
   B2[(q/2+1):q,i]=runif(q/2,h1,h2)
   in2=sample(seq(q/2+1,q),6)
   B2[-in2,i]=0
}

X=mvrnorm(n, mu, S)
Z=X%*%B2
Y=Z+matrix(rnorm(n*p,0,1),n,p)
#Our method
Res=ancut(Y=Y,X=X,B=B,L=L,alpha=0,ncv=3)
Cx=Res[[2]]
f11=matrix(Cx[,1],p,1)
f12=matrix(Cx[,2],p,1)

errorL=sum((f11%*%t(f11))*W0)/Denum+sum((f12%*%t(f12))*W0)/Denum
#This is the true error of the clustering solution.
errorL

par(mfrow=c(1,2))
#Below is a plot of the simulated annealing path.
plot(Res[[1]],type='l',ylab='')
#Cluster found by ANCut
image.plot(Cx)

[Package NCutYX version 0.1.0 Index]