muncut {NCutYX}    R Documentation
MuNCut Clusters the Columns of Data from 3 Different Sources.
Description
It clusters the columns of Z, Y, and X into K clusters by representing each data type as one network layer. The Z layer is modeled as depending on Y, and the Y layer as depending on X. The elastic net can be used before the clustering procedure, replacing Z and Y with their predicted values, to improve the clustering results. The function outputs K clusters of the columns of Z, Y and X.
Usage
muncut(Z, Y, X, K = 2, B = 3000, L = 1000, alpha = 0.5, ncv = 3,
nlambdas = 100, scale = FALSE, model = FALSE, gamma = 0.5,
sampling = "equal", dist = "gaussian", sigma = 0.1)
Arguments
Z: an n x q matrix of q variables and n observations.
Y: an n x p matrix of p variables and n observations.
X: an n x r matrix of r variables and n observations.
K: the number of column clusters.
B: the number of iterations of the simulated annealing algorithm.
L: the temperature coefficient of the simulated annealing algorithm.
alpha: the tuning parameter of the elastic net penalty; only used when model=TRUE.
ncv: the number of cross-validation folds used to choose the tuning parameter lambda of the elastic net penalty; only used when model=TRUE.
nlambdas: the number of tuning parameters lambda evaluated during cross-validation; only used when model=TRUE.
scale: when TRUE, Z, Y and X are scaled to have mean 0 and standard deviation 1.
model: when TRUE, the relationships between Z and Y, and between Y and X, are modeled with the elastic net. The predictions of Z and Y from these models are used in the clustering algorithm (a sketch of this step follows the argument list).
gamma: the tuning parameter of the clustering penalty. Larger values give more importance to within-layer effects and less to across-layer effects.
sampling: if 'equal', the sampling distribution is discrete uniform over the number of clusters; if 'size', the probabilities are inversely proportional to the size of each cluster.
dist: the type of distance measure used in the similarity matrix. Options are 'gaussian' and 'correlation', with 'gaussian' being the default.
sigma: the bandwidth parameter when dist='gaussian'.
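When model=TRUE, Y is regressed on X and Z on Y with the elastic net, and the fitted values enter the clustering step in place of the observed values. A minimal sketch of this pre-processing step, written here with the glmnet package (an assumption made for illustration; muncut's internal implementation may differ), is:

library(glmnet)
set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n*p), n, p)
Y <- X %*% matrix(rnorm(p*p, 0, 0.3), p, p) + matrix(rnorm(n*p), n, p)
# elastic net of Y on X: alpha = 0.5, ncv = 3 folds, nlambdas = 20 candidate lambdas
cvfit <- cv.glmnet(X, Y, family = "mgaussian", alpha = 0.5, nfolds = 3, nlambda = 20)
Y.hat <- predict(cvfit, newx = X, s = "lambda.min")[, , 1]
# Y.hat (and, analogously, a Z.hat from a fit of Z on Y) would replace the
# observed Y and Z in the clustering step.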
Details
The algorithm minimizes a modified version of NCut through simulated annealing. The clusters correspond to partitions that minimize this objective function. The external information of X is incorporated by using ridge regression to predict Y.
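The NCut-type objective is built from a similarity matrix between columns (the dist and sigma arguments) and is minimized by simulated annealing. A conceptual sketch of these two pieces is given below; it is an illustration only, not the package's internal code, and the exact kernel form and the way B and L enter the cooling schedule are assumptions.

set.seed(1)
x  <- matrix(rnorm(20*6), 20, 6)  # toy data: 20 observations, 6 columns
d2 <- as.matrix(dist(t(x)))^2     # squared Euclidean distances between columns
sigma <- median(d2)               # bandwidth, playing the role of the sigma argument
Sim <- exp(-d2/sigma)             # Gaussian similarities in (0, 1]
# accept a candidate partition: always when it lowers the objective, otherwise
# with probability exp(-(new - old)/temp)
accept <- function(old, new, temp) {
  if (new <= old) TRUE else runif(1) < exp(-(new - old)/temp)
}
accept(old = 1.2, new = 1.5, temp = 0.01)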
References
Sebastian J. Teran Hidalgo and Shuangge Ma. Clustering Multilayer Omics Data using MuNCut. (Revise and resubmit.)
Examples
library(NCutYX)
library(MASS)
library(fields) #for image.plot
#parameters#
set.seed(777)
n=50
p=50
h=0.5
rho=0.5
# W0 encodes the true column-cluster structure used later to evaluate the fit:
# entries are 0 within the true clusters and 1 across clusters.
W0=matrix(1,p,p)
W0[1:(p/5),1:(p/5)]=0
W0[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=0
W0[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=0
W0[(4*p/5+1):p,(4*p/5+1):p]=0
W0=cbind(W0,W0,W0)
W0=rbind(W0,W0,W0)
Y=matrix(0,n,p)
Z=matrix(0,n,p)
# covariance of X, with stronger correlation within the true clusters
Sigma=matrix(rho,p,p)
Sigma[1:(p/5),1:(p/5)]=2*rho
Sigma[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=2*rho
Sigma[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=2*rho
Sigma=Sigma-diag(diag(Sigma))
Sigma=Sigma+diag(p)
X=mvrnorm(n,rep(0,p),Sigma)
# sparse coefficient matrices linking X to Y (B1) and Y to Z (B2),
# with nonzero entries only within the true clusters
B1=matrix(0,p,p)
B2=matrix(0,p,p)
B1[1:(p/5),1:(p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B1[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=runif((2*p/5)^2,h/2,h)*rbinom((2*p/5)^2,1,0.2)
B1[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B2[1:(p/5),1:(p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B2[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=runif((2*p/5)^2,h/2,h)*rbinom((2*p/5)^2,1,0.2)
B2[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
Y=X%*%B1+matrix(rnorm(n*p,0,0.5),n,p)
Y2=X%*%B1 # noiseless version of Y (not used below)
Z=Y%*%B2+matrix(rnorm(n*p,0,0.5),n,p)
Z2=Y%*%B2 # noiseless version of Z (not used below)
# apply MuNCut to the simulated data
clust <- muncut(Z,
Y,
X,
K = 4,
B = 10000,
L = 500,
sampling = 'size',
alpha = 0.5,
ncv = 3,
nlambdas = 20,
sigma = 10,
scale = TRUE,
model = FALSE,
gamma = 0.1)
# A[i,j]=1 when variables i and j are assigned to the same cluster
A <- clust[[2]][,1]%*%t(clust[[2]][,1])+
     clust[[2]][,2]%*%t(clust[[2]][,2])+
     clust[[2]][,3]%*%t(clust[[2]][,3])+
     clust[[2]][,4]%*%t(clust[[2]][,4])
# proportion of variable pairs that are clustered together but belong to
# different true clusters
errorK=sum(A*W0)/(3*p)^2
errorK
plot(clust[[1]],type='l') # trace of the objective values stored in clust[[1]]
image.plot(A)             # image of the estimated co-clustering matrix
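# Assuming clust[[2]] is the 0/1 cluster-membership matrix used above (one
# column per cluster), hard cluster labels for each variable and the cluster
# sizes can be recovered as follows; the object names here are illustrative.
labels <- apply(clust[[2]], 1, which.max)
table(labels)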