muncut {NCutYX}    R Documentation
MuNCut Clusters the Columns of Data from 3 Different Sources.
Description
It clusters the columns of Z, Y, and X into K clusters by representing each data type as one network layer. The Z layer is modeled as depending on Y, and the Y layer as depending on X. The elastic net can be used before the clustering procedure, replacing Z and Y with their predicted values, to improve the clustering results. The function outputs K clusters of the columns of Z, Y and X.
Usage
muncut(Z, Y, X, K = 2, B = 3000, L = 1000, alpha = 0.5, ncv = 3,
nlambdas = 100, scale = FALSE, model = FALSE, gamma = 0.5,
sampling = "equal", dist = "gaussian", sigma = 0.1)
Arguments
Z: an n x q matrix of q variables and n observations.
Y: an n x p matrix of p variables and n observations.
X: an n x r matrix of r variables and n observations.
K: the number of column clusters.
B: the number of iterations of the simulated annealing algorithm.
L: the temperature coefficient of the simulated annealing algorithm.
alpha: the tuning parameter of the elastic net penalty; only used when model=TRUE.
ncv: the number of cross-validation folds used to choose the tuning parameter lambda of the elastic net penalty; only used when model=TRUE.
nlambdas: the number of tuning parameters lambda evaluated during cross-validation; only used when model=TRUE.
scale: when TRUE, Z, Y and X are scaled to have mean 0 and standard deviation 1.
model: when TRUE, the relationships between Z and Y, and between Y and X, are modeled with the elastic net. The predictions of Z and Y from these models are used in the clustering algorithm (a sketch of this step follows the argument list).
gamma: the tuning parameter of the clustering penalty. Larger values give more importance to within-layer effects and less to across-layer effects.
sampling: if 'equal', the sampling distribution is discrete uniform over the number of clusters; if 'size', the probabilities are inversely proportional to the size of each cluster.
dist: the type of distance measure used in the similarity matrix. Options are 'gaussian' and 'correlation', with 'gaussian' being the default.
sigma: the bandwidth parameter when dist='gaussian'.
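When model=TRUE, Y is regressed on X and Z on Y with the elastic net, and the fitted values enter the clustering step in place of the observed values. A minimal sketch of this pre-processing step, written here with the glmnet package (an assumption made for illustration; muncut's internal implementation may differ), is:

library(glmnet)
set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n*p), n, p)
Y <- X %*% matrix(rnorm(p*p, 0, 0.3), p, p) + matrix(rnorm(n*p), n, p)
# elastic net of Y on X: alpha = 0.5, ncv = 3 folds, nlambdas = 20 candidate lambdas
cvfit <- cv.glmnet(X, Y, family = "mgaussian", alpha = 0.5, nfolds = 3, nlambda = 20)
Y.hat <- predict(cvfit, newx = X, s = "lambda.min")[, , 1]
# Y.hat (and, analogously, a Z.hat from a fit of Z on Y) would replace the
# observed Y and Z in the clustering step.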
Details
The algorithm minimizes a modified version of NCut through simulated annealing. The clusters correspond to partitions that minimize this objective function. The external information of X is incorporated by using ridge regression to predict Y.
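The NCut-type objective is built from a similarity matrix between columns (the dist and sigma arguments) and is minimized by simulated annealing. A conceptual sketch of these two pieces is given below; it is an illustration only, not the package's internal code, and the exact kernel form and the way B and L enter the cooling schedule are assumptions.

set.seed(1)
x  <- matrix(rnorm(20*6), 20, 6)  # toy data: 20 observations, 6 columns
d2 <- as.matrix(dist(t(x)))^2     # squared Euclidean distances between columns
sigma <- median(d2)               # bandwidth, playing the role of the sigma argument
Sim <- exp(-d2/sigma)             # Gaussian similarities in (0, 1]
# accept a candidate partition: always when it lowers the objective, otherwise
# with probability exp(-(new - old)/temp)
accept <- function(old, new, temp) {
  if (new <= old) TRUE else runif(1) < exp(-(new - old)/temp)
}
accept(old = 1.2, new = 1.5, temp = 0.01)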
References
Sebastian J. Teran Hidalgo and Shuangge Ma. Clustering Multilayer Omics Data using MuNCut. (Revise and resubmit.)
Examples
library(NCutYX)
library(MASS)
library(fields) #for image.plot
#parameters#
set.seed(777)
n=50
p=50
h=0.5
rho=0.5
# W0 encodes the true column-cluster structure used later to evaluate the fit:
# entries are 0 within the true clusters and 1 across clusters.
W0=matrix(1,p,p)
W0[1:(p/5),1:(p/5)]=0
W0[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=0
W0[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=0
W0[(4*p/5+1):p,(4*p/5+1):p]=0
W0=cbind(W0,W0,W0)
W0=rbind(W0,W0,W0)
Y=matrix(0,n,p)
Z=matrix(0,n,p)
# covariance of X, with stronger correlation within the true clusters
Sigma=matrix(rho,p,p)
Sigma[1:(p/5),1:(p/5)]=2*rho
Sigma[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=2*rho
Sigma[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=2*rho
Sigma=Sigma-diag(diag(Sigma))
Sigma=Sigma+diag(p)
X=mvrnorm(n,rep(0,p),Sigma)
# sparse coefficient matrices linking X to Y (B1) and Y to Z (B2),
# with nonzero entries only within the true clusters
B1=matrix(0,p,p)
B2=matrix(0,p,p)
B1[1:(p/5),1:(p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B1[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=runif((2*p/5)^2,h/2,h)*rbinom((2*p/5)^2,1,0.2)
B1[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B2[1:(p/5),1:(p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
B2[(p/5+1):(3*p/5),(p/5+1):(3*p/5)]=runif((2*p/5)^2,h/2,h)*rbinom((2*p/5)^2,1,0.2)
B2[(3*p/5+1):(4*p/5),(3*p/5+1):(4*p/5)]=runif((p/5)^2,h/2,h)*rbinom((p/5)^2,1,0.2)
Y=X%*%B1+matrix(rnorm(n*p,0,0.5),n,p)
Y2=X%*%B1 # noiseless version of Y (not used below)
Z=Y%*%B2+matrix(rnorm(n*p,0,0.5),n,p)
Z2=Y%*%B2 # noiseless version of Z (not used below)
# apply MuNCut to the simulated data
clust <- muncut(Z,
Y,
X,
K = 4,
B = 10000,
L = 500,
sampling = 'size',
alpha = 0.5,
ncv = 3,
nlambdas = 20,
sigma = 10,
scale = TRUE,
model = FALSE,
gamma = 0.1)
# A[i,j]=1 when variables i and j are assigned to the same cluster
A <- clust[[2]][,1]%*%t(clust[[2]][,1])+
     clust[[2]][,2]%*%t(clust[[2]][,2])+
     clust[[2]][,3]%*%t(clust[[2]][,3])+
     clust[[2]][,4]%*%t(clust[[2]][,4])
# proportion of variable pairs that are clustered together but belong to
# different true clusters
errorK=sum(A*W0)/(3*p)^2
errorK
plot(clust[[1]],type='l') # trace of the objective values stored in clust[[1]]
image.plot(A)             # image of the estimated co-clustering matrix
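# Assuming clust[[2]] is the 0/1 cluster-membership matrix used above (one
# column per cluster), hard cluster labels for each variable and the cluster
# sizes can be recovered as follows; the object names here are illustrative.
labels <- apply(clust[[2]], 1, which.max)
table(labels)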