awncut {NCutYX} | R Documentation |
Cluster the Rows of X into K Clusters Using the AWNCut Method.
Description
Builds similarity matrices for the rows of X and the rows of an assistant dataset Z, then clusters the rows into K groups while performing feature selection with the AWNCut method.
Usage
awncut(X, Z, K, lambda, Tau, B = 500, L = 1000)
Arguments
- X
an n x p1 matrix of n observations and p1 variables.
- Z
an n x p2 matrix of n observations and p2 variables. Z is the assistant dataset.
- K
the number of clusters.
- lambda
a vector of values of the tuning parameter lambda used in the objective function.
- Tau
a vector of values of the tuning parameter tau used in the objective function.
- B
the number of iterations in the simulated annealing algorithm.
- L
the temperature coefficient in the simulated annealing algorithm.
Details
The algorithm maximizes the sum of the weighted NCut measures for X and for the assistant dataset Z, plus a measure of the correlation between the two datasets. Feature selection uses the average correlation of each feature as its criterion.
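To make the NCut idea concrete, the following base-R sketch computes the textbook (unweighted) normalized cut of a hard clustering from a similarity matrix. It is an illustration only, not the package's internal code: the helper names `similarity_matrix` and `ncut_value`, the Gaussian kernel, and the bandwidth `sigma` are all assumptions made for this example. It also omits the feature weights and the correlation term. In this textbook form, smaller values indicate better separation; the measure used inside awncut may be weighted or oriented differently.

```r
# Gaussian-kernel similarity between the rows of a data matrix.
# 'sigma' is a hypothetical bandwidth, not an argument of awncut().
similarity_matrix <- function(X, sigma = 1) {
  D2 <- as.matrix(dist(X))^2
  exp(-D2 / (2 * sigma^2))
}

# Textbook normalized cut: sum over clusters of cut(A_k, rest) / vol(A_k).
ncut_value <- function(W, labels) {
  total <- 0
  for (k in unique(labels)) {
    inside <- labels == k
    cut_k <- sum(W[inside, !inside, drop = FALSE]) # similarity leaving cluster k
    vol_k <- sum(W[inside, , drop = FALSE])        # total similarity of cluster k
    total <- total + cut_k / vol_k
  }
  total
}

# Toy data: two well-separated groups of 10 rows each.
set.seed(1)
X_toy <- rbind(matrix(rnorm(20, mean = -3), 10, 2),
               matrix(rnorm(20, mean = 3), 10, 2))
W <- similarity_matrix(X_toy)

good <- rep(1:2, each = 10)  # labels that match the true groups
bad  <- rep(1:2, times = 10) # alternating labels that mix the groups

ncut_good <- ncut_value(W, good)
ncut_bad  <- ncut_value(W, bad)
```

In this toy setting the labelling that matches the true groups gives a much smaller normalized cut than the alternating one.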
Value
A list with the following components:
- lambda
the value of tuning parameter lambda for the result
- tau
the value of tuning parameter tau for the result
- Cs
a matrix of the clustering result
- ws
a vector of the feature selection result
- OP.value
the value of the objective function
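As a sketch of how these components might be read, the snippet below builds a hypothetical stand-in for one result and extracts cluster labels and selected features. The component names come from the list above, but the contents are invented. It also rests on two assumptions this page does not confirm: that Cs is an n x K 0/1 indicator matrix, and that a zero entry in ws marks an unselected feature.

```r
# Hypothetical stand-in for one result; the values are made up for illustration.
fit <- list(
  lambda = 2,
  tau = 0.4,
  Cs = rbind(c(1, 0, 0),
             c(1, 0, 0),
             c(0, 1, 0),
             c(0, 0, 1),
             c(0, 1, 0)),       # assumed: one row per observation, one column per cluster
  ws = c(0.7, 0.7, 0, 0.1),     # assumed: one weight per feature, 0 = not selected
  OP.value = 1.23
)

# Convert the assumed indicator matrix to a vector of cluster labels.
labels <- max.col(fit$Cs, ties.method = "first")

# Indices of features with a non-zero weight.
selected <- which(fit$ws != 0)

# Cluster sizes.
table(labels)
```

If Cs or ws has a different layout in practice, adjust the two extraction lines accordingly.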
Author(s)
Ruofan Bie. Maintainer: Sebastian Jose Teran Hidalgo sebastianteranhidalgo@gmail.com.
References
Li, Yang; Bie, Ruofan; Teran Hidalgo, Sebastian; Qin, Yinchen; Wu, Mengyun; Ma, Shuangge. Assisted gene expression-based clustering with AWNCut. (Submitted.)
Examples
set.seed(123456)
# Set up the parameters for the simulation.
lambda <- seq(2,6,1) # Values of the tuning parameter lambda
Tau <- seq(0.2,0.8,0.2) # Values of the tuning parameter tau
n <- 30; n1 <- 10; n2 <- 10; n3 <- n-n1-n2 # Total sample size and cluster sizes
p1 <- 10; p2 <- 10; r1 <- 8; r2 <- 8 # Number of variables (p) and noise variables (r) in each dataset
K <- 3 # Number of clusters
mu <- 1 # Mean of the marginal distribution (cluster means are -mu, 0, mu)
u1 <- 0.5 # Range of entries in the coefficient matrix
library(mvtnorm)
epsilon <- matrix(rnorm(n*(p2-r2),0,1), n, (p2-r2)) # Random error added to the signals in Z (n x (p2-r2))
Sigma1 <- matrix(rep(0.8,(p1-r1)^2),(p1-r1),(p1-r1)) # Generation of the covariance matrix
diag(Sigma1) <- 1
# Generation of the original distribution of the three clusters
T1 <- matrix(rmvnorm(n1,mean=rep(-mu,(p1-r1)),sigma=Sigma1),n1,(p1-r1))
T2 <- matrix(rmvnorm(n2,mean=rep(0,(p1-r1)),sigma=Sigma1),n2,(p1-r1))
T3 <- matrix(rmvnorm(n3,mean=rep(mu,(p1-r1)),sigma=Sigma1),n3,(p1-r1))
X1 <- sign(T1)*(exp(abs(T1))) #Generation of signals in X
X2 <- sign(T2)*(exp(abs(T2)))
X3 <- sign(T3)*(exp(abs(T3)))
ep1 <- matrix(rnorm(n*r1,0,1),n,r1) # Generation of noise variables in X
X <- rbind(X1,X2,X3)
beta1 <- matrix(runif((p1-r1)*(p2-r2),-u1,u1),(p1-r1),(p2-r2)) #Generation of the coefficient matrix
Z <- X%*%beta1+epsilon #Generation of signals in Z
ep2 <- matrix(rnorm(n*r2,0.5,1),n,r2) # Generation of noise variables in Z
X <- cbind(X,ep1)
Z <- cbind(Z,ep2)
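# Select the tuning parameters, then run AWNCut with the selected values
Tune1 <- awncut.selection(X, Z, K, lambda, Tau, B = 20, L = 1000)
awncut.result <- awncut(X, Z, K, Tune1$lam, Tune1$tau, B = 20, L = 1000)
# Clustering error rate, given the true sizes of the first two clusters
ErrorRate(awncut.result[[1]]$Cs, n1, n2)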