DEEM {TensorClustering} | R Documentation |
Doubly-enhanced EM algorithm
Description
Doubly-enhanced EM algorithm for tensor clustering
Usage
DEEM(X, nclass, niter = 100, lambda = NULL, dfmax = n, pmax = nvars, pf = rep(1, nvars),
eps = 1e-04, maxit = 1e+05, sml = 1e-06, verbose = FALSE, ceps = 0.1,
initial = TRUE, vec_x = NULL)
Arguments
X |
Input tensor (or matrix) list of length |
nclass |
Number of clusters. |
niter |
Maximum iteration times for EM algorithm. Default value is 100. |
lambda |
A user-specified |
dfmax |
The maximum number of selected variables in the model. Default is the number of observations |
pmax |
The maximum number of potential selected variables during iteration. In middle step, the algorithm can select at most |
pf |
Weight of lasso penalty. Default is a vector of value |
eps |
Convergence threshold for coordinate descent difference between iterations. Default value is |
maxit |
Maximum iteration times for coordinate descent for all lambda. Default value is |
sml |
Threshold for ratio of loss function change after each iteration to old loss function value. Default value is |
verbose |
Indicates whether print out lambda during iteration or not. Default value is |
ceps |
Convergence threshold for cluster mean difference between iterations. Default value is |
initial |
Whether to initialize algorithm with K-means clustering. Default value is |
vec_x |
Vectorized tensor data. Default value is |
Details
The DEEM
function implements the Doubly-Enhanced EM algorithm (DEEM) for tensor clustering. The observations \mathbf{X}_i
are assumed to be following the tensor normal mixture model (TNMM) with common covariances across different clusters:
\mathbf{X}_i\sim\sum_{k=1}^K\pi_k \mathrm{TN}(\bm{\mu}_k;\bm{\Sigma}_1,\ldots,\bm{\Sigma}_M),\quad i=1,\dots,n,
where 0<\pi_k<1
is the prior probability for \mathbf{X}
to be in the k
-th cluster such that \sum_{k=1}^{K}\pi_k=1
, \bm{\mu}_k
is the cluster mean of the k
-th cluster and \bm{\Sigma}_1,\ldots,\bm{\Sigma}_M)
are the common covariances across different clusters. Under the TNMM framework, the optimal clustering rule can be showed as
\widehat{Y}^{opt}=\arg\max_k\{\log\pi_k+\langle\mathbf{X}-(\bm{\mu}_1+\bm{\mu}_k)/2,\mathbf{B}_k\rangle\},
where \mathbf{B}_k=[\![\bm{\mu}_k-\bm{\mu}_1;\bm{\Sigma}_1^{-1},\ldots,\bm{\Sigma}_M^{-1}]\!]
. In the enhanced E-step, DEEM
imposes sparsity directly on the optimal clustering rule as a flexible alternative to popular low-rank assumptions on tensor coefficients \mathbf{B}_k
as
\min_{\mathbf{B}_2,\dots,\mathbf{B}_K}\bigg[\sum_{k=2}^K(\langle\mathbf{B}_k,[\![\mathbf{B}_k,\widehat{\bm{\Sigma}}_1^{(t)},\ldots,\widehat{\bm{\Sigma}}_M^{(t)}]\!]\rangle-2\langle\mathbf{B}_k,\widehat{\bm{\mu}}_k^{(t)}-\widehat{\bm{\mu}}_1^{(t)}\rangle) +\lambda^{(t+1)}\sum_{\mathcal{J}}\sqrt{\sum_{k=2}^Kb_{k,\mathcal{J}}^2}\bigg],
where \lambda^{(t+1)}
is a tuning parameter. In the enhanced M-step, DEEM
employs a new estimator for the tensor correlation structure, which facilitates both the computation and the theoretical studies.
Value
pi |
A vector of estimated prior probabilities for clusters. |
mu |
A list of estimated cluster means. |
sigma |
A list of estimated covariance matrices. |
gamma |
A |
y |
A vector of estimated labels. |
iter |
Number of iterations until convergence. |
df |
Average zero elements in beta over iterations. |
beta |
A matrix of vectorized |
Author(s)
Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai
References
Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.
See Also
Examples
dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100
sigma = array(list(),3)
sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)
B2=array(0,dim=dimen)
B2[1:3,1,1]=2
y = c(rep(1,50),rep(2,50))
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2
vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
X[[i]] = array(vec_x[,i],dim=dimen)
X[[i]] = M[[y[i]]] + X[[i]]
}
myfit = DEEM(X, nclass=2, lambda=0.05)