TEMM {TensorClustering} | R Documentation |
Fit the Tensor Envelope Mixture Model (TEMM)
Description
Fit the Tensor Envelope Mixture Model (TEMM)
Usage
TEMM(Xn, u, K, initial = "kmeans", iter.max = 500,
stop = 1e-3, trueY = NULL, print = FALSE)
Arguments
Xn |
The tensor data to cluster. It should be an array whose last dimension indexes the samples. |
u |
A vector of envelope dimensions, one for each mode. |
K |
Number of clusters, greater than or equal to |
initial |
Initialization method for the regularized EM algorithm. Default value is "kmeans". |
iter.max |
Maximum number of iterations. Default value is 500. |
stop |
Convergence threshold for the relative change in cluster means. Default value is 1e-3. |
trueY |
A vector of true cluster labels of each observation. Default value is NULL. |
print |
Whether to print information including the current iteration number, the relative change in cluster means, and the clustering error (which requires trueY). Default value is FALSE. |
Details
The TEMM function fits the Tensor Envelope Mixture Model (TEMM) through a subspace-regularized EM algorithm. For mode m, let (\bm{\Gamma}_m,\bm{\Gamma}_{0m})\in R^{p_m\times p_m} be an orthogonal matrix where \bm{\Gamma}_{m}\in R^{p_{m}\times u_{m}}, u_{m}\leq p_{m}, represents the material part. Specifically, the material part \mathbf{X}_{\star,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{m}^{T} follows a tensor normal mixture distribution, while the immaterial part \mathbf{X}_{\circ,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{0m}^{T} is unimodal and independent of the material part, and hence can be eliminated without loss of clustering information. Dimension reduction is achieved by focusing on the material part \mathbf{X}_{\star,m}. Collectively, the joint reduction from each mode is
\mathbf{X}_{\star}=[\![\mathbf{X};\bm{\Gamma}_{1}^{T},\dots,\bm{\Gamma}_{M}^{T}]\!]\sim\sum_{k=1}^{K}\pi_{k}\mathrm{TN}(\bm{\alpha}_{k};\bm{\Omega}_{1},\dots,\bm{\Omega}_{M}),\quad \mathbf{X}_{\star}\perp\!\!\!\perp\mathbf{X}_{\circ,m},
where \bm{\alpha}_{k}\in R^{u_{1}\times\cdots\times u_{M}} and \bm{\Omega}_m\in R^{u_m\times u_m} are the dimension-reduced clustering parameters and \mathbf{X}_{\circ,m} does not vary with the cluster index Y.
In the E-step, the membership weights are evaluated as
\widehat{\eta}_{ik}^{(s)}=\frac{\widehat{\pi}_{k}^{(s-1)}f_{k}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})}{\sum_{k=1}^{K}\widehat{\pi}_{k}^{(s-1)}f_{k}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})},
where f_k denotes the conditional probability density function of \mathbf{X}_i within the k-th cluster.
In the subspace-regularized M-step, the envelope subspace is iteratively estimated through a Grassmann manifold optimization that minimizes the following log-likelihood-based objective function:
G_m^{(s)}(\bm{\Gamma}_m) = \log|\bm{\Gamma}_m^T \mathbf{M}_m^{(s)} \bm{\Gamma}_m|+\log|\bm{\Gamma}_m^T (\mathbf{N}_m^{(s)})^{-1} \bm{\Gamma}_m|,
where \mathbf{M}_{m}^{(s)} and \mathbf{N}_{m}^{(s)} are given by
\mathbf{M}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} \sum_{k=1}^{K}\widehat{\eta}_{ik}^{(s)} (\bm{\epsilon}_{ik}^{(s)})_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1} (\bm{\epsilon}_{ik}^{(s)})_{(m)}^T,
\mathbf{N}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} (\mathbf{X}_i)_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1}(\mathbf{X}_i)_{(m)}^T.
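The two matrices and the objective above can be sketched in base R for matrix-valued observations (M = 2) and mode m = 1. This is only an illustration under stated assumptions, not the package's implementation: \widehat{\bm{\Sigma}}_{-m} is taken to be the identity, the residual \bm{\epsilon}_{ik} is assumed to be \mathbf{X}_i minus the weighted mean of cluster k, hard labels stand in for the membership weights, and the names `mu_hat`, `M_m`, `N_m` and `G_m` are hypothetical.

```r
set.seed(1)
p1 <- 3; p2 <- 2; n <- 40; K <- 2

# Simulated data: the two clusters differ only in the first row (mode 1)
y <- rep(1:K, each = n / K)
X <- array(rnorm(p1 * p2 * n), dim = c(p1, p2, n))
X[1, , y == 2] <- X[1, , y == 2] + 3

# Hard membership weights standing in for eta_ik (n x K matrix)
eta <- matrix(0, n, K)
eta[cbind(1:n, y)] <- 1

# Weighted cluster means (assumed form of the mean update)
mu_hat <- lapply(1:K, function(k) {
  apply(X, c(1, 2), function(v) sum(eta[, k] * v)) / sum(eta[, k])
})

p_minus_m <- p2  # product of the dimensions of the other modes

# For mode 1 of a matrix, the mode-m unfolding is the matrix itself
M_m <- matrix(0, p1, p1)
N_m <- matrix(0, p1, p1)
for (i in 1:n) {
  N_m <- N_m + tcrossprod(X[, , i])
  for (k in 1:K) {
    eps <- X[, , i] - mu_hat[[k]]
    M_m <- M_m + eta[i, k] * tcrossprod(eps)
  }
}
M_m <- M_m / (n * p_minus_m)
N_m <- N_m / (n * p_minus_m)

# G_m(Gamma) = log|Gamma' M Gamma| + log|Gamma' N^{-1} Gamma|
G_m <- function(Gamma, M, N) {
  as.numeric(determinant(crossprod(Gamma, M %*% Gamma))$modulus +
             determinant(crossprod(Gamma, solve(N, Gamma)))$modulus)
}

e1 <- matrix(c(1, 0, 0), p1, 1)  # direction carrying the mean difference
e3 <- matrix(c(0, 0, 1), p1, 1)  # direction with no cluster signal
g_material <- G_m(e1, M_m, N_m)
g_immaterial <- G_m(e3, M_m, N_m)
```

In this toy setting the objective is smaller along the direction that separates the clusters, which is the behavior the minimization relies on. An actual fit searches over all semi-orthogonal \bm{\Gamma}_m rather than comparing two fixed directions.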
The intermediate estimator \mathbf{M}_{m}^{(s)} can be viewed as the mode-m conditional variation estimate of \mathbf{X}\mid Y, and \mathbf{N}_{m}^{(s)} is the mode-m marginal variation estimate of \mathbf{X}.
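The E-step formula for the membership weights can be illustrated with a small base R sketch. It assumes, purely for illustration, that each f_k is an isotropic normal density with identity covariance (the model itself uses tensor normal densities). The helper `e_step` and its arguments are hypothetical names, not part of the package.

```r
# eta_ik = pi_k f_k(X_i) / sum_l pi_l f_l(X_i), computed on the log scale
e_step <- function(X, mu, pi_k) {
  n <- dim(X)[length(dim(X))]    # the last dimension is the sample size
  K <- length(mu)
  Xmat <- matrix(X, ncol = n)    # each column is one vectorized observation
  log_w <- sapply(1:K, function(k) {
    resid <- Xmat - as.vector(mu[[k]])
    # log pi_k + log f_k, dropping the constant shared by all clusters
    log(pi_k[k]) - 0.5 * colSums(resid^2)
  })
  log_w <- log_w - apply(log_w, 1, max)  # subtract row maxima for stability
  w <- exp(log_w)
  w / rowSums(w)                         # normalize each row to sum to 1
}

set.seed(1)
X <- array(c(rep(0, 20), rep(5, 20)) + rnorm(40), dim = c(2, 2, 10))
mu <- list(matrix(0, 2, 2), matrix(5, 2, 2))
eta <- e_step(X, mu, pi_k = c(0.5, 0.5))
labels <- max.col(eta)  # hard labels from the largest weight in each row
```

Working with log weights and subtracting the row maximum before exponentiating avoids underflow when the densities are very small, which is common for high-dimensional tensors.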
Value
id |
A vector of estimated labels. |
pi |
A vector of estimated prior probabilities for clusters. |
eta |
A |
Mu.est |
A list of estimated cluster means. |
SIG.est |
A list of estimated covariance matrices. |
Mm |
Estimation of \mathbf{M}_{m}. |
Nm |
Estimation of \mathbf{N}_{m}. |
Gamma.est |
A list of estimated envelope bases. |
PGamma.est |
A list of envelope projection matrices. |
Author(s)
Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai
References
Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.
See Also
TGMM, tune_u_sep, tune_u_joint
Examples
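library(TensorClustering)
# Ten 2 x 2 observations: the first five are centered at 1, the last five at 2
A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
myfit = TEMM(A,u=c(2,2),K=2)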
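A fitted object can then be inspected through the components listed under Value. The sketch below assumes the TensorClustering package is installed and skips the fit otherwise; the seed and the object name `fit` are illustrative choices.

```r
set.seed(1)
# Ten 2 x 2 observations: the first five are centered at 1, the last five at 2
A <- array(c(rep(1, 20), rep(2, 20)) + rnorm(40), dim = c(2, 2, 10))

if (requireNamespace("TensorClustering", quietly = TRUE)) {
  fit <- TensorClustering::TEMM(A, u = c(2, 2), K = 2)
  print(table(fit$id))  # sizes of the estimated clusters
  print(fit$pi)         # estimated prior probabilities
}
```

With only ten noisy observations the estimated labels may not match the simulated groups exactly, so the output is best read as a check that the call runs rather than as a benchmark.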