TEMM {TensorClustering}    R Documentation

Fit the Tensor Envelope Mixture Model (TEMM)

Description

Fit the Tensor Envelope Mixture Model (TEMM)

Usage

TEMM(Xn, u, K, initial = "kmeans", iter.max = 500, 
stop = 1e-3, trueY = NULL, print = FALSE)

Arguments

Xn

The tensor data to cluster, supplied as an array whose last dimension indexes the n samples.

u

A vector of envelope dimensions, one for each tensor mode.

K

Number of clusters, greater than or equal to 2.

initial

Initialization method for the regularized EM algorithm. Default value is "kmeans".

iter.max

Maximum number of iterations. Default value is 500.

stop

Convergence threshold for the relative change in cluster means. Default value is 1e-3.

trueY

A vector of true cluster labels of each observation. Default value is NULL.

print

Whether to print the current iteration number, the relative change in cluster means and the clustering error (%) at each iteration. Default value is FALSE.
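The layout expected for Xn can be illustrated with base R alone. The sketch below is not taken from the package; the object names (obs, Xn, u) and the 3 x 4 observation size are assumptions chosen for illustration. It stacks matrix-valued observations along a new last dimension and checks that u has one entry per mode, each no larger than that mode's size.

```r
# Hypothetical data: n = 6 matrix-valued observations, each 3 x 4.
set.seed(1)
n <- 6
obs <- lapply(seq_len(n), function(i) matrix(rnorm(3 * 4), nrow = 3, ncol = 4))

# Stack the observations along a new last dimension, so dim(Xn) is c(3, 4, n).
Xn <- array(unlist(obs), dim = c(3, 4, n))

# Xn[, , i] recovers the i-th observation.
stopifnot(identical(Xn[, , 2], obs[[2]]))

# One envelope dimension per mode, each no larger than that mode's size.
u <- c(2, 2)
mode_sizes <- dim(Xn)[-length(dim(Xn))]
stopifnot(length(u) == length(mode_sizes), all(u <= mode_sizes))
```

Because R arrays are filled in column-major order, unlisting the matrices and reshaping keeps each observation intact in its own slice.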

Details

The TEMM function fits the Tensor Envelope Mixture Model (TEMM) through a subspace-regularized EM algorithm. For mode $m$, let $(\bm{\Gamma}_m,\bm{\Gamma}_{0m})\in R^{p_m\times p_m}$ be an orthogonal matrix where $\bm{\Gamma}_{m}\in R^{p_{m}\times u_{m}}$, $u_{m}\leq p_{m}$, represents the material part. Specifically, the material part $\mathbf{X}_{\star,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{m}^{T}$ follows a tensor normal mixture distribution, while the immaterial part $\mathbf{X}_{\circ,m}=\mathbf{X}\times_{m}\bm{\Gamma}_{0m}^{T}$ is unimodal and independent of the material part, and hence can be eliminated without loss of clustering information. Dimension reduction is achieved by focusing on the material part $\mathbf{X}_{\star,m}$. Collectively, the joint reduction from each mode is

$$\mathbf{X}_{\star}=[\![\mathbf{X};\bm{\Gamma}_{1}^{T},\dots,\bm{\Gamma}_{M}^{T}]\!]\sim\sum_{k=1}^{K}\pi_{k}\mathrm{TN}(\bm{\alpha}_{k};\bm{\Omega}_{1},\dots,\bm{\Omega}_{M}),\quad \mathbf{X}_{\star}\perp\!\!\!\perp\mathbf{X}_{\circ,m},$$

where $\bm{\alpha}_{k}\in R^{u_{1}\times\cdots\times u_{M}}$ and $\bm{\Omega}_m\in R^{u_m\times u_m}$ are the dimension-reduced clustering parameters and $\mathbf{X}_{\circ,m}$ does not vary with the cluster index $Y$. In the E-step, the membership weights are evaluated as

$$\widehat{\eta}_{ik}^{(s)}=\frac{\widehat{\pi}_{k}^{(s-1)}f_{k}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})}{\sum_{k'=1}^{K}\widehat{\pi}_{k'}^{(s-1)}f_{k'}(\mathbf{X}_i;\widehat{\bm{\theta}}^{(s-1)})},$$

where $f_k$ denotes the conditional probability density function of $\mathbf{X}_i$ within the $k$-th cluster. In the subspace-regularized M-step, the envelope subspace is iteratively estimated through a Grassmann manifold optimization that minimizes the following log-likelihood-based objective function:

$$G_m^{(s)}(\bm{\Gamma}_m) = \log|\bm{\Gamma}_m^T \mathbf{M}_m^{(s)} \bm{\Gamma}_m|+\log|\bm{\Gamma}_m^T (\mathbf{N}_m^{(s)})^{-1} \bm{\Gamma}_m|,$$

where $\mathbf{M}_{m}^{(s)}$ and $\mathbf{N}_{m}^{(s)}$ are given by

$$\mathbf{M}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} \sum_{k=1}^{K}\widehat{\eta}_{ik}^{(s)} (\bm{\epsilon}_{ik}^{(s)})_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1} (\bm{\epsilon}_{ik}^{(s)})_{(m)}^T,$$

$$\mathbf{N}_m^{(s)} = \frac{1}{np_{-m}}\sum_{i=1}^{n} (\mathbf{X}_i)_{(m)}(\widehat{\bm{\Sigma}}_{-m}^{(s-1)})^{-1}(\mathbf{X}_i)_{(m)}^T.$$

The intermediate estimator $\mathbf{M}_{m}^{(s)}$ can be viewed as the mode-$m$ conditional variation estimate of $\mathbf{X}\mid Y$, and $\mathbf{N}_{m}^{(s)}$ is the mode-$m$ marginal variation estimate of $\mathbf{X}$.
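The two formulas that drive each iteration, the E-step weights and the envelope objective, can be sketched in base R as follows. This is an illustrative sketch, not the package's internal code: the function names e_step_weights and envelope_objective, and all toy inputs, are assumptions. The log-densities log f_k(X_i) are taken as given, and the weights are computed on the log scale, which is a numerical-stability choice the documentation does not mention.

```r
# E-step sketch: membership weights from mixing proportions and log-densities.
# log_dens is an n x K matrix whose (i, k) entry is log f_k(X_i); how the
# densities themselves are evaluated is left out here.
e_step_weights <- function(pi_hat, log_dens) {
  # Add log(pi_k) to column k.
  log_num <- sweep(log_dens, 2, log(pi_hat), "+")
  # Subtract each row's maximum before exponentiating to avoid underflow.
  log_num <- log_num - apply(log_num, 1, max)
  num <- exp(log_num)
  num / rowSums(num)
}

# Envelope objective G_m(Gamma) for one mode, given symmetric positive
# definite M and N and a p x u matrix Gamma with orthonormal columns.
envelope_objective <- function(Gamma, M, N) {
  logdet <- function(A) as.numeric(determinant(A, logarithm = TRUE)$modulus)
  logdet(t(Gamma) %*% M %*% Gamma) + logdet(t(Gamma) %*% solve(N) %*% Gamma)
}

# Toy inputs (hypothetical values, for illustration only).
pi_hat <- c(0.4, 0.6)
log_dens <- matrix(c(-1.0, -3.0,
                     -2.5, -0.5,
                     -800, -805), ncol = 2, byrow = TRUE)
eta <- e_step_weights(pi_hat, log_dens)

M <- diag(c(1, 2, 3))
N <- diag(c(2, 3, 4))
Gamma <- diag(3)[, 1:2]   # basis made of the first two coordinate axes
G <- envelope_objective(Gamma, M, N)
```

Each row of eta sums to one, including the third row, whose raw densities would underflow to zero if exponentiated directly. The sketch only evaluates the objective for a fixed basis; it does not perform the Grassmann manifold optimization.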

Value

id

A vector of estimated labels.

pi

A vector of estimated prior probabilities for clusters.

eta

An n by K matrix of estimated membership weights.

Mu.est

A list of estimated cluster means.

SIG.est

A list of estimated covariance matrices.

Mm

Estimate of Mm, as defined in Details and in Deng and Zhang (2021).

Nm

Estimate of Nm, as defined in Details and in Deng and Zhang (2021).

Gamma.est

A list of estimated envelope bases.

PGamma.est

A list of envelope projection matrices.
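The returned components are related in the usual way for mixture models and envelope methods, and a small base R sketch may help in reading them. The conventions below are assumptions, not statements about the package's code: hard labels are taken as the row-wise maximum of the membership weights, mixing proportions as their column means, and a projection matrix as the basis times its transpose. All values are made up and stand in for eta, id, pi, and one element each of Gamma.est and PGamma.est.

```r
# Mock membership weights for n = 4 observations and K = 2 clusters
# (hypothetical values standing in for the eta component).
eta <- matrix(c(0.9, 0.1,
                0.2, 0.8,
                0.7, 0.3,
                0.4, 0.6), ncol = 2, byrow = TRUE)

# Hard labels: the cluster with the largest weight in each row.
id <- apply(eta, 1, which.max)

# Mixing proportions: the average weight per cluster.
pi_hat <- colMeans(eta)

# Projection onto the span of a basis with orthonormal columns
# (hypothetical basis standing in for one element of Gamma.est).
Gamma <- qr.Q(qr(matrix(c(1, 1, 0, 0, 1, 1), nrow = 3)))
P <- Gamma %*% t(Gamma)
```

A projection matrix built this way is symmetric and idempotent, which gives a quick sanity check when comparing a basis with its projection.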

Author(s)

Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai

References

Deng, K. and Zhang, X. (2021). Tensor Envelope Mixture Model for Simultaneous Clustering and Multiway Dimension Reduction. Biometrics.

See Also

TGMM, tune_u_sep, tune_u_joint

Examples

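  library(TensorClustering)
  # Ten 2 x 2 samples: the first five have mean 1, the last five have mean 2
  A = array(c(rep(1,20),rep(2,20))+rnorm(40),dim=c(2,2,10))
  myfit = TEMM(A,u=c(2,2),K=2)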
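A slightly longer variant of the same example can show how trueY and print fit together. It is a sketch under stated assumptions: the seed and the label vector trueY are additions for illustration, with the labels following from how the simulated array is filled (five samples per cluster). The call is guarded so the script still runs when the package is not installed.

```r
set.seed(2021)
# Ten 2 x 2 samples, five per cluster, with cluster means 1 and 2.
A <- array(c(rep(1, 20), rep(2, 20)) + rnorm(40), dim = c(2, 2, 10))
trueY <- rep(1:2, each = 5)

if (requireNamespace("TensorClustering", quietly = TRUE)) {
  fit <- TensorClustering::TEMM(A, u = c(2, 2), K = 2,
                                trueY = trueY, print = TRUE)
  # Cross-tabulate estimated and true labels; cluster numbering is arbitrary.
  print(table(estimated = fit$id, truth = trueY))
}
```

Since cluster labels are only identified up to permutation, the cross-tabulation is easier to read than a direct comparison of fit$id with trueY.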

[Package TensorClustering version 1.0.2 Index]