R: ICM-EM algorithm implementation

ICM.EM {PRECAST}

R Documentation

ICM-EM algorithm implementation

Description

ICM-EM algorithm for fitting PRECAST model

Usage

  ICM.EM(XList, q, K, AdjList=NULL,  Adjlist_car=NULL, posList = NULL, 
      platform = "ST", beta_grid=seq(0.2,4, by=0.2),maxIter_ICM=6,
      maxIter=20, epsLogLik=1e-5, verbose=TRUE,mix_prop_heter=TRUE, 
      Sigma_equal=FALSE, Sigma_diag=TRUE,error_heter=TRUE, Sp2=TRUE,
      wpca_int=FALSE, int.model='EEE', seed=1,coreNum = 1, coreNum_int=coreNum)

Arguments

`XList`	an M-length list consisting of multiple matrices with class `dgCMatrix` or `matrix` that specify the log-normalization gene expression matrix for each data sample used for PRECAST model.
`q`	a positive integer, specify the number of latent features to be extracted, default as 15.
`K`	a positive integer allowing scalar or vector, specify the number of clusters in model fitting.
`AdjList`	an M-length list of sparse matrices with class `dgCMatrix`, specify the adjacency matrix used for Potts model in PRECAST. We provide this interface for those users who would like to define the adjacency matrix by their own.
`Adjlist_car`	an M-length list of sparse matrices with class `dgCMatrix`, specify the adjacency matrix used for CAR model in PRECAST, default as AdjList in the Potts model. We provide this interface for those users who would like to use the different adjacency matrix in CAR model.
`posList`	an M-length list composed by spatial coordinate matrix for each data sample.
`platform`	a string, specify the platform of the provided data, default as "Visium". There are many platforms to be supported, including ("Visuim", "ST", "SeqFISH", 'merFISH', 'slide-seqv2', 'seqscope', "HDST"). If AdjList is not given, the The platform helps to calculate the adjacency matrix by defining the neighbors.
`beta_grid`	an optional vector of positive value, the candidate set of the smoothing parameter to be searched by the grid-search optimization approach.
`maxIter_ICM`	an optional positive value, represents the maximum iterations of ICM.
`maxIter`	an optional positive value, represents the maximum iterations of EM.
`epsLogLik`	an optional positive vlaue, tolerance vlaue of relative variation rate of the observed pseudo log-loglikelihood value, defualt as '1e-5'.
`verbose`	an optional logical value, whether output the information of the ICM-EM algorithm.
`mix_prop_heter`	an optional logical value, specify whether betar are distict, default as `TRUE`.
`Sigma_equal`	an optional logical value, specify whether Sigmaks are equal, default as FALSE.
`Sigma_diag`	an optional logical value, specify whether Sigmaks are diagonal matrices, default as `TRUE`.
`error_heter`	an optional logical value, whether use the heterogenous error for DR-SC model, default as `TRUE`. If `error_heter=FALSE`, then the homogenuous error is used for probabilistic PCA model in PRECAST.
`Sp2`	an optional logical value, whether add the ICAR model component in the model, default as TRUE. We provide this interface for those users who don't want to include the ICAR model.
`wpca_int`	an optional logical value, means whether use the weighted PCA to obtain the initial values of loadings and other paramters, default as `FALSE` which means the ordinary PCA is used.
`int.model`	an optional string, specify which Gaussian mixture model is used in evaluting the initial values for PRECAST, default as "EEE"; and see `Mclust` for more models' names.
`seed`	an optional integer, the random seed in fitting PRECAST model.
`coreNum`	an optional positive integer, means the number of thread used in parallel computating.
`coreNum_int`	an optional positive integer, means the number of cores used in parallel computation for initial values when `K` is a vector, default as same as `coreNum`.

Details

Nothing

Value

ICM.EM returns a list with class "SeqKiDRSC_Object" with the number of components equal to the length of K, where each component includes the model fitting results for one number of cluster and is a list consisting of following components:

`cluster`	an M-length list that includes the inferred class labels for each data sample.
`hZ`	an M-length list that includes the batch corrected low-dimensional embeddings for each data sample.
`hV`	an M-length list that includes the estimate the ICAR component for each sample.
`Rf`	an M-length list that includes the posterior probability of domain clusters for each sample.
`beta`	an M-length vector that includes the estimated smoothing parameters for each sample.
`Mu`	mean vectors of mixtures components.
`Sigma`	covariance matrix of mixtures components.
`W`	estimated loading matrix
`Lam`	estimated variance of errors in probabilistic PCA model
`loglik`	pseudo observed log-likelihood.

Note

nothing

Author(s)

Wei Liu

References

Wei Liu, Liao, X., Luo, Z. et al, Jin Liu* (2023). Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nature Communications, 14, 296

Examples


  ## we generate the spatial transcriptomics data with lattice neighborhood, i.e. ST platform.
  library(Matrix)
  q <- 10; K <- 4
  data(PRECASTObj)
  posList <- lapply(PRECASTObj@seulist, function(x) cbind(x$row, x$col))
  AdjList <- lapply(posList, getAdj_reg, platform='ST')
  XList <- lapply(PRECASTObj@seulist, function(x) t(x[['RNA']]@data))
  XList <- lapply(XList, scale, scale=FALSE)
  ## For illustration, maxIter is set to 4
  resList <- ICM.EM(XList,AdjList = AdjList, maxIter=4,
                   q=q, K=K, verbose=TRUE)

[Package PRECAST version 1.6.5 Index]