DR.SC_fit {DR.SC}

R Documentation

Joint dimension reduction and spatial clustering

Description

Joint dimension reduction and spatial clustering for scRNA-seq and spatial transcriptomics data

Usage

  DR.SC_fit(X, K, Adj_sp=NULL, q=15,
             error.heter= TRUE, beta_grid=seq(0.5, 5, by=0.5),
             maxIter=25, epsLogLik=1e-5, verbose=FALSE, maxIter_ICM=6,
             wpca.int=FALSE, int.model="EEE", approxPCA=FALSE, coreNum = 5)

Arguments

`X`	a sparse matrix of class `dgCMatrix` or a `matrix`, the log-normalized gene expression matrix (spots/cells in rows, genes in columns) used to fit the DR-SC model.
`K`	a positive integer or a vector of positive integers, the number(s) of clusters used in model fitting.
`Adj_sp`	an optional sparse matrix of class `dgCMatrix`, the adjacency matrix used by the DR-SC model. This interface is provided for users who want to define the adjacency matrix themselves.
`q`	a positive integer, the number of latent features to extract; default 15. The choice of q trades off model complexity against fit to the data, and depends on the goals of the analysis and the structure of the data. A larger q gives a more complex model with more parameters, which may overfit and generalize poorly; a smaller q gives a simpler model with fewer parameters, which may underfit the data.
`error.heter`	an optional logical value, whether to use heterogeneous errors in the DR-SC model; default `TRUE`. If `error.heter=FALSE`, homogeneous errors are used in the probabilistic PCA model within DR-SC.
`beta_grid`	an optional vector of positive values, the candidate set of the smoothing parameter searched by grid-search optimization.
`maxIter`	an optional positive integer, the maximum number of EM iterations; default 25.
`epsLogLik`	an optional positive value, the tolerance on the relative change of the observed pseudo log-likelihood; default `1e-5`.
`verbose`	an optional logical value, whether to print progress information from the ICM-EM algorithm; default `FALSE`.
`maxIter_ICM`	an optional positive integer, the maximum number of ICM iterations; default 6.
`wpca.int`	an optional logical value, whether to use weighted PCA to obtain the initial values of the loadings and other parameters; default `FALSE`, in which case ordinary PCA is used.
`int.model`	an optional string, the Gaussian mixture model used to compute the initial values for DR-SC; default "EEE". See `Mclust` for the available model names.
`approxPCA`	an optional logical value, whether to use approximate PCA to speed up the computation of initial values; default `FALSE`.
`coreNum`	an optional positive integer, the number of threads used for parallel computation; default 5. If `K` has length one, `coreNum` is automatically set to 1.
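For users supplying `Adj_sp` themselves, the sketch below shows one way to build a sparse adjacency matrix for spots on a square lattice, connecting each spot to the spots within a chosen distance. The helper `lattice_adj` and the `radius` cutoff are hypothetical illustrations, not part of DR.SC; the package's own `getAdj` (used in the Examples) remains the intended route. Only the `Matrix` package and base R are assumed.

```r
library(Matrix)

# Hypothetical helper (not part of DR.SC): connect every pair of distinct spots
# whose Euclidean distance is at most `radius`; returns an n x n dgCMatrix.
lattice_adj <- function(coords, radius = 1) {
  d <- as.matrix(dist(coords))                   # n x n pairwise distances
  idx <- which(d > 0 & d <= radius, arr.ind = TRUE)
  sparseMatrix(i = idx[, 1], j = idx[, 2], x = 1,
               dims = c(nrow(coords), nrow(coords)))
}

# 10 x 10 grid of spot coordinates, like the height = 10, width = 10 example data
coords <- expand.grid(row = 1:10, col = 1:10)
Adj_toy <- lattice_adj(coords, radius = 1)

# With radius = 1, interior spots have 4 neighbours and corner spots have 2
range(rowSums(Adj_toy))
```

A radius of 1 on integer coordinates yields the four-neighbour lattice; larger radii add diagonal or more distant neighbours and therefore denser matrices.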

Details

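When `K` is a vector, a DR-SC model is fitted for each of its values (in parallel over `coreNum` threads), and the returned `out_param` can be used to choose the number of clusters by MBIC. Each model is fitted with an ICM-EM algorithm that stops after `maxIter` iterations or once the relative change in the observed pseudo log-likelihood falls below `epsLogLik`; the smoothing parameter is selected from `beta_grid` by grid search.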
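The stopping rule controlled by `epsLogLik` and `maxIter` can be sketched as a check on the relative change of the log-likelihood between consecutive iterations. This is a minimal illustration under an assumption: the hypothetical helper `converged` divides by the absolute previous value, which may differ in detail from DR.SC's internal code, and the log-likelihood trace is made-up numbers rather than real output.

```r
# Hypothetical helper (not DR.SC's internal code): TRUE when the relative change
# of the observed pseudo log-likelihood drops below the tolerance `eps`.
converged <- function(loglik_old, loglik_new, eps = 1e-5) {
  abs(loglik_new - loglik_old) / abs(loglik_old) < eps
}

# Illustrative log-likelihood trace over EM iterations (made-up values)
ll_trace <- c(-5000, -4200, -4100.5, -4100.49, -4100.4899)
maxIter <- length(ll_trace)

stop_at <- NA
for (it in 2:maxIter) {
  if (converged(ll_trace[it - 1], ll_trace[it])) {
    stop_at <- it   # first iteration whose relative change is below eps
    break
  }
}
stop_at
```

Using a relative rather than absolute change makes the tolerance insensitive to the overall scale of the log-likelihood, which grows with the number of spots and genes.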

Value

DR.SC_fit returns a list with class "drscObject" with the following three components:

`Objdrsc`	a list of model-fitting results with one element per value of `K`.
`out_param`	a numeric matrix used for model selection by MBIC.
`K_set`	a scalar or vector equal to the input argument `K`.

In addition, each element of `Objdrsc` is a list with the following components:

`cluster`	inferred cluster labels.
`hZ`	extracted latent features.
`beta`	estimated smoothing parameter.
`Mu`	mean vectors of the mixture components.
`Sigma`	covariance matrices of the mixture components.
`W`	estimated loading matrix.
`Lam_vec`	estimated error variances in the probabilistic PCA model.
`loglik`	observed pseudo log-likelihood.
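When several values of `K` are fitted, the result for a particular number of clusters has to be pulled out of `Objdrsc`. The sketch below builds a small mock object with the documented layout (only `cluster` and `hZ` are filled, with random placeholders, not real DR.SC output) and retrieves the fit for one `K`. It assumes the elements of `Objdrsc` follow the order of `K_set`; the accessor `get_fit` is a hypothetical helper, not a DR.SC function.

```r
set.seed(1)
n <- 100                 # number of spots in the mock
K_set <- c(3, 4, 5)      # values of K that were "fitted"

# Mock object mirroring the documented structure (placeholder values only)
mock <- list(
  Objdrsc = lapply(K_set, function(k) list(
    cluster = sample(seq_len(k), n, replace = TRUE),
    hZ = matrix(rnorm(n * 15), n, 15)
  )),
  K_set = K_set
)
class(mock) <- "drscObject"

# Hypothetical accessor: assumes Objdrsc[[i]] corresponds to K_set[i]
get_fit <- function(obj, k) {
  pos <- match(k, obj$K_set)
  if (is.na(pos)) stop("K = ", k, " was not fitted")
  obj$Objdrsc[[pos]]
}

fit4 <- get_fit(mock, 4)
table(fit4$cluster)   # cluster sizes for the K = 4 solution
```

Matching on `K_set` instead of indexing by position avoids off-by-one mistakes when `K` does not start at 1.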

Author(s)

Wei Liu

References

Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, Xiang Zhou, Xingjie Shi & Jin Liu (2022). Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data, Nucleic Acids Research, gkac219.

Examples

## Simulate spatial transcriptomics data with a lattice neighborhood, i.e. the ST platform.
library(DR.SC)
library(Seurat)
seu <- gendata_RNAExp(height=10, width=10, p=50, K=4)
seu <- NormalizeData(seu, verbose=FALSE)
# choose 40 highly variable features using FindVariableFeatures in Seurat
# seu <- FindVariableFeatures(seu, nfeatures = 40)
# or choose 40 spatially variable features using FindSVGs in DR.SC
seu <- FindSVGs(seu, nfeatures = 40, verbose=FALSE)
# users define the adjacency matrix
Adj_sp <- getAdj(seu, platform = 'ST')
# extract the spots-by-genes log-normalized matrix of the selected features;
# Seurat v5 (Assay5) and earlier assay classes store these slots differently
if(inherits(seu[["RNA"]], "Assay5")){
  var.features <- seu@assays$RNA@meta.data$var.features
  var.features <- var.features[!is.na(var.features)]
  dat <- GetAssayData(seu, assay = "RNA", layer = 'data')
  X <- Matrix::t(dat[var.features,])
}else{
  var.features <- seu@assays$RNA@var.features
  X <- Matrix::t(seu[["RNA"]]@data[var.features,])
}


# maxIter = 2 is for illustration only; use the default in practice.
drscList <- DR.SC_fit(X, Adj_sp=Adj_sp, K=4, maxIter=2, verbose=TRUE)

[Package DR.SC version 3.4 Index]