
pca_pen {markerpen}

R Documentation

Penalized Principal Component Analysis for Marker Gene Selection

Description

This function solves the optimization problem

\min_{X}\quad -\mathrm{tr}(SX) + \lambda p(X),

\mathrm{s.t.}\quad O \preceq X \preceq I, \quad X \ge 0, \quad\mathrm{and}\quad \mathrm{tr}(X) = 1,

where O\preceq X \preceq I means all eigenvalues of X are between 0 and 1, X \ge 0 means all elements of X are nonnegative, and p(X) is a penalty function defined in the article (see the References section).
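To make the constraints concrete, the following base-R sketch builds a rank-one candidate X = vv^T from a nonnegative unit vector and checks each constraint, then evaluates the unpenalized part of the objective, -tr(SX). The names `S_toy`, `v`, and `X` are illustrative only and are not part of the package; the penalty p(X) is omitted because it is defined only in the article.

```r
# Toy 4 x 4 correlation matrix: the first two variables are strongly correlated
S_toy = diag(4)
S_toy[1, 2] = 0.9
S_toy[2, 1] = 0.9

# A nonnegative unit vector supported on the first two coordinates
v = c(1, 1, 0, 0) / sqrt(2)
X = tcrossprod(v)   # X = v v^T, a rank-one projection matrix

# Constraint checks (with a small numerical tolerance)
evals = eigen(X, symmetric = TRUE, only.values = TRUE)$values
eig_ok = all(evals > -1e-8 & evals < 1 + 1e-8)   # O <= X <= I
nonneg_ok = all(X >= 0)                          # all elements nonnegative
trace_ok = isTRUE(all.equal(sum(diag(X)), 1))    # tr(X) = 1

# Unpenalized part of the objective, -tr(S X), which equals -v' S v here
obj_smooth = -sum(diag(S_toy %*% X))
```

For this candidate, -tr(SX) reduces to -v^T S v, so concentrating v on strongly correlated variables lowers the objective.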

Usage

pca_pen(
  S,
  gr,
  lambda,
  w = 1.5,
  alpha = 0.01,
  maxit = 1000,
  eps = 1e-04,
  verbose = 0
)

Arguments

`S`	The sample correlation matrix of gene expression.
`gr`	Indices of genes that are treated as markers in the prior information.
`lambda`	Tuning parameter to control the sparsity of eigenvectors.
`w`	Tuning parameter to control the weight on prior information. Larger `w` means genes not in the prior list are less likely to be selected as markers.
`alpha`	Step size of the optimization algorithm.
`maxit`	Maximum number of iterations.
`eps`	Tolerance parameter for convergence.
`verbose`	Level of verbosity.

Value

A list containing the following components:

projection: The estimated projection matrix.
evecs: The estimated eigenvectors.
niter: Number of iterations used in the optimization process.
err_v: The optimization error in each iteration.
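As a rough illustration of how a projection matrix relates to an eigenvector, the sketch below recovers the leading eigenvector of a rank-one matrix with base R and lists the indices with non-negligible loadings. The matrix `proj` is a hand-built stand-in, not actual `pca_pen` output, and the threshold `1e-6` is an arbitrary assumption.

```r
# Stand-in for a p x p projection matrix (NOT actual pca_pen output)
v_true = c(0.6, 0.8, 0, 0, 0)
proj = tcrossprod(v_true)

# Leading eigenvector of the symmetric matrix
ev = eigen(proj, symmetric = TRUE)
v_hat = ev$vectors[, 1]
v_hat = v_hat * sign(sum(v_hat))   # eigenvectors are defined up to sign; fix it

# Indices whose loadings are not negligible (threshold is an assumption)
selected = which(abs(v_hat) > 1e-6)
```

The same steps could be applied to a real result by substituting its `projection` component for `proj`, assuming that component is a symmetric p x p matrix.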

References

Qiu, Y., Wang, J., Lei, J., & Roeder, K. (2020). Identification of cell-type-specific marker genes from co-expression patterns in tissue samples.

Examples

set.seed(123)
n = 200  # Sample size
p = 500  # Number of genes
s = 50   # Number of true signals

# The first s genes are true markers, and others are noise
Sigma = matrix(0, p, p)
Sigma[1:s, 1:s] = 0.9
diag(Sigma) = 1

# Simulate data from the covariance matrix
x = matrix(rnorm(n * p), n) %*% chol(Sigma)

# Sample correlation matrix
S = cor(x)

# Indices of prior marker genes
# Note that the prior list omits 10 true markers (genes 41-50)
# and includes 10 false markers (genes 61-70)
gr = c(1:(s - 10), (s + 11):(s + 20))

# Run the algorithm
res = pca_pen(S, gr, lambda = 0.1, verbose = 1)

# See if we can recover the true correlation structure
image(res$projection, asp = 1)
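For a numeric point of comparison, the sketch below runs ordinary (unpenalized) PCA on the same simulated data using only base R and counts how many of the genes with the largest absolute loadings are true markers. It does not call `pca_pen`; the variable names `v1`, `top`, and `n_true` are illustrative, and using the true signal count `s` as the cutoff is an assumption that would not be available with real data.

```r
# Regenerate the simulated data from the example above
set.seed(123)
n = 200  # Sample size
p = 500  # Number of genes
s = 50   # Number of true signals
Sigma = matrix(0, p, p)
Sigma[1:s, 1:s] = 0.9
diag(Sigma) = 1
x = matrix(rnorm(n * p), n) %*% chol(Sigma)
S = cor(x)

# Leading eigenvector of S (ordinary PCA, no penalty, no prior information)
v1 = eigen(S, symmetric = TRUE)$vectors[, 1]

# Take the s genes with the largest absolute loadings
top = order(abs(v1), decreasing = TRUE)[1:s]

# Number of true markers (genes 1 to s) among them
n_true = sum(top <= s)
```

Applying the same count to the genes selected by the penalized fit gives a simple way to compare the two approaches on this simulation.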

[Package markerpen version 0.1.1 Index]