R: Marker Gene Selection via Penalized Principal Component...

refine_markers {markerpen}

R Documentation

Marker Gene Selection via Penalized Principal Component Analysis

Description

This function refines a prior marker gene list by combining information from bulk tissue data, based on the penalized principal component analysis. The current implementation computes on one cell type at a time. To get marker genes for multiple cell types, call this function iteratively.

Usage

refine_markers(
  mat_exp,
  range,
  markers,
  lambda,
  w = 1.5,
  thresh = 0.001,
  alpha = 0.01,
  maxit = 1000,
  eps = 1e-04,
  verbose = 0
)

Arguments

`mat_exp`	The gene expression matrix in the original scale (not logarithm-transformed), with rows standing for observations and columns for genes. The matrix should include gene names as column names.
`range`	A character vector of gene names, representing the range of genes in which markers are sought.
`markers`	A character vector of gene names giving the prior marker gene list.
`lambda`	A tuning parameter to control the number of selected marker genes. A larger value typically means a smaller number of genes.
`w`	Tuning parameter to control the weight on prior information. Larger `w` means genes not in the prior list are less likely to be selected as markers.
`thresh`	Below this threshold small factor loadings are treated as zeros.
`alpha`	Step size of the optimization algorithm.
`maxit`	Maximum number of iterations.
`eps`	Tolerance parameter for convergence.
`verbose`	Level of verbosity.

Value

A list containing the following components:

spca: The sparse PCA result as in pca_pen().
markers: A character vector of selected markers genes.
markers_coef: The estimated factor loadings for the associated genes.

References

Qiu, Y., Wang, J., Lei, J., & Roeder, K. (2020). Identification of cell-type-specific marker genes from co-expression patterns in tissue samples.

Examples

# Data used in the vignette
load(system.file("examples", "gene_expr.RData", package = "markerpen"))
load(system.file("examples", "published_markers.RData", package = "markerpen"))
load(system.file("examples", "markers_range.RData", package = "markerpen"))

# Get expression matrix - rows are observations, columns are genes
ind = match(rownames(dat), markerpen::gene_mapping$name)
ind = na.omit(ind)
ensembl = markerpen::gene_mapping$ensembl[ind]
mat_exp = t(dat[markerpen::gene_mapping$name[ind], ])
colnames(mat_exp) = ensembl

# We compute the marker genes for two cell types with a reduced problem size
# See the vignette for the full example

# Markers for astrocytes
set.seed(123)
search_range = intersect(markers_range$astrocytes, ensembl)
search_range = sample(search_range, 300)
prior_markers = intersect(pub_markers$astrocytes, search_range)
ast_re = refine_markers(
    mat_exp, search_range, prior_markers,
    lambda = 0.35, w = 1.5, maxit = 500, eps = 1e-3, verbose = 0
)
# Remove selected markers from the expression matrix
mat_rest = mat_exp[, setdiff(colnames(mat_exp), ast_re$markers)]

# Markers for microglia
search_range = intersect(markers_range$microglia, ensembl)
search_range = sample(search_range, 300)
prior_markers = intersect(pub_markers$microglia, search_range)
mic_re = refine_markers(
    mat_exp, search_range, prior_markers,
    lambda = 0.35, w = 1.5, maxit = 500, eps = 1e-3, verbose = 0
)

# Refined markers
markers_re = list(astrocytes = ast_re$markers,
                  microglia  = mic_re$markers)
# Visualize the correlation matrix
cor_markers = cor(mat_exp[, unlist(markers_re)])
image(cor_markers, asp = 1)

# Post-process the selected markers
# Pick the first 20 ordered markers
markers_ord = sort_markers(cor_markers, markers_re)
markers_ord = lapply(markers_ord, head, n = 20)
# Visualize the correlation matrix
image(cor(mat_exp[, unlist(markers_ord)]), asp = 1)

[Package markerpen version 0.1.1 Index]