refine_markers {markerpen} | R Documentation |
Marker Gene Selection via Penalized Principal Component Analysis
Description
This function refines a prior marker gene list by combining information from bulk tissue data, based on the penalized principal component analysis. The current implementation computes on one cell type at a time. To get marker genes for multiple cell types, call this function iteratively.
Usage
refine_markers(
mat_exp,
range,
markers,
lambda,
w = 1.5,
thresh = 0.001,
alpha = 0.01,
maxit = 1000,
eps = 1e-04,
verbose = 0
)
Arguments
mat_exp |
The gene expression matrix in the original scale (not logarithm-transformed), with rows standing for observations and columns for genes. The matrix should include gene names as column names. |
range |
A character vector of gene names, representing the range of genes in which markers are sought. |
markers |
A character vector of gene names giving the prior marker gene list. |
lambda |
A tuning parameter to control the number of selected marker genes. A larger value typically means a smaller number of genes. |
w |
Tuning parameter to control the weight on prior information.
Larger |
thresh |
Below this threshold small factor loadings are treated as zeros. |
alpha |
Step size of the optimization algorithm. |
maxit |
Maximum number of iterations. |
eps |
Tolerance parameter for convergence. |
verbose |
Level of verbosity. |
Value
A list containing the following components:
- spca
The sparse PCA result as in
pca_pen()
.- markers
A character vector of selected markers genes.
- markers_coef
The estimated factor loadings for the associated genes.
References
Qiu, Y., Wang, J., Lei, J., & Roeder, K. (2020). Identification of cell-type-specific marker genes from co-expression patterns in tissue samples.
Examples
# Data used in the vignette
load(system.file("examples", "gene_expr.RData", package = "markerpen"))
load(system.file("examples", "published_markers.RData", package = "markerpen"))
load(system.file("examples", "markers_range.RData", package = "markerpen"))
# Get expression matrix - rows are observations, columns are genes
ind = match(rownames(dat), markerpen::gene_mapping$name)
ind = na.omit(ind)
ensembl = markerpen::gene_mapping$ensembl[ind]
mat_exp = t(dat[markerpen::gene_mapping$name[ind], ])
colnames(mat_exp) = ensembl
# We compute the marker genes for two cell types with a reduced problem size
# See the vignette for the full example
# Markers for astrocytes
set.seed(123)
search_range = intersect(markers_range$astrocytes, ensembl)
search_range = sample(search_range, 300)
prior_markers = intersect(pub_markers$astrocytes, search_range)
ast_re = refine_markers(
mat_exp, search_range, prior_markers,
lambda = 0.35, w = 1.5, maxit = 500, eps = 1e-3, verbose = 0
)
# Remove selected markers from the expression matrix
mat_rest = mat_exp[, setdiff(colnames(mat_exp), ast_re$markers)]
# Markers for microglia
search_range = intersect(markers_range$microglia, ensembl)
search_range = sample(search_range, 300)
prior_markers = intersect(pub_markers$microglia, search_range)
mic_re = refine_markers(
mat_exp, search_range, prior_markers,
lambda = 0.35, w = 1.5, maxit = 500, eps = 1e-3, verbose = 0
)
# Refined markers
markers_re = list(astrocytes = ast_re$markers,
microglia = mic_re$markers)
# Visualize the correlation matrix
cor_markers = cor(mat_exp[, unlist(markers_re)])
image(cor_markers, asp = 1)
# Post-process the selected markers
# Pick the first 20 ordered markers
markers_ord = sort_markers(cor_markers, markers_re)
markers_ord = lapply(markers_ord, head, n = 20)
# Visualize the correlation matrix
image(cor(mat_exp[, unlist(markers_ord)]), asp = 1)