hack_sig {hacksig} | R Documentation |
Score samples by gene signatures
Description
Compute single sample scores for gene signatures using one of several methods. You can apply either the original procedure or one of three single sample scoring methods: the combined z-score (Lee et al., 2008), the single sample GSEA (Barbie et al., 2009) or the singscore method (Foroutan et al., 2018).
Usage
hack_sig(
  expr_data,
  signatures = "all",
  method = "original",
  direction = "none",
  sample_norm = "raw",
  rank_norm = "none",
  alpha = 0.25
)
Arguments
expr_data |
A normalized gene expression matrix (or data frame) with gene symbols as row names and samples as columns. |
signatures |
It can be a list of signatures or a character vector indicating
keywords for a group of signatures. The default ( |
method |
A character string specifying which method to use for computing the single sample score for each signature. You can choose one of:
|
direction |
A character string specifying the singscore computation method depending on the direction of the signatures. Can be one of:
|
sample_norm |
A character string specifying the type of normalization affecting the single sample GSEA scores. Can be one of:
|
rank_norm |
A character string specifying how gene expression ranks should be normalized in the single sample GSEA procedure. Valid choices are:
|
alpha |
A numeric scalar. Exponent in the running sum of the single sample GSEA
score calculation which weighs the gene ranks. Defaults to 0.25. |
Details
The "original" method refers to the procedure used by the authors in the original publication to compute the signature score. hack_sig() can compute signature scores with the original method only if it is a relatively simple procedure (e.g. a weighted sum of fitted model coefficients and expression values). For more complex methods, such as CINSARC, ESTIMATE and Immunophenoscore, use the dedicated functions.
If signatures is a custom list of gene signatures, then the "ssgsea" method will be applied by default.
Value
A tibble with one row for each sample in expr_data, a column sample_id indicating sample identifiers and one column for each input signature giving single sample scores.
Algorithm
This section gives a brief explanation of how single sample scores are obtained from different methods.
Combined z-score
Gene expression values are centered by their mean value and scaled by their standard deviation across samples for each gene (z-scores). Then, for each sample and signature, corresponding z-scores are added up and divided by the square root of the signature size (i.e. the number of genes composing a signature).
The combined z-score method is also implemented in the R package GSVA
(Hänzelmann et al., 2013).
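The steps described above can be sketched in base R as follows. This is a minimal illustration rather than the hacksig or GSVA implementation: the toy matrix, the gene names and the signature are made up, and the sample standard deviation (as returned by scale()) is assumed.

```r
# Toy expression matrix: 4 genes (rows) x 3 samples (columns); values are made up.
expr <- matrix(
  c(1, 2, 3,
    4, 6, 8,
    2, 2, 5,
    9, 7, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2", "S3"))
)
signature <- c("G1", "G2", "G4")  # hypothetical signature

# z-score each gene across samples; scale() works column-wise, so transpose first.
z <- t(scale(t(expr)))

# Add up the z-scores of the signature genes in each sample and
# divide by the square root of the signature size.
combined_z <- colSums(z[signature, , drop = FALSE]) / sqrt(length(signature))
combined_z
```

Because G4 decreases from S1 to S3 while G1 and G2 increase, its z-scores partly cancel those of the other two genes in each sample.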
Single sample GSEA
For each sample, genes are ranked by expression value in increasing order and rank normalization may follow (see argument rank_norm). Then, two probability-like vectors are computed for each sample and signature:
- P_{in}, the cumulative sum of weighted ranks divided by their total sum for genes in the signature;
- P_{out}, the cumulative sum of ones (indicating genes not in the signature) divided by the number of genes not in the signature.
The single sample GSEA score is obtained by adding up the elements of the vector difference P_{in} - P_{out}.
Finally, single sample scores can be normalized either across samples or across gene signatures and samples (see argument sample_norm).
The single sample GSEA method is also implemented in the R package GSVA
(Hänzelmann et al., 2013).
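As a rough sketch of the raw score for one sample and one signature, the description above could be written in base R as shown below. The helper name ssgsea_score() is hypothetical, and the direction of the cumulative sums (from the highest-ranked gene down to the lowest) is an assumption that the text does not state; rank normalization and score normalization are left out.

```r
# Hypothetical helper: raw ssGSEA-like score for a named expression vector `x`.
ssgsea_score <- function(x, signature, alpha = 0.25) {
  r <- rank(x)                           # ranks in increasing order of expression
  r <- r[order(r, decreasing = TRUE)]    # assumption: walk from the top-ranked gene down
  in_sig <- names(r) %in% signature

  # P_in: cumulative sum of weighted ranks of signature genes over their total.
  p_in <- cumsum(ifelse(in_sig, r^alpha, 0)) / sum(r[in_sig]^alpha)
  # P_out: cumulative count of non-signature genes over their number.
  p_out <- cumsum(!in_sig) / sum(!in_sig)

  sum(p_in - p_out)
}

# Made-up expression values for a single sample.
x <- c(G1 = 5, G2 = 1, G3 = 3, G4 = 2)
high <- ssgsea_score(x, c("G1", "G3"), alpha = 1)  # signature of highly expressed genes
low  <- ssgsea_score(x, c("G2", "G4"), alpha = 1)  # signature of lowly expressed genes
```

With this walking order, a signature made of the most highly expressed genes receives a larger score than one made of the least expressed genes.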
Singscore
For signatures whose genes are supposed to be up- or down-regulated, genes are ranked by expression value in increasing or decreasing order, respectively. For signatures whose direction is unknown, genes are ranked by absolute expression in increasing order and are median-centered. Enrichment scores are then computed for each sample and signature by averaging gene ranks for genes in the signature. Finally, normalized scores are obtained by subtracting the theoretical minimum mean rank from the score and dividing by the difference between the theoretical maximum and minimum mean ranks.
The hacksig implementation of this method works only with unidirectional (i.e. all genes up- or down-regulated) and undirected gene signatures. If you want to get single sample scores for bidirectional gene signatures (i.e. signatures composed of both up- and down-regulated genes), please use the R package singscore (Foroutan et al., 2018).
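For an up-regulated signature, the normalization described above can be illustrated with a short base R sketch. The helper name singscore_up() is hypothetical and this is not the code used by hacksig or singscore. With N genes in total and n genes in the signature, the theoretical minimum mean rank is (n + 1) / 2 (signature genes occupy ranks 1 to n) and the theoretical maximum is (2N - n + 1) / 2 (signature genes occupy the top n ranks).

```r
# Hypothetical helper: normalized singscore-like score for an up-regulated
# signature, given a named expression vector `x` for one sample.
singscore_up <- function(x, signature) {
  r <- rank(x)                           # ranks in increasing order of expression
  sig <- intersect(signature, names(x))  # keep only signature genes present in x
  n <- length(sig)
  N <- length(x)

  mean_rank <- mean(r[sig])
  min_rank <- (n + 1) / 2                # signature genes at the bottom of the ranking
  max_rank <- (2 * N - n + 1) / 2        # signature genes at the top of the ranking

  (mean_rank - min_rank) / (max_rank - min_rank)
}

# Made-up expression values for a single sample.
x <- c(G1 = 5, G2 = 1, G3 = 3, G4 = 2)
top    <- singscore_up(x, c("G1", "G3"))  # the two most expressed genes
bottom <- singscore_up(x, c("G2", "G4"))  # the two least expressed genes
mixed  <- singscore_up(x, c("G1", "G2"))  # one of each
```

The normalized score lies between 0 and 1. For a down-regulated signature, the same sketch could be applied to rank(-x), which ranks genes in decreasing order of expression.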
References
Barbie, D. A., Tamayo, P., Boehm, J. S., Kim, S. Y., Moody, S. E., Dunn, I. F., Schinzel, A. C., Sandy, P., Meylan, E., Scholl, C., Fröhling, S., Chan, E. M., Sos, M. L., Michel, K., Mermel, C., Silver, S. J., Weir, B. A., Reiling, J. H., Sheng, Q., Gupta, P. B., … Hahn, W. C. (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(7269), 108–112. doi: 10.1038/nature08460.
Foroutan, M., Bhuva, D. D., Lyu, R., Horan, K., Cursons, J., & Davis, M. J. (2018). Single sample scoring of molecular phenotypes. BMC bioinformatics, 19(1), 404. doi: 10.1186/s12859-018-2435-4.
Hänzelmann, S., Castelo, R., & Guinney, J. (2013). GSVA: gene set variation analysis for microarray and RNA-seq data. BMC bioinformatics, 14, 7. doi: 10.1186/1471-2105-14-7.
Lee, E., Chuang, H. Y., Kim, J. W., Ideker, T., & Lee, D. (2008). Inferring pathway activity toward precise disease classification. PLoS computational biology, 4(11), e1000217. doi: 10.1371/journal.pcbi.1000217.
See Also
get_sig_info() to get information about all implemented signatures.
check_sig() to check if signatures are applicable to your data.
hack_cinsarc() to apply the original CINSARC procedure.
hack_estimate() to obtain the original ESTIMATE scores.
hack_immunophenoscore() to apply the original Immunophenoscore procedure.
Examples
# Load the package (test_expr is an example dataset shipped with hacksig)
library(hacksig)

# Raw ssGSEA scores for all implemented signatures can be obtained with:
hack_sig(test_expr, method = "ssgsea")
# To obtain 0-1 normalized ssGSEA scores, use:
hack_sig(test_expr, method = "ssgsea", sample_norm = "separate")
# You can also change the exponent of the ssGSEA running sum with:
hack_sig(test_expr, method = "ssgsea", sample_norm = "separate", alpha = 0.5)
# To obtain combined z-scores for custom gene signatures, use:
custom_list <- list(rand_sig1 = rownames(test_expr)[1:5],
                    rand_sig2 = c(rownames(test_expr)[6:8], "RANDOMGENE"))
hack_sig(test_expr, custom_list, method = "zscore")