score_geneset_directionality {goat} | R Documentation |
Compute a score between -1 and 1 representing the proportion of up- or down-regulated genes for each geneset, weighted by gene effectsizes
Description
Importantly, the scope/utility of this score is limited to help users post-hoc filter for genesets that contain mostly up/down-regulated genes.
However, this might not coincide with the geneset pvalues / significance. For example,
genesets may exclusively contain genes with a positive effectsize but at the same time these can all be minor effects/values and thus the geneset
is not significant or less significant than other genesets with the exact same "directionality score".
For example, genesets may contain both up- and down-regulated genes but still be significant when testing with GOAT and using score_type='effectsize'
The scores computed with this function can help in post-hoc interpretation of GOAT results to further narrow down all significant genesets to a
subset with strong directionality. For example, after test_genesets()
we can filter the results for
A) significant genesets and B) that contain at most N genes and C) that are near-exclusively up/down-regulated.
Bringing this all together (also useful for other types of geneset testing, like ORA, score_type="pvalue", etc);
result = test_genesets(genesets_filtered, genelist, method = "goat", score_type = "effectsize", padj_method = "bonferroni", padj_cutoff = 0.05)
result = score_geneset_directionality(result, genelist)
result |> filter(signif & ngenes <= 100 & abs(score_directionality_rank) >= 0.95)
Usage
score_geneset_directionality(genesets, genelist)
Arguments
genesets |
tibble with genesets, must contain columns 'source', 'id', 'genes' |
genelist |
tibble with genes, must contain columns 'gene', 'effectsize' |
Value
input genesets
with results in 3 columns;
score_directionality_rank
is the weighted gene score where gene values are the sign of their effectsize and weights are linearly proportional to their inverse ranks in the input genelist.
score_directionality_rank2
is similar, but now using rank^2 gene weights to boost the influence of most-important genes in the input genelist.
score_directionality_value
uses the absolute gene effectsizes as gene weights
Note that the latter is least robust as it depends on the scaling of input data!
score definitions;
geneset directionality score = weighted mean of respective genes, where gene weights are 1 minus their relative rank in up/down-regulation (depending on negative/positive effectsize) and the value for each gene is -1 or 1 depending on up/down-regulation (sign of effectsize).
pseudocode;
gene values and weights A) gene weight between 0 and 1 for the subset of upregulated genes / positive effectsizes;
r = for the subset of genes with effectsize > 0, compute gene rank (1 = highest effectsize, N = smallest effectsize that is greater than zero)
weight = 1 - r/max(r) B) analogous to (A) for the subset of genes with negative effectsize C) result per gene: value = sign of its effectsize, weight = 0 if effectsize 0, otherwise respective weights from (A) or (B)
geneset score_directionality = weighted mean over values/weights of respective genes