test_genesets {goat} | R Documentation |
Perform geneset enrichment testing using any supported method
Description
Perform geneset enrichment testing using any supported method
Usage
test_genesets(
  genesets,
  genelist,
  method,
  padj_method = "BH",
  padj_sources = TRUE,
  padj_cutoff = 0.01,
  padj_min_signifgenes = 0L,
  ...
)
Arguments
genesets |
tibble with genesets, must contain columns 'source', 'source_version', 'id', 'name', 'genes', 'ngenes', 'ngenes_signif' |
genelist |
tibble with genes, must contain columns 'gene' and 'test'. gene = character column, which is matched against the list column 'genes' in the genesets tibble. test = boolean column (you can set all values to FALSE if you are not performing a Fisher-exact or hypergeometric test downstream) |
method |
method for overrepresentation analysis. Options: "goat", "hypergeometric", "fisherexact", "fisherexact_ease", "gsea", "idea" |
padj_method |
first step of multiple testing correction; method for p-value adjustment, passed to |
padj_sources |
second step of multiple testing correction; apply Bonferroni adjustment to all p-values according to the number of geneset sources that were tested. Boolean parameter, set TRUE to enable (default) or FALSE to disable |
padj_cutoff |
cutoff for adjusted p-value, |
padj_min_signifgenes |
if a value larger than zero is provided, this will perform additional post-hoc filtering; after p-value adjustment, set the |
... |
further parameters are passed to the respective stats method |
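As a rough illustration of the input shapes described above, the following sketch builds toy genesets and genelist tibbles by hand. All values (gene names, ids, the source version string) are hypothetical, and reading 'test' as "gene counts as significant" when deriving 'ngenes_signif' is an assumption made for this example. The tables only show the expected columns and are too small for meaningful enrichment testing.

```r
library(tibble)

# Hypothetical toy gene list: 'gene' is matched against the 'genes' list column
# of the genesets table; 'test' is the boolean column described above
genelist_toy <- tibble(
  gene = c("g1", "g2", "g3", "g4"),
  test = c(TRUE, FALSE, TRUE, FALSE)
)

# Hypothetical toy genesets table with the documented columns
genesets_toy <- tibble(
  source = c("GO_CC", "GO_BP"),
  source_version = "toy_version",          # made-up version label
  id = c("toy:1", "toy:2"),                # made-up geneset identifiers
  name = c("toy set A", "toy set B"),
  genes = list(c("g1", "g2", "g3"), c("g3", "g4"))  # list column of gene ids
)

# Number of genes per geneset
genesets_toy$ngenes <- lengths(genesets_toy$genes)

# Assumption for this sketch: count genes per geneset that have test == TRUE
genesets_toy$ngenes_signif <- vapply(
  genesets_toy$genes,
  function(g) sum(genelist_toy$test[genelist_toy$gene %in% g]),
  integer(1)
)
```

In a real analysis these tables would typically come from the package's download and filter functions, as in the Examples section, rather than being assembled manually.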
Details
After application of the enrichment testing algorithm (e.g. GOAT, ORA or GSEA), multiple testing correction is applied to obtain adjusted p-values using padjust_genesets.
That function first applies the p-value adjustment procedure specified in the padj_method parameter within each 'source' in the genesets table. Second, it applies Bonferroni adjustment to all p-values according to the number of different geneset sources that were tested (set padj_sources = FALSE to disable this second step).
For example, if the input is a genesets table that contains GO_CC, GO_BP and GO_MF genesets, first multiple testing correction is applied within each source (e.g. using FDR if so desired) and afterwards a Bonferroni correction is applied based on 3 repeated analyses.
Note that this is more rigorous than typical GO tools. Hypothetically, one could split all GO_CC pathways into 1000 different databases/'sources' and then run enrichment testing on each. Without adjusting p-values for the number of 'sources', as is done here, such a split would artificially reduce the multiple testing burden.
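The two-step correction described above can be sketched with base R on toy data. This is an illustrative approximation, not the package's padjust_genesets implementation: the data frame, its p-values and the "less than or equal to" comparison against the cutoff are assumptions made for the example.

```r
# Toy p-values for genesets from three sources (values are made up)
toy <- data.frame(
  source = c("GO_CC", "GO_CC", "GO_BP", "GO_BP", "GO_MF"),
  pvalue = c(0.001, 0.04, 0.0005, 0.2, 0.01)
)

# Step 1: adjust p-values within each source (here: Benjamini-Hochberg)
toy$pvalue_adjust <- ave(
  toy$pvalue, toy$source,
  FUN = function(p) stats::p.adjust(p, method = "BH")
)

# Step 2: Bonferroni adjustment for the number of sources tested, capped at 1
n_sources <- length(unique(toy$source))
toy$pvalue_adjust <- pmin(1, toy$pvalue_adjust * n_sources)

# Assumption for this sketch: flag genesets at or below the cutoff
padj_cutoff <- 0.01
toy$signif <- toy$pvalue_adjust <= padj_cutoff

print(toy)
```

With three sources, every within-source adjusted p-value is multiplied by 3, so a geneset needs a within-source adjusted p-value of at most 0.01 / 3 to pass a cutoff of 0.01 in this sketch.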
Value
the input genesets, with results stored in columns 'pvalue', 'pvalue_adjust' and 'signif'
Examples
# note: this example downloads data when first run, and typically takes ~60 seconds
## Basic example for a complete GOAT workflow
# Downloads test data to your computer and stores it in output_dir (defined below)
# Refer to the GitHub documentation for more elaborate documentation and a worked example
# Store the downloaded files in the following directory. Here, the temporary file
# directory is used. Alternatively, consider storing this data in a more permanent location,
# e.g. output_dir="~/data/go" on unix systems or output_dir="C:/data/go" on Windows
output_dir = tempdir()
# download an example gene list
datasets = download_goat_manuscript_data(output_dir)
genelist = datasets$`Wingo 2020:mass-spec:PMID32424284`
# download GO genesets
genesets_asis = download_genesets_goatrepo(output_dir)
# filter genesets for sufficient overlap with the genelist, then apply GOAT
genesets_filtered = filter_genesets(genesets_asis, genelist)
result = test_genesets(genesets_filtered, genelist, method = "goat",
  score_type = "effectsize", padj_method = "bonferroni", padj_cutoff = 0.05)
# print first 10 rows of the result table
print(result |> dplyr::select(source, name, ngenes, pvalue_adjust) |> utils::head(n=10))