R: Performs pathway enrichment using over-representation...

olink_pathway_enrichment {OlinkAnalyze}

R Documentation

Performs pathway enrichment using over-representation analysis (ORA) or gene set enrichment analysis (GSEA)

Description

This function performs enrichment analysis based on statistical test results and full data using clusterProfiler's gsea and enrich functions for MSigDB.

Usage

olink_pathway_enrichment(
  data,
  test_results,
  method = "GSEA",
  ontology = "MSigDb",
  organism = "human",
  pvalue_cutoff = 0.05,
  estimate_cutoff = 0
)

Arguments

`data`	NPX data frame in long format with at least protein name (Assay), OlinkID, UniProt,SampleID, QC_Warning, NPX, and LOD
`test_results`	a dataframe of statistical test results including Adjusted_pval and estimate columns.
`method`	Either "GSEA" (default) or "ORA"
`ontology`	Supports "MSigDb" (default), "KEGG", "GO", and "Reactome" as arguments. MSigDb contains C2 and C5 genesets. C2 and C5 encompass KEGG, GO, and Reactome.
`organism`	Either "human" (default) or "mouse"
`pvalue_cutoff`	(numeric) maximum Adjusted p-value cutoff for ORA filtering of foreground set (default = 0.05). This argument is not used for GSEA.
`estimate_cutoff`	(numeric) minimum estimate cutoff for ORA filtering of foreground set (default = 0) This argument is not used for GSEA.

Details

MSigDB is subset if the ontology argument is KEGG, GO, or Reactome. test_results must contain estimates for all assays. Posthoc results can be used but should be filtered for one contrast to improve interpretability. Alternative statistical results can be used as input as long as they include the columns "OlinkID", "Assay", and "estimate". A column named "Adjusted_pal" is also needed for ORA. Any statistical results that contains one estimate per protein will work as long as the estimates are comparable to each other.

clusterProfiler is originally developed by Guangchuang Yu at the School of Basic Medical Sciences at Southern Medical University.

T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141. doi: 10.1016/j.xinn.2021.100141

NB: We strongly recommend to set a seed prior to running this function to ensure reproducibility of the results.

A few notes on Pathway Enrichment with Olink Data

It is important to note that sometimes the proteins that are assayed in Olink Panels are related to specific biological areas and therefore do not represent an unbiased overview of the proteome as a whole. Pathways can only interpreted based on the background/context they came from. For this reason, an estimate for all assays measured must be provided. Furthermore, certain pathways cannot come up based on Olink's coverage in this area. Additionally, if only the Inflammation panel was run, then the available pathways would be given based on a background of proteins related to inflammation. Both ORA and GSEA can provide mechanistic and disease related insight and are best to use when trying to uncover pathways/annotations of interest. It is recommended to only use pathway enrichment for hypothesis generating data, which is more well suited for data on the Explore platform or on multiple Target 96 panels. For smaller lists of proteins it may be more informative to use biological annotation in directed research, to discover which significant assay are related to keywords of interest.

Value

A data frame of enrichment results. Columns for ORA include:

ID: "character" Pathway ID from MSigDB
Description: "character" Description of Pathway from MSigDB
GeneRatio: "character" ratio of input proteins that are annotated in a term
BgRatio: "character" ratio of all genes that are annotated in this term
pvalue: "numeric" p-value of enrichment
p.adjust: "numeric" Adjusted p-value (Benjamini-Hochberg)
qvalue: "numeric" false discovery rate, the estimated probability that the normalized enrichment score represents a false positive finding
geneID: "character" list of input proteins (Gene Symbols) annotated in a term delimited by "/"
Count: "integer" Number of input proteins that are annotated in a term

Columns for GSEA:

ID: "character" Pathway ID from MSigDB
Description: "character" Description of Pathway from MSigDB
setSize: "integer" ratio of input proteins that are annotated in a term
enrichmentScore: "numeric" Enrichment score, degree to which a gene set is over-represented at the top or bottom of the ranked list of genes
NES: "numeric" Normalized Enrichment Score, normalized to account for differences in gene set size and in correlations between gene sets and expression data sets. NES can be used to compare analysis results across gene sets.
pvalue: "numeric" p-value of enrichment
p.adjust: "numeric" Adjusted p-value (Benjamini-Hochberg)
qvalue: "numeric" false discovery rate, the estimated probability that the normalized enrichment score represents a false positive finding
rank: "numeric" the position in the ranked list where the maximum enrichment score occurred
leading_edge: "character" contains tags, list, and signal. Tags gives an indication of the percentage of genes contributing to the enrichment score. List gives an indication of where in the list the enrichment score is obtained. Signal represents the enrichment signal strength and combines the tag and list.
core_enrichment: "character" list of input proteins (Gene Symbols) annotated in a term delimited by "/"

Examples


library(dplyr)
npx_df <- npx_data1 %>% filter(!grepl("control", SampleID, ignore.case = TRUE))
ttest_results <- olink_ttest(
  df = npx_df,
  variable = "Treatment",
  alternative = "two.sided"
)
try({ # This expression might fail if dependencies are not installed
gsea_results <- olink_pathway_enrichment(data = npx_data1, test_results = ttest_results)
ora_results <- olink_pathway_enrichment(
  data = npx_data1,
  test_results = ttest_results, method = "ORA"
)
}, silent = TRUE)

[Package OlinkAnalyze version 3.8.2 Index]