R: ActivePathways

ActivePathways {ActivePathways}

R Documentation

ActivePathways

Description

ActivePathways

Usage

ActivePathways(
  scores,
  gmt,
  background = makeBackground(gmt),
  geneset_filter = c(5, 1000),
  cutoff = 0.1,
  significant = 0.05,
  merge_method = c("Fisher", "Fisher_directional", "Brown", "DPM", "Stouffer",
    "Stouffer_directional", "Strube", "Strube_directional"),
  correction_method = c("holm", "fdr", "hochberg", "hommel", "bonferroni", "BH", "BY",
    "none"),
  cytoscape_file_tag = NA,
  color_palette = NULL,
  custom_colors = NULL,
  color_integrated_only = "#FFFFF0",
  scores_direction = NULL,
  constraints_vector = NULL
)

Arguments

`scores`	A numerical matrix of p-values where each row is a gene and each column represents an omics dataset (evidence). Rownames correspond to the genes and colnames to the datasets. All values must be 0<=p<=1. We recommend converting missing values to ones.
`gmt`	A GMT object to be used for enrichment analysis. If a filename, a GMT object will be read from the file.
`background`	A character vector of gene names to be used as a statistical background. By default, the background is all genes that appear in `gmt`.
`geneset_filter`	A numeric vector of length two giving the lower and upper limits for the number of genes annotated to each pathway in `gmt`. Pathways with fewer than `geneset_filter[1]` or more than `geneset_filter[2]` genes will be removed. Set either value to NA to not enforce a minimum or maximum value, or set `geneset_filter` to `NULL` to skip filtering.
`cutoff`	A maximum merged p-value for a gene to be used for analysis. Any genes with merged, unadjusted `p > cutoff` will be discarded before testing.
`significant`	Significance cutoff for selecting enriched pathways. Pathways with `adjusted_p_val <= significant` will be selected as results.
`merge_method`	Statistical method to merge p-values. See the section on Merging P-values.
`correction_method`	Statistical method to correct p-values. See `p.adjust` for details.
`cytoscape_file_tag`	The directory and/or file prefix to which the output files for generating enrichment maps should be written. If NA, files will not be written.
`color_palette`	Color palette from RColorBrewer::brewer.pal to color each column in the scores matrix. If NULL grDevices::rainbow is used by default.
`custom_colors`	A character vector of custom colors for each column in the scores matrix.
`color_integrated_only`	A character vector of length 1 specifying the color of the "combined" pathway contribution.
`scores_direction`	A numerical matrix of log2 transformed fold-change values where each row is a gene and each column represents a dataset (evidence). Rownames correspond to the genes and colnames to the datasets. We recommend converting missing values to zero. Must have the same dimensions as the `scores` parameter. Datasets without directional information should be set to 0.
`constraints_vector`	A numerical vector of +1 or -1 values corresponding to the user-defined directional relationship between columns in scores_direction. Datasets without directional information should be set to 0.
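
The directional arguments described above can be prepared as in the following minimal sketch. It uses base R only; the gene names, dataset names, and values are invented for illustration and are not taken from the package data. The choice `constraints_vector <- c(1, 1)` is an assumption meaning that both datasets are expected to change in the same direction.

```r
# Toy p-values: three genes (rows) by two datasets (columns).
# All names and numbers here are hypothetical.
scores <- matrix(
  c(0.01, 0.20,
    0.03, 0.04,
    NA,   0.50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("rna", "protein"))
)

# Matching log2 fold-changes with the same dimensions and dimnames.
scores_direction <- matrix(
  c( 1.5, 0.8,
    -2.0, 1.1,
     0.3, NA),
  nrow = 3, byrow = TRUE,
  dimnames = dimnames(scores)
)

# Recommended handling of missing values: p-values to 1, directions to 0.
scores[is.na(scores)] <- 1
scores_direction[is.na(scores_direction)] <- 0

# One entry per column of scores_direction (assumed: same direction expected).
constraints_vector <- c(1, 1)

# Sanity checks mirroring the requirements stated in the argument list.
stopifnot(
  identical(dim(scores), dim(scores_direction)),
  length(constraints_vector) == ncol(scores_direction),
  all(scores >= 0 & scores <= 1)
)
```

Objects prepared this way would then be passed as `scores`, `scores_direction` and `constraints_vector` together with one of the directional values of `merge_method`.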

Value

A data.table of terms (enriched pathways) containing the following columns:

term_id: The database ID of the term
term_name: The full name of the term
adjusted_p_val: The associated p-value, adjusted for multiple testing
term_size: The number of genes annotated to the term
overlap: A character vector of the genes enriched in the term
evidence: Columns of scores (i.e., omics datasets) that contributed individually to the enrichment of the term. Each input column is evaluated separately for enrichments and added to the evidence if the term is found.

Merging P-values

To obtain a single p-value for each gene across the multiple omics datasets considered, the p-values in scores are merged row-wise using a data fusion approach based on p-value merging. The eight available methods are:

Fisher: Fisher's method assumes p-values are uniformly distributed and performs a chi-squared test on the statistic sum(-2 log(p)). This method is most appropriate when the columns in scores are independent.
Fisher_directional: Fisher's method modification that allows for directional information to be incorporated with the scores_direction and constraints_vector parameters.
Brown: Brown's method extends Fisher's method by accounting for the covariance in the columns of scores. It is more appropriate when the tests of significance used to create the columns in scores are not necessarily independent. Brown's method is therefore recommended for many omics integration approaches.
DPM: DPM extends Brown's method by incorporating directional information using the scores_direction and constraints_vector parameters.
Stouffer: Stouffer's method assumes p-values are uniformly distributed and transforms p-values into a Z-score using the cumulative distribution function of a standard normal distribution. This method is appropriate when the columns in scores are independent.
Stouffer_directional: Stouffer's method modification that allows for directional information to be incorporated with the scores_direction and constraints_vector parameters.
Strube: Strube's method extends Stouffer's method by accounting for the covariance in the columns of scores.
Strube_directional: Strube's method modification that allows for directional information to be incorporated with the scores_direction and constraints_vector parameters.
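
To make the row-wise merging concrete, the two independence-based methods can be sketched in base R as follows. These are the textbook, unweighted forms of Fisher's and Stouffer's methods, not necessarily the package's internal implementation; the function names `fisher_merge` and `stouffer_merge` and the toy matrix are hypothetical.

```r
# Fisher's method: chi-squared test on -2 * sum(log(p)) with 2k degrees of
# freedom, where k is the number of p-values being merged.
fisher_merge <- function(p) {
  statistic <- -2 * sum(log(p))
  pchisq(statistic, df = 2 * length(p), lower.tail = FALSE)
}

# Stouffer's method (unweighted): convert each p-value to a Z-score with the
# standard normal quantile function, combine, and convert back to a p-value.
stouffer_merge <- function(p) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(p)), lower.tail = FALSE)
}

# Toy scores matrix: genes in rows, datasets in columns (invented values).
scores <- matrix(
  c(0.01, 0.02,
    0.50, 0.50,
    0.90, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("rna", "protein"))
)

# One merged p-value per gene, computed across the columns of each row.
merged_fisher   <- apply(scores, 1, fisher_merge)
merged_stouffer <- apply(scores, 1, stouffer_merge)
```

With a single column, both functions return the input p-value unchanged, which is a quick way to check the sketch.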

Cytoscape

To visualize and interpret enriched pathways, ActivePathways provides an option to further analyse results as enrichment maps in the Cytoscape software. If !is.na(cytoscape_file_tag), four files will be written that can be used to build enrichment maps. This requires the EnrichmentMap and enhancedGraphics apps.

The four files written are:

pathways.txt: A list of significant terms and the associated p-value. Only terms with adjusted_p_val <= significant are written to this file.
subgroups.txt: A matrix indicating whether the significant terms (pathways) were also found to be significant when considering only one column from scores. A value of 1 indicates that the term was found to be significant when only p-values in that column were used to select genes.
pathways.gmt: A shortened version of the supplied GMT file, containing only the significantly enriched terms in pathways.txt.
legend.pdf: A legend with colours matching contributions from columns in scores.

How to use: Create an enrichment map in Cytoscape with the file of terms (pathways.txt) and the shortened gmt file (pathways.gmt). Upload the subgroups file (subgroups.txt) as a table using the menu File > Import > Table from File. To paint nodes according to the type of supporting evidence, use the 'style' panel, set image/Chart1 to use the column 'instruct' and the passthrough mapping type. Make sure the app enhancedGraphics is installed. Lastly, use the file legend.pdf as a reference for colors in the enrichment map.
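
A possible way to request these files from R is sketched below. It assumes that the value of `cytoscape_file_tag` is prepended to each of the four file names listed above; the prefix `enrichmentMap__` and the use of a temporary directory are illustrative choices. The call itself is guarded so that the snippet also runs when ActivePathways is not installed.

```r
# Hypothetical output prefix inside a temporary directory.
tag <- file.path(tempdir(), "enrichmentMap__")

# File names expected under the assumption that the tag is used as a prefix.
expected_files <- paste0(
  tag,
  c("pathways.txt", "subgroups.txt", "pathways.gmt", "legend.pdf")
)

if (requireNamespace("ActivePathways", quietly = TRUE)) {
  # Same example data as in the Examples section of this page.
  fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv",
                              package = "ActivePathways")
  fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt",
                           package = "ActivePathways")

  dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = "Gene"))
  dat[is.na(dat)] <- 1

  # A non-NA tag asks ActivePathways to write the enrichment map files.
  res <- ActivePathways::ActivePathways(dat, fname_GMT,
                                        cytoscape_file_tag = tag)

  # Report which of the expected files are present.
  print(file.exists(expected_files))
}
```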

Examples

    fname_scores <- system.file("extdata", "Adenocarcinoma_scores_subset.tsv", 
         package = "ActivePathways")
    fname_GMT <- system.file("extdata", "hsapiens_REAC_subset.gmt",
         package = "ActivePathways")

    dat <- as.matrix(read.table(fname_scores, header = TRUE, row.names = 'Gene'))
    dat[is.na(dat)] <- 1

    ActivePathways(dat, fname_GMT)

[Package ActivePathways version 2.0.5 Index]