partition_genes {goat}    R Documentation

Classify genes into 2 groups, e.g. to define significant or topN genes, resulting in a 'signif' column with boolean values

Description

This can be convenient to prepare the significant/test/foreground set for classical ORA, e.g. test_genesets() with parameter method = "fisherexact". Note that the GOAT geneset enrichment algorithm does not use data in the 'signif' column of the input genelist.
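To illustrate how a boolean 'signif' column can feed a classical over-representation test, here is a minimal sketch in base R. It is not the goat implementation of test_genesets(). The toy gene table and the geneset are hypothetical, and stats::fisher.test() stands in for whatever the package does internally.

```r
# Toy data (hypothetical): 20 genes, the first 6 flagged as significant
genes <- data.frame(
  gene = 1:20,
  signif = rep(c(TRUE, FALSE), times = c(6, 14))
)

# Hypothetical geneset with 5 member genes, 4 of which are significant
geneset <- c(1, 2, 3, 4, 10)
in_set <- genes$gene %in% geneset

# 2x2 contingency table: geneset membership versus significance.
# Fixing the factor levels puts the TRUE/TRUE cell in position [1, 1].
tab <- table(
  in_geneset = factor(in_set, levels = c(TRUE, FALSE)),
  signif = factor(genes$signif, levels = c(TRUE, FALSE))
)

# One-sided Fisher exact test for over-representation of significant genes
ora <- fisher.test(tab, alternative = "greater")
print(tab)
print(ora$p.value)
```

The foreground set here is simply the rows where 'signif' is TRUE, which is the column that partition_genes() adds.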

Usage

partition_genes(
  genes,
  col,
  decreasing = FALSE,
  use_abs = FALSE,
  cutoff = NULL,
  fraction = NULL,
  topn = NULL
)

Arguments

genes

gene tibble where each row is a unique gene; must contain the column named in col

col

name of the column in genes whose values are used for partitioning

decreasing

sort col in descending (TRUE) or ascending (FALSE, default) order prior to partitioning

use_abs

use absolute values (default FALSE), e.g. when setting a threshold on effect-sizes

cutoff

threshold for values in col to select (must provide exactly 1 parameter for filtering, either cutoff, fraction or topn)

fraction

fraction of rows in genes tibble to select (must provide exactly 1 parameter for filtering, either cutoff, fraction or topn)

topn

number of rows in genes tibble to select (must provide exactly 1 parameter for filtering, either cutoff, fraction or topn)
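How the arguments interact can be sketched in base R. The helper partition_sketch() below is hypothetical and is not the goat source code. It assumes that the cutoff is inclusive in both directions, that fraction is rounded up to a whole number of rows, and that missing values are never flagged.

```r
# Hypothetical helper, NOT the goat implementation: mimics the documented
# behaviour of partition_genes() on a plain data.frame using base R only.
partition_sketch <- function(genes, col, decreasing = FALSE, use_abs = FALSE,
                             cutoff = NULL, fraction = NULL, topn = NULL) {
  stopifnot(is.data.frame(genes), col %in% colnames(genes))
  # exactly one of cutoff / fraction / topn must be provided
  n_given <- sum(!is.null(cutoff), !is.null(fraction), !is.null(topn))
  stopifnot(n_given == 1)

  x <- genes[[col]]
  if (use_abs) x <- abs(x)

  if (!is.null(cutoff)) {
    # assumption: the threshold is inclusive (>= or <=)
    is_signif <- if (decreasing) x >= cutoff else x <= cutoff
  } else {
    # assumption: a fraction is rounded up to a whole number of rows
    if (!is.null(fraction)) topn <- ceiling(fraction * nrow(genes))
    ranked <- order(x, decreasing = decreasing, na.last = TRUE)
    is_signif <- rep(FALSE, nrow(genes))
    is_signif[head(ranked, topn)] <- TRUE
  }
  # assumption: rows with a missing value are never flagged
  is_signif[is.na(x)] <- FALSE

  genes$signif <- is_signif
  genes
}

# Toy gene table (hypothetical values)
toy <- data.frame(
  gene = 1:6,
  pvalue = c(0.001, 0.2, 0.03, 0.5, 0.008, 0.9),
  effectsize = c(-6, 1, 3, -0.5, 5.5, 0.2)
)

by_cutoff <- partition_sketch(toy, "pvalue", cutoff = 0.01)
by_abs <- partition_sketch(toy, "effectsize", decreasing = TRUE,
                           use_abs = TRUE, cutoff = 5)
by_topn <- partition_sketch(toy, "pvalue", topn = 3)
```

With decreasing = FALSE, small values (e.g. p-values) are selected. With decreasing = TRUE, large values (e.g. absolute effect sizes) are selected.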

Value

the input table genes, with the partitioning result stored in the boolean "signif" column

Examples


# note: this example will download 1 file of approx. 4 MB

# store the downloaded files in the following directory. Here, the temporary file
# directory is used. Alternatively, consider storing this data in a more permanent location.
# e.g. output_dir="~/data/goat" on unix systems or output_dir="C:/data/goat" on Windows
output_dir = tempdir()

# Download an example gene list, i.e. one of the datasets analyzed in the GOAT manuscript.
datasets = download_goat_manuscript_data(output_dir)
genelist = datasets$`Wingo 2020:mass-spec:PMID32424284`

# example 1: significant hits
genelist = partition_genes(genelist, col="pvalue_adjust", decreasing=FALSE, cutoff=0.01)
cat(sum(genelist$signif), "/", nrow(genelist), "are signif\n")

# example 2: abs(effectsize) >= 5
genelist = partition_genes(genelist, col="effectsize", decreasing=TRUE, use_abs=TRUE, cutoff=5)
cat(sum(genelist$signif), "/", nrow(genelist), "are signif\n")

# example 3: top 10% 'best' p-values
genelist = partition_genes(genelist, col="pvalue", decreasing=FALSE, fraction = 0.1)
cat(sum(genelist$signif), "/", nrow(genelist), "are signif\n")


[Package goat version 1.0 Index]