enrichmentAnalysis {shiftR}    R Documentation

Fast Enrichment Testing via Circular Permutations on Non-Binary Outcomes

Description

This function performs enrichment analysis on two sets of matching test statistics. The circular permutation scheme accounts for possible local correlation of test statistics. Testing is performed at the quantile thresholds provided for each data set.
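A circular permutation shifts one vector by an offset and wraps the end around to the start. Neighbouring values stay neighbours, so any local correlation within the vector survives the permutation. The minimal sketch below illustrates the idea. `circularShift` is a hypothetical helper written for illustration, not a shiftR function.

```r
# Hypothetical helper (not part of shiftR): rotate x by `offset` positions,
# wrapping values past the end back to the start. Adjacent elements remain
# adjacent (except at the single wrap point), so local correlation within x
# is preserved.
circularShift <- function(x, offset) {
  n <- length(x)
  x[((seq_len(n) - 1L + offset) %% n) + 1L]
}

# Example: shift 1:5 by 2 positions
shifted <- circularShift(1:5, 2)
print(shifted)
# [1] 3 4 5 1 2
```

Because only one of the two vectors is shifted, the alignment between the data sets is broken while each data set keeps its own correlation structure.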

For every permutation, enrichment is measured with Cramer's V coefficient. The maximum and minimum coefficients across all considered thresholds are recorded and then compared with the maximum and minimum coefficients observed in the unpermuted data.
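The statistic described above can be sketched as follows. This is an illustration under stated assumptions, not shiftR's implementation: it assumes every pair of thresholds from the two data sets is tested, and it keeps the sign of the coefficient (the phi coefficient, whose magnitude equals Cramer's V for a 2x2 table) so that a maximum can indicate enrichment and a minimum depletion. The helpers `signedCramerV` and `thresholdStats` are hypothetical names.

```r
# Hypothetical helper: signed Cramer's V (phi coefficient) for two logical
# vectors. For a 2x2 table, |phi| equals Cramer's V; the sign separates
# enrichment (positive) from depletion (negative).
signedCramerV <- function(a, b) {
  # Convert counts to double to avoid integer overflow in the products
  n11 <- as.numeric(sum(a & b))
  n10 <- as.numeric(sum(a & !b))
  n01 <- as.numeric(sum(!a & b))
  n00 <- as.numeric(sum(!a & !b))
  num <- n11 * n00 - n10 * n01
  den <- sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
  num / den
}

# Hypothetical helper: compute the coefficient for every pair of quantile
# thresholds (assumed here: all combinations) and return the max and min.
thresholdStats <- function(pv1, pv2, p1, p2) {
  v <- matrix(NA_real_, length(p1), length(p2))
  for (i in seq_along(p1)) {
    # Smaller statistics are better, so the "top" results lie at or below
    # the quantile
    top1 <- pv1 <= quantile(pv1, p1[i], names = FALSE)
    for (j in seq_along(p2)) {
      top2 <- pv2 <= quantile(pv2, p2[j], names = FALSE)
      v[i, j] <- signedCramerV(top1, top2)
    }
  }
  c(max = max(v), min = min(v))
}

res <- thresholdStats(1:100, 1:100, c(0.1, 0.5), c(0.1, 0.5))
print(res)
```

In a permutation test, `thresholdStats` would be evaluated once on the original data and once per circularly shifted version of one data set.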

To match data sets calculated at different genomic locations, use matchDatasets.

Usage

enrichmentAnalysis(
    pvstats1,
    pvstats2,
    percentiles1 = NULL,
    percentiles2 = NULL,
    npermute,
    margin = 0.05,
    threads = 1)

Arguments

pvstats1

The vector of statistics for the primary data set.
The statistics must be p-value-like, i.e., smaller values are more significant.

pvstats2

The vector of statistics for the auxiliary data set.
The statistics must be p-value-like, i.e., smaller values are more significant.

percentiles1

These quantile thresholds are used to select the top results in the primary data set for matching with the top results in the auxiliary data set.
Can be omitted if the vector pvstats1 is binary.

percentiles2

Same as percentiles1, but for the other data set.
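As a sketch of how a quantile threshold can select the top results of a p-value-like vector, the hypothetical helper `topIndicator` below flags values at or below the given sample quantile (R's default `quantile` type 7 is assumed; shiftR may define the cutoff differently). Smaller thresholds produce nested subsets of larger ones.

```r
# Hypothetical helper: flag the top `percentile` fraction of a p-value-like
# vector (smaller is better). Ties at the cutoff may select slightly more.
topIndicator <- function(pvstats, percentile) {
  pvstats <= quantile(pvstats, percentile, names = FALSE)
}

x <- 1:1000
top10 <- topIndicator(x, 0.10)  # top 10 percent
top1 <- topIndicator(x, 0.01)   # top 1 percent
c(sum(top10), sum(top1))
# [1] 100  10
```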

npermute

Number of permutations to perform.

margin

The minimum offset in the circular permutation to consider.
Can be a fraction of the total number of values or an integer count of values.
Passed to getOffsetsRandom to generate the offsets.
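The role of the margin can be sketched with a hypothetical offset generator. It is not getOffsetsRandom, whose exact rules are not documented here. It assumes that a margin below 1 is a fraction of the number of values and that offsets are drawn uniformly between the margin and the number of values minus the margin, so that no shift leaves the two data sets nearly aligned.

```r
# Hypothetical sketch (not shiftR's getOffsetsRandom): draw `npermute`
# random circular offsets that stay at least `margin` away from 0 and n.
randomOffsets <- function(n, npermute, margin = 0.05) {
  # Assumption: a margin below 1 is interpreted as a fraction of n
  if (margin < 1) margin <- ceiling(margin * n)
  lo <- margin
  hi <- n - margin
  stopifnot(hi >= lo)
  # sample.int(k, size, replace) draws from 1..k; shift into lo..hi
  lo - 1 + sample.int(hi - lo + 1, npermute, replace = TRUE)
}

set.seed(1)
offsets <- randomOffsets(n = 100, npermute = 5, margin = 0.05)
range(offsets)  # always within 5..95 for this n and margin
```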

threads

The number of CPU cores to use for calculations.
Set to TRUE to use all cores.
Multithreading is turned off by default.

Value

Returns a list with:

overallPV

The p-values for the overall test across all thresholds.
P-values are provided for enrichment, depletion, and a two-sided test covering both.

byThresholdPV

The p-values for tests at each individual threshold.
P-values are provided for enrichment, depletion, and a two-sided test.
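One way the three p-values could be derived from the permutation results is sketched below. The exact formulas used by shiftR (for example, whether a +1 correction is applied, or how the two-sided p-value is formed) are assumptions here, and `permutationPValues` is a hypothetical name. The enrichment p-value is the fraction of permuted maxima at least as large as the observed maximum; the depletion p-value is the fraction of permuted minima at least as small as the observed minimum; the two-sided value doubles the smaller of the two, capped at 1.

```r
# Hypothetical sketch of permutation p-values (formulas are assumptions).
permutationPValues <- function(obsMax, obsMin, permMax, permMin) {
  enrichment <- mean(permMax >= obsMax)  # large coefficients = enrichment
  depletion <- mean(permMin <= obsMin)   # small coefficients = depletion
  both <- min(1, 2 * min(enrichment, depletion))
  c(enrichment = enrichment, depletion = depletion, both = both)
}

pv <- permutationPValues(
  obsMax = 0.35, obsMin = -0.05,
  permMax = c(0.1, 0.2, 0.3, 0.4),
  permMin = c(-0.1, -0.2, -0.3, -0.4))
print(pv)
```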

Author(s)

Andrey A Shabalin andrey.shabalin@gmail.com

Examples

### Data size
n = 1e5

### Generate vectors of test statistics with local correlation
window = 1000
pvstats1 = diff(cumsum(runif(n+window)), lag = window)
pvstats2 = diff(cumsum(runif(n+window)), lag = window)

# Add a bit of dependence
pvstats1 = pvstats1 + 0.5 * pvstats2

# Test the top 0.1, 1, 3, 5, and 10 percent

percentiles1 = c(0.001, 0.01, 0.03, 0.05, 0.1)
percentiles2 = c(0.001, 0.01, 0.03, 0.05, 0.1)

# The offset margin

margin = 0.05

# Set the number of permutations
# to the maximum

npermute = 1e3

enr = enrichmentAnalysis(
        pvstats1 = pvstats1,
        pvstats2 = pvstats2,
        percentiles1 = percentiles1,
        percentiles2 = percentiles2,
        npermute = npermute,
        margin = margin,
        threads = 2)

# View the results
enr

[Package shiftR version 1.5 Index]