sumSTAAR {sumFREGAT} | R Documentation |
variant-Set Test for Association using Annotation infoRmation on summary statistics
Description
STAAR procedure developed by Li et al. (2020) adapted to use on summary statistics, with extentions
Usage
sumSTAAR(score.file, gene.file, genes = 'all', cor.path = 'cor/',
tests = c('BT', 'SKAT', 'ACAT'), beta.par.matrix = rbind(c(1, 1), c(1, 25)),
prob.causal = 'all', phred = TRUE, n = NA, mac.threshold = NA,
approximation = TRUE, write.file = FALSE, staar.output = TRUE, quiet = FALSE)
Arguments
score.file |
name of data file generated by |
gene.file |
can be a name of a custom text file listing genes in refFlat format. Values "hg19" and "hg38"
can be set to use default gene files containing protein-coding genes with start and stop positions from corresponding build.
Mind that the same build should be used in |
genes |
character vector of gene names to be analyzed. Can be "chr1", "chr2" etc. to analyze
all genes on a corresponding chromosome. If not set, function will
attempt to analyze all genes listed in |
cor.path |
path to a folder with correlation matrix files (one file per each gene to be analyzed).
Correlation matrices in text format are allowed, though ".RData" is preferable as computationally efficient.
Each file should contain a square matrix with correlation coefficients (r) between genetic variants
of a gene. An example of correlation file format: |
tests |
a character vector with gene-based methods to be applied. By default, three methods are used: 'BT', 'SKAT',
and 'ACAT'. Other weighted tests ('SKATO', 'PCA', 'FLM') can be included. Tests that do not imply
weighting ('simpleM', 'minP', 'sumchi'), if listed, will be calculated once and combined along with other tests
into the total sumSTAAR p-value. ACAT performs with |
beta.par.matrix |
an (n x 2) matrix with rows corresponding to n |
prob.causal |
a character vector to define a set of annotations to be used. Annotations should be initially passed to
|
phred |
a logical value indicating whether probabilities are in PHRED scale. If |
n |
size of the sample on which summary statistics were obtained. Should be assigned if 'PCA' or 'FLM'
are included in |
mac.threshold |
an integer number. In ACAT, scores with MAC <= 10 will be combined using Burden test. MACs are calculated
from MAFs, |
approximation |
logical value indicating whether approximation should be used for SKAT, SKATO, PCA and FLM. |
write.file |
output file name. If specified, output for all tests (as it proceeds) will be written to corresponding files. |
staar.output |
logical value indicating whether extensive output format should be used (see Details). |
quiet |
|
Details
The STAAR procedure has been recently proposed by Li et al. (2020) and described in detail therein. It calculates
a set of P values using a range of gene-based tests, beta distribution weights parameters, and weighting by each of
10 functional annotations. The P values are then combined using Cauchy method (see ACATO()
function and Liu, Y. et al., (2019)).
Value
With staar.output = FALSE
returns a data table with a single P value for each test (combinations of differently weighted and unweighted
iterations of the same test) and a total sumSTAAR P value. ACAT-O method is used to combine P values (Liu, Y. et al., 2019).
If staar.output = TRUE
the function returns a data frame of size (n.genes x (n.tests x n.beta.pars x (n.annotations + 1) + n.tests + 2)) containing P values
for combined tests and all individual tests. The output is analogous to that of original STAAR procedure. For example,
by default, the output will contain columns:
gene # gene symbol
BT.1.1.PROB0 # P value for burden test with beta.par = c(1, 1) and no annotations
BT.1.1.PROB1 # P value for burden test with beta.par = c(1, 1) and weighted by first annotation
BT.1.1.PROB2 # P value for burden test with beta.par = c(1, 1) and weighted by second annotation
...
BT.1.1.PROB10 # P value for burden test with beta.par = c(1, 1) and weighted by tenth annotation
BT.1.1.STAAR # combined P value of all burden tests with beta.par = c(1, 1)
BT.1.25.PROB0 # P value for burden test with beta.par = c(1, 25) and no annotations
BT.1.25.PROB1 # P value for burden test with beta.par = c(1, 25) and weighted by first annotation
...
BT.1.25.PROB10 # P value for burden test with beta.par = c(1, 25) and weighted by tenth annotation
BT.1.25.STAAR # combined P value of all burden tests with beta.par = c(1, 25)
SKAT.1.1.PROB0 # P value for SKAT with beta.par = c(1, 1) and no annotations
SKAT.1.1.PROB1 # P value for SKAT with beta.par = c(1, 1) and weighted by first annotation
...
SKAT.1.1.PROB10 # P value for SKAT with beta.par = c(1, 1) and weighted by tenth annotation
SKAT.1.1.STAAR # combined P value of all SKATs with beta.par = c(1, 1)
SKAT.1.25.PROB0 # P value for SKAT with beta.par = c(1, 25) and no annotations
...
SKAT.1.25.PROB10 # P value for SKAT with beta.par = c(1, 25) and weighted by tenth annotation
SKAT.1.25.STAAR # combined P value of all SKATs with beta.par = c(1, 25)
ACAT.1.1.PROB0 # P value for ACAT with beta.par = c(1, 1) and no annotations
ACAT.1.1.PROB1 # P value for ACAT with beta.par = c(1, 1) and weighted by first annotation
...
ACAT.1.1.PROB10 # P value for ACAT with beta.par = c(1, 1) and weighted by tenth annotation
ACAT.1.1.STAAR # combined P value of all ACATs with beta.par = c(1, 1)
ACAT.1.25.PROB0 # P value for ACAT with beta.par = c(1, 25) and no annotations
...
ACAT.1.25.PROB10 # P value for ACAT with beta.par = c(1, 25) and weighted by tenth annotation
ACAT.1.25.STAAR # combined P value of all ACATs with beta.par = c(1, 25)
sumSTAAR.ACAT_O # combined P value of all 'PROB0' gene-based tests (without weighting by annotations)
sumSTAAR.STAAR_O # combined P value of all gene-based tests
References
Belonogova et al. (2022) SumSTAAR: A flexible framework for gene-based association studies using GWAS summary statistics. PLOS Comp Biol. https://doi.org/10.1371/journal.pcbi.1010172
Li X., et al. (2020) Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nature Genetics. DOI: 10.1038/s41588-020-0676-4.
Liu Y. et al. (2019) ACAT: a fast and powerful p value combination method for rare-variant analysis in sequencing studies. Am. J. Hum. Genet. 104, 410-421.
Examples
cor.path <- system.file("testfiles/", package = "sumFREGAT")
score.file <- system.file("testfiles/CFH.prob.vcf.gz",
package = "sumFREGAT")
sumSTAAR(score.file, prob.causal = "PROB", gene.file = "hg19",
genes = "CFH", cor.path, quiet = TRUE)
## Not run:
score.file <- system.file("testfiles/CFH.prob.phred.vcf.gz",
package = "sumFREGAT")
res <- sumSTAAR(score.file, gene.file = "hg19", genes = "CFH",
cor.path, quiet = TRUE)
## End(Not run)