R: Quadratic Forms in Large Matrices

bigQF-package {bigQF}

R Documentation

Quadratic Forms in Large Matrices

Description

A computationally-efficient leading-eigenvalue approximation to tail probabilities and quantiles of large quadratic forms, in particular for the Sequence Kernel Association Test (SKAT) used in genomics <doi:10.1002/gepi.22136>. Also provides stochastic singular value decomposition for dense or sparse matrices.

Details

The DESCRIPTION file:

Package:	bigQF
Type:	Package
Title:	Quadratic Forms in Large Matrices
Version:	1.6
Author:	Thomas Lumley
Maintainer:	Thomas Lumley <t.lumley@auckland.ac.nz>
Description:	A computationally-efficient leading-eigenvalue approximation to tail probabilities and quantiles of large quadratic forms, in particular for the Sequence Kernel Association Test (SKAT) used in genomics <doi:10.1002/gepi.22136>. Also provides stochastic singular value decomposition for dense or sparse matrices.
URL:	https://github.com/tslumley/bigQF
Imports:	svd, CompQuadForm, Matrix, stats, coxme
Suggests:	knitr, rmarkdown, SKAT
VignetteBuilder:	knitr
Depends:	methods
License:	GPL-2

Index of help topics:

SKAT.example            Data example from SKAT package
SKAT.matrixfree         Make 'matrix-free' object for SKAT test
bigQF-package           Quadratic Forms in Large Matrices
famSKAT                 Implicit matrix for family-based SKAT test
pQF                     Tail probabilities for quadratic forms
seigen                  Stochastic singular value decomposition
seqMetaExample          Example data, from seqMeta package
sequence                Simulated human DNA variant sequence
sparse.matrixfree       Make 'matrix-free' object from (sparse) Matrix

This package computes tail probabilities for large quadratic forms. The motivating application is the SKAT test used in DNA sequence association studies.

The true distribution is a linear combination of 1-df chi-squared distributions, where the coefficients are the non-zero eigenvalues of the matrix A defining the quadratic form z^TAz. The package uses an approximation to the distribution consisting of the largest neig terms in the linear combination plus the Satterthwaite approximation to the rest of the linear combination.
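The approximation described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the matrix M is simulated, the eigenvalues come from a full eigen() call (which the package is designed to avoid), and the tail probability is estimated by simulation rather than by the methods pQF uses. The variable names are illustrative.

```r
# Base-R sketch of the leading-eigenvalue approximation; not the bigQF code.
set.seed(1)
M <- matrix(rnorm(200 * 50), 200, 50)    # hypothetical non-square input
A <- crossprod(M)                         # A = t(M) %*% M, 50 x 50
lambda <- eigen(A, symmetric = TRUE, only.values = TRUE)$values  # decreasing

neig <- 10                                # leading terms kept exactly
lead <- lambda[seq_len(neig)]
rest <- lambda[-seq_len(neig)]

# Satterthwaite: approximate sum(rest * chisq_1) by a * chisq_df,
# choosing a and df to match the mean and variance of the remainder.
a  <- sum(rest^2) / sum(rest)
df <- sum(rest)^2 / sum(rest^2)

# Monte Carlo draws from the approximating distribution (illustration only).
nsim <- 20000
chisq1 <- matrix(rchisq(nsim * neig, df = 1), nsim, neig)
approx_draws <- as.vector(chisq1 %*% lead) + a * rchisq(nsim, df = df)

x <- 1.2 * sum(lambda)                    # an arbitrary evaluation point
tail_prob <- mean(approx_draws > x)       # estimated upper tail probability
```

By construction a * df equals the sum of the remaining eigenvalues and 2 * a^2 * df equals twice the sum of their squares, so the first two moments of the remainder are preserved.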

The main function is pQF, which has options for how to compute the leading eigenvalues (Lanczos-type algorithm or stochastic SVD) and how to compute the linear combination (inverting the characteristic function or a saddlepoint approximation). The Lanczos algorithm is from the svd package; the stochastic SVD can be called directly via ssvd or seigen.
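The idea behind a stochastic SVD can be sketched in base R, following the randomised range-finder of Halko, Martinsson and Tropp. This is a minimal illustration, not the code behind ssvd or seigen: the test matrix, the target rank k, the oversampling p and the number of power iterations q are all assumed values chosen for the example.

```r
# Base-R sketch of a randomised SVD; illustrative, not the bigQF code.
set.seed(2)
# Hypothetical 300 x 40 matrix with three dominant singular values
M <- matrix(rnorm(300 * 40), 300, 40) %*% diag(c(50, 30, 20, rep(1, 37)))

k <- 3    # number of leading singular values wanted
p <- 10   # oversampling (assumed value)
q <- 3    # power iterations (assumed value)

# Random test matrix and an orthonormal basis for the sampled range of M
Omega <- matrix(rnorm(ncol(M) * (k + p)), ncol(M), k + p)
Q <- qr.Q(qr(M %*% Omega))

# Power iterations, re-orthonormalising at each step for stability
for (i in seq_len(q)) {
  Q <- qr.Q(qr(crossprod(M, Q)))   # basis for t(M) %*% Q, 40 x (k + p)
  Q <- qr.Q(qr(M %*% Q))           # basis for M %*% Q, 300 x (k + p)
}

# Project M onto the basis and take the SVD of the small matrix
B <- crossprod(Q, M)               # (k + p) x 40
d_approx <- svd(B, nu = 0, nv = 0)$d[seq_len(k)]
d_exact  <- svd(M, nu = 0, nv = 0)$d[seq_len(k)]

# Leading eigenvalues of crossprod(M) are the squared singular values
eig_approx <- d_approx^2
```

Only products with M and its transpose are needed, which is why the same approach works for sparse or implicitly defined matrices.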

Given a square matrix, pQF uses it as A. If the input is a non-square matrix M, then A is crossprod(M). The function can also be used matrix-free, given an object containing functions to compute the product and transpose-product by M. This last option is described in the "matrix-free" vignette. The matrix-free algorithm also uses a randomised estimator of the trace of crossprod(A). The function sparse.matrixfree constructs an object for matrix-free use of pQF from a sparse Matrix object. The algorithms are described in the Lumley et al. (2018) reference.
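The matrix-free idea and a randomised trace estimate can be sketched in base R. This is an assumption-laden illustration rather than the package's code: the list components mult and tmult are hypothetical names (the "matrix-free" vignette describes the actual object layout), the estimator shown is a Hutchinson-type estimator with random signs, and nsample is an arbitrary choice.

```r
# Base-R sketch of matrix-free products and a randomised trace estimate;
# illustrative only, component names are hypothetical.
set.seed(3)
M <- matrix(rnorm(500 * 30), 500, 30)    # stands in for a large implicit matrix

mf <- list(
  mult  = function(x) M %*% x,           # product M x
  tmult = function(x) crossprod(M, x)    # transpose-product t(M) x
)

# A z = t(M) M z, computed without ever forming A
Aprod <- function(z) mf$tmult(mf$mult(z))

# For z with independent +/-1 entries, E[ sum((A z)^2) ] = trace(crossprod(A))
nsample <- 500
Z  <- matrix(sample(c(-1, 1), ncol(M) * nsample, replace = TRUE),
             ncol(M), nsample)
AZ <- Aprod(Z)
tr2_hat <- sum(AZ^2) / nsample

# Exact value for comparison; feasible here only because M is small
A <- crossprod(M)
tr2_exact <- sum(A^2)                    # trace of crossprod(A)
```

The estimate only requires nsample products with M and its transpose, so its cost does not depend on being able to store A.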

Finally, there are two functions specifically for the SKAT family of genomic tests: SKAT.matrixfree and famSKAT. Each takes a genotype matrix and an adjustment model as arguments and produces an object that contains the test statistic in its Q component and that can be passed to pQF to extract p-values. The vignette "Checking pQF vs SKAT" compares SKAT.matrixfree to the SKAT package and illustrates how it can be used.

Author(s)

Thomas Lumley

Maintainer: Thomas Lumley <t.lumley@auckland.ac.nz>

References

Tong Chen, Thomas Lumley (2019) Numerical evaluation of methods approximating the distribution of a large quadratic form in normal variables. Computational Statistics & Data Analysis. 139: 75-81.

Lumley et al. (2018) Sequence kernel association tests for large sets of markers: tail probabilities for large quadratic forms. Genet Epidemiol. 42(6): 516-527. doi:10.1002/gepi.22136

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp (2010) Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. https://arxiv.org/abs/0909.4061.

Lee, S., with contributions from Larisa Miropolsky, and Wu, M. (2015). SKAT: SNP-Set (Sequence) Kernel Association Test. R package version 1.1.2.

Lee, S., Wu, M. C., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics, 89:82-93.
