R: Tail probabilities for quadratic forms

pQF {bigQF}

R Documentation

Tail probabilities for quadratic forms

Description

Computes the upper tail probability for quadratic forms in standard Normal variables, using a leading-eigenvalue approximation that is feasible even for large matrices. Can take advantage of sparse matrices.

Usage

pQF(x, M, method = c("ssvd", "lanczos", "satterthwaite"), neig = 100,
  tr2.sample.size = 500, q = NULL,
  convolution.method = c("saddlepoint", "integration"),
  remainder.underflow=c("warn","missing","error"))

Arguments

`x`	Vector of quantiles to compute tail probabilities
`M`	If `M` is square, it is the matrix in the quadratic form and is assumed to be symmetric. If it is not square, `crossprod(M)` is the matrix in the quadratic form, although this matrix is never formed explicitly. It can also be an object of class `"matrixfree"` to allow matrix-free multiplication for, eg, sparse matrices; see `SKAT.matrixfree` for an example.
`method`	Use stochastic SVD ("ssvd") or a thick-restarted Lanczos algorithm ("lanczos") to extract the leading eigenvalues, or use the naive Satterthwaite approximation ("satterthwaite")
`neig`	Number of leading eigenvalues to use
`tr2.sample.size`	When `M` is not square, a randomised estimator for the trace of `crossprod(M)^2` is used. This is the sample size. When `M` is of `"matrixfree"` and does not include the trace of crossprod(M) as a component, this sample size will also be used to compute that. See Hutchinson reference.
`q`	Power iteration parameter for the stochastic SVD (see the Halko et al reference)
`convolution.method`	For either the "ssvd" or "lanczos" methods, how the convolution of the leading-eigenvalue terms is performed: by Kuonen's saddlepoint approximation or Davies' algorithm inverting the characteristic function
`remainder.underflow`	How should underflow of the remainder term (see Details) be handled? The default is a warning, with `"error"` an error is thrown, and with `"miss"` the return value will be `NaN`.

Details

Increasing neig or q will improve the accuracy of approximation. In simulated human sequence data, neig=100 is satisfactory for 5000 samples and markers. Increasing q should help when the singular values of M decrease slowly. The default sample size for the randomized trace estimator seems to be large enough.

If the remainder term in the approximation has less than 1 degree of freedom it will be dropped, and remainder.underflow controls how this is handled. The approximation will then be anticonservative, but usually not seriously so.

In earlier versions, method="satterthwaite" rounded the number of degrees of freedom up to the next integer; it now does not round.

In the current version when M is of class "matrix-free" and method="ssvd", an improved estimator of the remainder term is used. Instead of using Hutchinson's randomised trace estimator and subtracting the known eigenvalues, we apply the randomised trace estimator after projecting orthogonal to the known eigenvectors. After more evaluation, the improvement is likely to be extended to the other methods for rectangular matrices.

By default, Davies's algorithm is run with a tolerance of 1e-9. This can be changed by setting, eg, options(bigQF.davies.threshold=1e-12)

Value

Vector of upper tail probabilities

Author(s)

Thomas Lumley

References

Lumley et al. (2018) "Sequence kernel association tests for large sets of markers: tail probabilities for large quadratic forms" Genet Epidemiol. 2018 Sep;42(6):516-527. doi: 10.1002/gepi.22136

Tong Chen, Thomas Lumley (2019) Numerical evaluation of methods approximating the distribution of a large quadratic form in normal variables. Computational Statistics & Data Analysis. 139: 75-81,

Thomas Lumley (2017) "How to add chi-squareds" https://notstatschat.rbind.io/2017/12/06/how-to-add-chi-squareds/

Thomas Lumley (2016) "Large quadratic forms" https://notstatschat.rbind.io/2016/09/27/large-quadratic-forms/

Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp (2010) "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions" https://arxiv.org/abs/0909.4061.

Hutchinson, M. F. (1990). A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433-450.

Examples

data(sequence)
dim(sequence)

skat<-SKAT.matrixfree(sequence)
skat$trace
pQF(c(33782471,7e7,1e8),skat, n=100)


# Don't run these; they take a few minutes
G<-sequence
wuweights<-function(maf) dbeta(maf,1,25)
tmp<-wuweights(colMeans(G)/2)*t(G)
tildeGt<-t(tmp-rowMeans(tmp))/sqrt(2)
sum(tildeGt^2)

pQF(c(33782471,7e7,1e8), tildeGt, n=100)

H<-crossprod(tildeGt)
pQF(c(33782471,7e7,1e8), H, n=100)

[Package bigQF version 1.6 Index]