pQF {bigQF} | R Documentation |
Tail probabilities for quadratic forms
Description
Computes the upper tail probability for quadratic forms in standard Normal variables, using a leading-eigenvalue approximation that is feasible even for large matrices. Can take advantage of sparse matrices.
Usage
pQF(x, M, method = c("ssvd", "lanczos", "satterthwaite"), neig = 100,
tr2.sample.size = 500, q = NULL,
convolution.method = c("saddlepoint", "integration"),
remainder.underflow=c("warn","missing","error"))
Arguments
x |
Vector of quantiles to compute tail probabilities |
M |
If |
method |
Use stochastic SVD ("ssvd") or a thick-restarted Lanczos algorithm ("lanczos") to extract the leading eigenvalues, or use the naive Satterthwaite approximation ("satterthwaite") |
neig |
Number of leading eigenvalues to use |
tr2.sample.size |
When |
q |
Power iteration parameter for the stochastic SVD (see the Halko et al reference) |
convolution.method |
For either the "ssvd" or "lanczos" methods, how the convolution of the leading-eigenvalue terms is performed: by Kuonen's saddlepoint approximation or Davies' algorithm inverting the characteristic function |
remainder.underflow |
How should underflow of the remainder term (see Details) be handled? The default is a warning, with |
Details
Increasing neig
or q
will improve the accuracy of approximation. In simulated human sequence data, neig=100
is satisfactory for 5000 samples and markers. Increasing q
should help when the singular values of M
decrease slowly. The default sample size for the randomized trace estimator seems to be large enough.
If the remainder term in the approximation has less than 1 degree of freedom it will be dropped, and remainder.underflow
controls how this is handled. The approximation will then be anticonservative, but usually not seriously so.
In earlier versions, method="satterthwaite"
rounded the number of
degrees of freedom up to the next integer; it now does not round.
In the current version when M
is of class "matrix-free"
and method="ssvd"
, an improved estimator of the remainder term is
used. Instead of using Hutchinson's randomised trace estimator and
subtracting the known eigenvalues, we apply the randomised trace
estimator after projecting orthogonal to the known eigenvectors. After
more evaluation, the improvement is likely to be extended to the other
methods for rectangular matrices.
By default, Davies's algorithm is run with a tolerance of 1e-9
. This can be changed by setting, eg, options(bigQF.davies.threshold=1e-12)
Value
Vector of upper tail probabilities
Author(s)
Thomas Lumley
References
Lumley et al. (2018) "Sequence kernel association tests for large sets of markers: tail probabilities for large quadratic forms" Genet Epidemiol. 2018 Sep;42(6):516-527. doi: 10.1002/gepi.22136
Tong Chen, Thomas Lumley (2019) Numerical evaluation of methods approximating the distribution of a large quadratic form in normal variables. Computational Statistics & Data Analysis. 139: 75-81,
Thomas Lumley (2017) "How to add chi-squareds" https://notstatschat.rbind.io/2017/12/06/how-to-add-chi-squareds/
Thomas Lumley (2016) "Large quadratic forms" https://notstatschat.rbind.io/2016/09/27/large-quadratic-forms/
Nathan Halko, Per-Gunnar Martinsson, Joel A. Tropp (2010) "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions" https://arxiv.org/abs/0909.4061.
Hutchinson, M. F. (1990). A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 19(2):433-450.
See Also
Examples
data(sequence)
dim(sequence)
skat<-SKAT.matrixfree(sequence)
skat$trace
pQF(c(33782471,7e7,1e8),skat, n=100)
# Don't run these; they take a few minutes
G<-sequence
wuweights<-function(maf) dbeta(maf,1,25)
tmp<-wuweights(colMeans(G)/2)*t(G)
tildeGt<-t(tmp-rowMeans(tmp))/sqrt(2)
sum(tildeGt^2)
pQF(c(33782471,7e7,1e8), tildeGt, n=100)
H<-crossprod(tildeGt)
pQF(c(33782471,7e7,1e8), H, n=100)