R: Probability distribution of ratio of quadratic forms

dqfr {qfratio}

R Documentation

Probability distribution of ratio of quadratic forms

Description

dqfr(): Density of the (power of) ratio of quadratic forms, \left( \frac{ \mathbf{x^{\mathit{T}} A x} }{ \mathbf{x^{\mathit{T}} B x} } \right) ^ p, where \mathbf{x} \sim N_n(\bm{\mu}, \mathbf{\Sigma}).

pqfr(): Distribution function of the same.

qqfr(): Quantile function of the same.

dqfr_A1I1(): internal for dqfr(), exact series expression of Hillier (2001). Only accommodates the simple case where \mathbf{B} = \mathbf{I}_n and \bm{\mu} = \mathbf{0}_n.

dqfr_broda(): internal for dqfr(), exact numerical inversion algorithm of Broda & Paolella (2009).

dqfr_butler(): internal for dqfr(), saddlepoint approximation of Butler & Paolella (2007, 2008).

pqfr_A1B1(): internal for pqfr(), exact series expression of Forchini (2002, 2005).

pqfr_imhof(): internal for pqfr(), exact numerical inversion algorithm of Imhof (1961).

pqfr_davies(): internal for pqfr(), exact numerical inversion algorithm of Davies (1973, 1980). This is experimental and may be removed in the future.

pqfr_butler(): internal for pqfr(), saddlepoint approximation of Butler & Paolella (2007, 2008).

The user is supposed to use the exported functions dqfr(), pqfr(), and qqfr(), which are (pseudo-)vectorized with respect to quantile or probability. The actual calculations are done by one of the internal functions, which only accommodate a length-one quantile. The internal functions skip most checks on argument structures and do not accommodate Sigma to reduce execution time.

Usage

dqfr(
  quantile,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  log = FALSE,
  method = c("broda", "hillier", "butler"),
  trim_values = TRUE,
  normalize_spa = FALSE,
  return_abserr_attr = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  ...
)

pqfr(
  quantile,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  lower.tail = TRUE,
  log.p = FALSE,
  method = c("imhof", "davies", "forchini", "butler"),
  trim_values = TRUE,
  return_abserr_attr = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  ...
)

qqfr(
  probability,
  A,
  B,
  p = 1,
  mu = rep.int(0, n),
  Sigma = diag(n),
  lower.tail = TRUE,
  log.p = FALSE,
  trim_values = FALSE,
  return_abserr_attr = FALSE,
  stop_on_error = FALSE,
  m = 100L,
  tol_zero = .Machine$double.eps * 100,
  tol_sing = tol_zero,
  epsabs_q = .Machine$double.eps^(1/2),
  maxiter_q = 5000,
  ...
)

dqfr_A1I1(
  quantile,
  LA,
  m = 100L,
  check_convergence = c("relative", "strict_relative", "absolute", "none"),
  use_cpp = TRUE,
  tol_conv = .Machine$double.eps^(1/4),
  thr_margin = 100
)

dqfr_broda(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = TRUE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = epsrel,
  epsrel = 1e-06,
  limit = 10000
)

dqfr_butler(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  order_spa = 2,
  stop_on_error = FALSE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = .Machine$double.eps^(1/2),
  epsrel = 0,
  maxiter = 5000
)

pqfr_A1B1(
  quantile,
  A,
  B,
  m = 100L,
  mu = rep.int(0, n),
  check_convergence = c("relative", "strict_relative", "absolute", "none"),
  stop_on_error = FALSE,
  use_cpp = TRUE,
  cpp_method = c("double", "long_double", "coef_wise"),
  nthreads = 1,
  tol_conv = .Machine$double.eps^(1/4),
  tol_zero = .Machine$double.eps * 100,
  thr_margin = 100
)

pqfr_imhof(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = TRUE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = epsrel,
  epsrel = 1e-06,
  limit = 10000
)

pqfr_davies(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  autoscale_args = 1,
  stop_on_error = NULL,
  tol_zero = .Machine$double.eps * 100,
  ...
)

pqfr_butler(
  quantile,
  A,
  B,
  mu = rep.int(0, n),
  order_spa = 2,
  stop_on_error = FALSE,
  use_cpp = TRUE,
  tol_zero = .Machine$double.eps * 100,
  epsabs = .Machine$double.eps^(1/2),
  epsrel = 0,
  maxiter = 5000
)

Arguments

`quantile`	Numeric vector of quantiles `q`
`A`, `B`	Argument matrices. Should be square. `B` should be nonnegative definite. Will be automatically symmetrized in `dqfr()` and `pqfr()`.
`p`	Positive exponent of the ratio, default `1`. Unlike in `qfrm()`, the numerator and denominator cannot have different exponents. When `p` is non-integer, `A` must be nonnegative definite. For details, see vignette `vignette("qfratio_distr")`.
`mu`	Mean vector `\bm{\mu}` for `\mathbf{x}`
`Sigma`	Covariance matrix `\mathbf{\Sigma}` for `\mathbf{x}`
`log`, `lower.tail`, `log.p`	Logical; as in regular probability distribution functions. But these are for convenience only, and not meant for accuracy.
`method`	Method to specify an internal function (see “Details”). In `dqfr()`, options are: `"broda"` default; uses `dqfr_broda()`, numerical inversion of Broda & Paolella (2009) `"hillier"` uses `dqfr_A1I1()`, series expression of Hillier (2001) `"butler"` uses `dqfr_butler()`, saddlepoint approximation of Butler & Paolella (2007, 2008) In `pqfr()`, options are: `"imhof"` default; uses `pqfr_imhof()`, numerical inversion of Imhof (1961) `"davies"` uses `pqfr_davies()`, numerical inversion of Davies (1973, 1980) `"forchini"` uses `pqfr_A1B1()`, series expression of Forchini (2002, 2005) `"butler"` uses `pqfr_butler()`, saddlepoint approximation of Butler & Paolella (2007, 2008)
`trim_values`	If `TRUE` (default), numerical values outside the mathematically permissible support are trimmed in (see “Details”)
`normalize_spa`	If `TRUE` and `method == "butler"`, result is normalized so that the density integrates to unity (see “Details”)
`return_abserr_attr`	If `TRUE`, absolute error of numerical evaluation is returned as an attribute `"abserr"` (see “Value”)
`m`	Order of polynomials at which the series expression is truncated. `M` in Hillier et al. (2009, 2014).
`tol_zero`	Tolerance against which numerical zero is determined. Used to determine, e.g., whether `mu` is a zero vector, `A` or `B` equals the identity matrix, etc.
`tol_sing`	Tolerance against which matrix singularity and rank are determined. The eigenvalues smaller than this are considered zero.
`...`	Additional arguments passed to internal functions. In `qqfr()`, these are passed to `pqfr()`.
`probability`	Numeric vector of probabilities
`stop_on_error`	If `TRUE`, execution is stopped upon an error (including non-convergence) in evaluation of hypergeometric function, numerical integration, or root finding. If `FALSE`, further execution is attempted regardless.
`LA`	Eigenvalues of `\mathbf{A}`
`check_convergence`	Specifies how numerical convergence is checked for series expression (see `qfrm`)
`use_cpp`	Logical to specify whether the calculation is done with C++ functions via `Rcpp`. `TRUE` by default.
`tol_conv`	Tolerance against which numerical convergence of series is checked. Used with `check_convergence`.
`thr_margin`	Optional argument to adjust the threshold for scaling (see “Scaling” in `d1_i`). Passed to internal functions (`d1_i`, `d2_ij`, `d3_ijk`) or their C++ equivalents.
`autoscale_args`	Numeric; if `> 0` (default), arguments are scaled to avoid failure in numerical integration (see `vignette("qfratio_distr")`). If `<= 0`, the scaling is skipped.
`epsabs`, `epsrel`, `limit`, `maxiter`, `epsabs_q`, `maxiter_q`	Optional arguments used in numerical integration or root-finding algorithm (see vignette: `vignette("qfratio_distr")`). In `qqfr()`, `epsabs_q` and `maxiter_q` are used in root-finding for quantiles whereas `epsabs` and `maxiter` are passed to `pqfr()` internally.
`order_spa`	Numeric to determine order of saddlepoint approximation. More accurate second-order approximation is used for any `order > 1` (default); otherwise, (very slightly) faster first-order approximation is used.
`cpp_method`	Method used in C++ calculations to avoid numerical overflow/underflow (see “Details” in `qfrm`)
`nthreads`	Number of threads used in OpenMP-enabled C++ functions (see “Multithreading” in `qfrm`)

Details

qqfr() is based on numerical root-finding with pqfr() using uniroot(), so its result can be affected by the numerical errors in both the algorithm used in pqfr() and root-finding.

dqfr_A1I1() and pqfr_A1B1() evaluate the probability density and (cumulative) distribution function, respectively, as a partial sum of infinite series involving top-order zonal or invariant polynomials (Hillier 2001; Forchini 2002, 2005). As in other functions of this package, these are evaluated with the recursive algorithm d1_i.

pqfr_imhof() and pqfr_davies() evaluate the distribution function by numerical inversion of the characteristic function based on Imhof (1961) or Davies (1973, 1980), respectively. The latter calls davies(), and the former with use_cpp = FALSE calls imhof(), from the package CompQuadForm. Additional arguments for davies() can be passed via ..., except for sigma, which is not applicable.

dqfr_broda() evaluates the probability density by numerical inversion of the characteristic function using Geary's formula based on Broda & Paolella (2009). Parameters for numerical integration can be controlled via the arguments epsabs, epsrel, and limit (see vignette: vignette("qfratio_distr")).

dqfr_butler() and pqfr_butler() evaluate saddlepoint approximations of the density and distribution function, respectively, based on Butler & Paolella (2007, 2008). These are fast but not exact. They conduct numerical root-finding for the saddlepoint by the Brent method, parameters for which can be controlled by the arguments epsabs, epsrel, and maxiter (see vignette: vignette("qfratio_distr")). The saddlepoint approximation density does not integrate to unity, but can be normalized by dqfr(..., method = "butler", normalize_spa = TRUE). Note that this is usually slower than dqfr(..., method = "broda") for a small number of quantiles.

The density is undefined, and the distribution function has points of nonanalyticity, at the eigenvalues of \mathbf{B}^{-1} \mathbf{A} (assuming nonsingular \mathbf{B}). Around these points, the series expressions tends to fail. Avoid using the series expression methods for these cases.

Algorithms based on numerical integration can yield spurious results that are outside the mathematically permissible support; e.g., [0, 1] for pqfr(). By default, dqfr() and pqfr() trim those values into the permissible range with a warning; e.g., negative p-values are replaced by ~2.2e-14 (default tol_zero). Turn trim_values = FALSE to skip these trimming and warning, although pqfr_imhof() and pqfr_davies() can still throw a warning from CompQuadForm functions. Note that, on the other hand, all these functions try to return exact 0 or 1 when q is outside the possible range of the statistic.

Value

dqfr() and pqfr() give the density and distribution (or p-values) functions, respectively, corresponding to quantile, whereas qqfr() gives the quantile function corresponding to probability.

When return_abserr_attr = TRUE, an absolute error bound of numerical evaluation is returned as an attribute; this feature is currently available with dqfr(..., method = "broda"), pqfr(..., method = "imhof"), and qqfr(..., method = "imhof") (all default) only. This error bound is automatically transformed when trimming happens with trim_values (above) or when log/log.p = TRUE. See vignette for details (vignette("qfratio_distr")).

The internal functions return a list containing $d or $p (for density and lower p-value, respectively), and only this is passed to the external function by default. Other components may be inspected for debugging purposes:

dqfr_A1I1() and pqfr_A1B1(): have $terms, a vector of 0th to mth order terms.
pqfr_imhof() and dqfr_broda(): have $abserr, absolute error of numerical integration; the one returned from CompQuadForm::imhof() is divided by pi, as the integration result itself is (internally). This is passed to the external functions when return_abserr_attr = TRUE (above).
pqfr_davies(): has the same components as CompQuadForm::davies() apart from Qq which is replaced by p = 1 - Qq.

References

Broda, S. and Paolella, M. S. (2009) Evaluating the density of ratios of noncentral quadratic forms in normal variables. Computational Statistics and Data Analysis, 53, 1264–1270. doi:10.1016/j.csda.2008.10.035

Butler, R. W. and Paolella, M. S. (2007) Uniform saddlepoint approximations for ratios of quadratic forms. Technical Reports, Department of Statistical Science, Southern Methodist University, no. 351. [Available on arXiv as a preprint.] doi:10.48550/arXiv.0803.2132

Butler, R. W. and Paolella, M. S. (2008) Uniform saddlepoint approximations for ratios of quadratic forms. Bernoulli, 14, 140–154. doi:10.3150/07-BEJ6169

Davis, R. B. (1973) Numerical inversion of a characteristic function. Biometrika, 60, 415–417. doi:10.1093/biomet/60.2.415.

Davis, R. B. (1980) Algorithm AS 155: The distribution of a linear combination of \chi^2 random variables. Journal of the Royal Statistical Society, Series C—Applied Statistics, 29, 323–333. doi:10.2307/2346911.

Forchini, G. (2002) The exact cumulative distribution function of a ratio of quadratic forms in normal variables, with application to the AR(1) model. Econometric Theory, 18, 823–852. doi:10.1017/S0266466602184015.

Forchini, G. (2005) The distribution of a ratio of quadratic forms in noncentral normal variables. Communications in Statistics—Theory and Methods, 34, 999–1008. doi:10.1081/STA-200056855.

Hillier, G. (2001) The density of quadratic form in a vector uniformly distributed on the n-sphere. Econometric Theory, 17, 1–28. doi:10.1017/S026646660117101X.

Imhof, J. P. (1961) Computing the distribution of quadratic forms in normal variables. Biometrika, 48, 419–426. doi:10.1093/biomet/48.3-4.419.

Examples

## Some symmetric matrices and parameters
nv <- 4
A <- diag(nv:1)
B <- diag(sqrt(1:nv))
mu <- 0.2 * nv:1
Sigma <- matrix(0.5, nv, nv)
diag(Sigma) <- 1

## density and p-value for (x^T A x) / (x^T x) where x ~ N(0, I)
dqfr(1.5, A)
pqfr(1.5, A)

## 95 percentile for the same
qqfr(0.95, A)
qqfr(0.05, A, lower.tail = FALSE) # same

## P{ (x^T A x) / (x^T B x) <= 1.5} where x ~ N(mu, Sigma)
pqfr(1.5, A, B, mu = mu, Sigma = Sigma)

## These are (pseudo-)vectorized
qs <- 0:nv + 0.5
dqfr(qs, A, B, mu = mu)
(pres <- pqfr(qs, A, B, mu = mu))

## Quantiles for above p-values
## Results equal qs, except that those for prob = 0 and 1
## are replaced by mininum and maximum of the ratio
qqfr(pres, A, B, mu = mu) # = qs

## Various methods for density
dqfr(qs, A, method = "broda")   # default
dqfr(qs, A, method = "hillier") # series; B, mu, Sigma not permitted
## Saddlepoint approximations (fast but inexact):
dqfr(qs, A, method = "butler")  # 2nd order by default
dqfr(qs, A, method = "butler", normalize_spa = TRUE) # normalized
dqfr(qs, A, method = "butler", normalize_spa = TRUE,
     order_spa = 1) # 1st order, normalized

## Various methods for distribution function
pqfr(qs, A, method = "imhof")    # default
pqfr(qs, A, method = "davies")   # very similar
pqfr(qs, A, method = "forchini") # series expression
pqfr(qs, A, method = "butler")   # saddlepoint approximation (2nd order)
pqfr(qs, A, method = "butler", order_spa = 1) # 1st order

## To see error bounds
dqfr(qs, A, return_abserr_attr = TRUE)
pqfr(qs, A, return_abserr_attr = TRUE)
qqfr(pres, A, return_abserr_attr = TRUE)

[Package qfratio version 1.1.1 Index]