wschisq {sphunif} | R Documentation |
Weighted sums of non-central chi squared random variables
Description
Approximated density, distribution, and quantile functions for weighted sums of non-central chi squared random variables:
Q_K = \sum_{i = 1}^K w_i \chi^2_{d_i}(\lambda_i),
where w_1, \ldots, w_n
are positive weights, d_1, \ldots, d_n
are positive degrees of freedom, and \lambda_1, \ldots, \lambda_n
are non-negative non-centrality parameters. Also, simulation of Q_K
.
Usage
d_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, grad_method = "simple",
grad_method.args = list(eps = 1e-07))
p_wschisq(x, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, M = 10000, MC_sample = NULL)
q_wschisq(u, weights, dfs, ncps = 0, method = c("I", "SW", "HBE", "MC")[1],
exact_chisq = TRUE, imhof_epsabs = 1e-06, imhof_epsrel = 1e-06,
imhof_limit = 10000, nlm_gradtol = 1e-06, nlm_iterlim = 1000,
M = 10000, MC_sample = NULL)
r_wschisq(n, weights, dfs, ncps = 0)
cutoff_wschisq(thre = 1e-04, weights, dfs, ncps = 0, log = FALSE,
x_tail = NULL)
Arguments
x |
vector of quantiles. |
weights |
vector with the positive weights of the sum. Must have the
same length as |
dfs |
vector with the positive degrees of freedom of the chi squared
random variables. Must have the same length as |
ncps |
non-centrality parameters. Either |
method |
method for approximating the density, distribution, or
quantile function. Must be |
exact_chisq |
if |
imhof_epsabs , imhof_epsrel , imhof_limit |
precision parameters passed to
|
grad_method , grad_method.args |
numerical differentiation parameters
passed to |
M |
number of Monte Carlo samples for approximating the distribution if
|
MC_sample |
if provided, it is employed when |
u |
vector of probabilities. |
nlm_gradtol , nlm_iterlim |
convergence control parameters passed to
|
n |
sample size. |
thre |
vector with the error thresholds of the tail probability and
mean/variance explained by the first terms of the series. Defaults to
|
log |
are |
x_tail |
scalar evaluation point for determining the upper tail
probability. If |
Details
Four methods are implemented for approximating the distribution of a weighted sum of chi squared random variables:
-
"I"
: Imhof's approximation (Imhof, 1961) for the evaluation of the distribution function. If this method is selected, the function is simply a wrapper toimhof
from theCompQuadForm
package (Duchesne and Lafaye De Micheaux, 2010). -
"SW"
: Satterthwaite–Welch (Satterthwaite, 1946; Welch, 1938) approximation, consisting in matching the first two moments ofQ_K
with a gamma distribution. -
"HBE"
: Hall–Buckley–Eagleson (Hall, 1983; Buckley and Eagleson, 1988) approximation, consisting in matching the first three moments ofQ_K
with a gamma distribution. -
"MC"
: Monte Carlo approximation using the empirical cumulative distribution function withM
simulated samples.
The Imhof method is exact up to the prescribed numerical accuracy. It is also the most time-consuming method. The density and quantile functions for this approximation are obtained by numerical differentiation and inversion, respectively, of the approximated distribution.
For the methods based on gamma matching, the GammaDist
density, distribution, and quantile functions are invoked. The
Hall–Buckley–Eagleson approximation tends to overperform the
Satterthwaite–Welch approximation.
The Monte Carlo method is relatively inaccurate and slow, but serves as an
unbiased reference of the true distribution function. The inversion of the
empirical cumulative distribution is done by quantile
.
An empirical comparison of these and other approximation methods is given in Bodenham and Adams (2016).
cutoff_wschisq
removes NA
s/NaN
s in weights
or
dfs
with a message. The threshold thre
ensures that the
tail probability of the truncated and whole series differ less than
thre
at x_tail
, or that thre
is the proportion of the
mean/variance of the whole series that is not retained. The (upper)
tail probabilities for evaluating truncation are computed using the
Hall–Buckley–Eagleson approximation at x_tail
.
Value
-
d_wschisq
: density function evaluated atx
, a vector. -
p_wschisq
: distribution function evaluated atx
, a vector. -
q_wschisq
: quantile function evaluated atu
, a vector. -
r_wschisq
: a vector of sizen
containing a random sample. -
cutoff_wschisq
: a data frame with the indexes up to which the truncated series explains the tail probability with absolute errorthre
, or the proportion of the mean/variance of the whole series that is not explained by the truncated series.
Author(s)
Eduardo García-Portugués and Paula Navarro-Esteban.
References
Bodenham, D. A. and Adams, N. M. (2016). A comparison of efficient approximations for a weighted sum of chi-squared random variables. Statistics and Computing, 26(4):917–928. doi:10.1007/s11222-015-9583-4
Buckley, M. J. and Eagleson, G. K. (1988). An approximation to the distribution of quadratic forms in normal random variables. Australian Journal of Statistics, 30(1):150–159. doi:10.1111/j.1467-842X.1988.tb00471.x
Duchesne, P. and Lafaye De Micheaux, P. (2010) Computing the distribution of quadratic forms: Further comparisons between the Liu–Tang–Zhang approximation and exact methods. Computational Statistics and Data Analysis, 54(4):858–862. doi:10.1016/j.csda.2009.11.025
Hall, P. (1983). Chi squared approximations to the distribution of a sum of independent random variables. Annals of Probability, 11(4):1028–1036. doi:10.1214/aop/1176993451
Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika, 48(3/4):419–426. doi:10.2307/2332763
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2(6):110–114. doi:10.2307/3002019
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29(3/4):350–362. doi:10.2307/2332010
Examples
# Plotting functions for the examples
add_approx_dens <- function(x, dfs, weights, ncps) {
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(x, d_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("topright", legend = c("True", "SW", "HBE", "I"), lwd = 2,
col = c(1, 3:4, 2))
}
add_approx_distr <- function(x, dfs, weights, ncps, ...) {
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "MC", exact_chisq = FALSE), col = 5,
type = "s")
lines(x, p_wschisq(x, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("bottomright", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2,
col = c(1, 3:5, 2))
}
add_approx_quant <- function(u, dfs, weights, ncps, ...) {
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "SW", exact_chisq = FALSE), col = 3)
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "HBE", exact_chisq = FALSE), col = 4)
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "MC", exact_chisq = FALSE), col = 5,
type = "s")
lines(u, q_wschisq(u, weights = weights, dfs = dfs, ncps = ncps,
method = "I", exact_chisq = TRUE), col = 2)
legend("topleft", legend = c("True", "SW", "HBE", "MC", "I"), lwd = 2,
col = c(1, 3:5, 2))
}
# Validation plots for density, distribution, and quantile functions
u <- seq(0.01, 0.99, l = 100)
old_par <- par(mfrow = c(1, 3))
# Case 1: 1 * ChiSq_3(0) + 1 * ChiSq_3(0) = ChiSq_6(0)
weights <- c(1, 1)
dfs <- c(3, 3)
ncps <- 0
x <- seq(-1, 30, l = 100)
main <- expression(1 * chi[3]^2 * (0) + 1 * chi[3]^2 * (0))
plot(x, dchisq(x, df = 6), type = "l", main = main, ylab = "Density")
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, pchisq(x, df = 6), type = "l", main = main, ylab = "Distribution")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, qchisq(u, df = 6), type = "l", main = main, ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
# Case 2: 2 * ChiSq_3(1) + 1 * ChiSq_6(0.5) + 0.5 * ChiSq_12(0.25)
weights <- c(2, 1, 0.5)
dfs <- c(3, 6, 12)
ncps <- c(1, 0.5, 0.25)
x <- seq(0, 70, l = 100)
main <- expression(2 * chi[3]^2 * (1)+ 1 * chi[6]^2 * (0.5) +
0.5 * chi[12]^2 * (0.25))
samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps)
hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density",
xlim = range(x), xlab = "x"); box()
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, quantile(samp, probs = u), type = "s", main = main,
ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
# Case 3: \sum_{k = 1}^K k^(-3) * ChiSq_{5k}(1 / k^2)
K <- 1e2
weights<- 1 / (1:K)^3
dfs <- 5 * 1:K
ncps <- 1 / (1:K)^2
x <- seq(0, 25, l = 100)
main <- substitute(sum(k^(-3) * chi[5 * k]^2 * (1 / k^2), k == 1, K),
list(K = K))
samp <- r_wschisq(n = 1e4, weights = weights, dfs = dfs, ncps = ncps)
hist(samp, breaks = 50, freq = FALSE, main = main, ylab = "Density",
xlim = range(x), xlab = "x"); box()
add_approx_dens(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(x, ecdf(samp)(x), main = main, ylab = "Distribution", type = "s")
add_approx_distr(x = x, weights = weights, dfs = dfs, ncps = ncps)
plot(u, quantile(samp, probs = u), type = "s", main = main,
ylab = "Quantile")
add_approx_quant(u = u, weights = weights, dfs = dfs, ncps = ncps)
par(old_par)
# Cutoffs for infinite series of the last example
K <- 1e7
log_weights<- -3 * log(1:K)
log_dfs <- log(5) + log(1:K)
(cutoff <- cutoff_wschisq(thre = 10^(-(1:4)), weights = log_weights,
dfs = log_dfs, log = TRUE))
# Approximation
x <- seq(0, 25, l = 100)
l <- length(cutoff$mean)
main <- expression(sum(k^(-3) * chi[5 * k]^2, k == 1, K))
col <- viridisLite::viridis(l)
plot(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[l]]),
dfs = exp(log_dfs[1:cutoff$mean[l]])), type = "l",
ylab = "Density", col = col[l], lwd = 3)
for(i in rev(seq_along(cutoff$mean)[-l])) {
lines(x, d_wschisq(x, weights = exp(log_weights[1:cutoff$mean[i]]),
dfs = exp(log_dfs[1:cutoff$mean[i]])), col = col[i])
}
legend("topright", legend = paste0(rownames(cutoff), " (", cutoff$mean, ")"),
lwd = 2, col = col)