R: QQ Plot for Multivariate Normal Variance Mixtures

qqplot_maha {nvmix}

R Documentation

QQ Plot for Multivariate Normal Variance Mixtures

Description

Visual goodness-of-fit test for multivariate normal variance mixtures: Plotting squared Mahalanobis distances against their theoretical quantiles.

Usage

qqplot_maha(x, qmix, loc, scale, fitnvmix_object,
            trafo.to.normal = FALSE, test = c("KS.AD", "KS", "AD", "none"),
            boot.pars = list(B = 500, level = 0.95),
            plot = TRUE, verbose = TRUE, control = list(),
            digits = max(3, getOption("digits") - 4), plot.pars = list(), ...)

Arguments

`x`	`(n, d)`-data `matrix`.
`qmix`	see `pnvmix()`.
`loc`	see `pnvmix()`.
`scale`	see `pnvmix()`.
`fitnvmix_object`	Optional. Object of class `"fitnvmix"` typically returned by `fitnvmix()`; if provided, `x`, `qmix`, `loc` and `scale` are ignored.
`trafo.to.normal`	`logical`. If `TRUE`, the underlying Mahalanobis distances are mapped to normals by a probability- quantile-transform so that the resulting QQ plot is essentially a normal QQ plot. Defaults to `FALSE`.
`test`	`character` specifying if (and which) GoF test shall be performed. `"KS"` performs a Kolmogorov-Smirnoff (see `ks.test()`), `"AD"` an Anderson-Darling test (see `ad.test()` from the package `ADGofTest` and `"none"` performs no test. By default, `test = "KS.AD"` in which case both tests are performed.
`boot.pars`	`list` with elements `B` (Bootstrap sample size for computing CIs; if `B <= 1`, no Bootstrap is performed) and `level` specifying the confidence level.
`plot`	`logical` specifying if the results should be plotted.
`verbose`	see `pnvmix()`.
`control`	see `get_set_param()`.
`digits`	integer specifying the number of digits of the test statistic and the p-value to be displayed.
`plot.pars`	`list` specifying plotting parameters such as logarithmic axes; see `get_set_qqplot_param()`.
`...`	additional arguments (for example, parameters) passed to the underlying mixing distribution when `qmix` is a `character` string or `function`.

Details

If X follows a multivariate normal variance mixture, the distribution of the Mahalanobis distance D^2 = (X-\mu)^T \Sigma^{-1} (X-\mu) is a gamma mixture whose distribution function can be approximated.

The function qqplot_maha() first estimates the theoretical quantiles by calling qgammamix() and then plots those against the empirical squared Mahalanobis distances from the data in x (with \mu=loc and \Sigma=scale). Furthermore, the function computes asymptotic standard errors of the sample quantiles by using an asymptotic normality result for the order statistics which are used to plot the asymptotic CI; see Fox (2008, p. 35 – 36).+

If boot.pars$B > 1 (which is the default), the function additionally performs Bootstrap to construct a CI. Note that by default, the plot contains both the asymptotic and the Bootstrap CI.

Finally, depending on the parameter test, the function performs a univariate GoF test of the observed Mahalanobis distances as described above. The test result (i.e., the value of the statistic along with a p-value) are typically plotted on the second y-axis.

The return object of class "qqplot_maha" contains all computed values (such as p-value, test-statistics, Bootstrap CIs and more). We highlight that storing this return object makes the QQ plot quickly reproducible, as in this case, the theoretical quantiles do not need to be recomputed.

For changing plotting parameters (such as logarithmic axes or colors) via the argument plot.pars, see get_set_qqplot_param().

Value

qqplot_maha() (invisibly) returns an object of the class "qqplot_maha" for which the methods plot() and print() are defined. The return object contains, among others, the components

maha2: Sorted, squared Mahalanobis distances of the data from loc wrt to scale.
theo_quant: The theoretical quantile function evaluated at ppoints(length(maha2)).
boot_CI: (2,length(maha2)) matrix containing the Bootstrap CIs for the empirical quantiles.
asymptSE: vector of length length(maha2) with estimated, asympotic standard errors for the empirical quantiles.

Author(s)

Erik Hintz, Marius Hofert and Christiane Lemieux

References

Hintz, E., Hofert, M. and Lemieux, C. (2021), Normal variance mixtures: Distribution, density and parameter estimation. Computational Statistics and Data Analysis 157C, 107175.

Hintz, E., Hofert, M. and Lemieux, C. (2022), Multivariate Normal Variance Mixtures in R: The R Package nvmix. Journal of Statistical Software, doi:10.18637/jss.v102.i02.

McNeil, A. J., Frey, R. and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques, Tools. Princeton University Press.

Examples

## Sample from a heavy tailed multivariate t and construct QQ plot
set.seed(1)
d <- 2
n <- 1000
df <- 3.1
rho <- 0.5
loc <- rep(0, d)
scale <- matrix(c(1, rho, rho, 1), ncol = 2)
qmix <- "inverse.gamma"
## Sample data
x <- rnvmix(n, qmix = qmix, loc = loc, scale = scale, df = df)
# Construct QQ Plot with 'true' parameters and store result object
qq1 <- qqplot_maha(x, qmix = qmix, df = df, loc = loc, scale = scale)
## ... which is an object of class "qqplot_maha" with two methods
stopifnot(class(qq1) == "qqplot_maha", "plot.qqplot_maha" %in% methods(plot),
          "print.qqplot_maha" %in% methods(print))
plot(qq1) # reproduce the plot
plot(qq1, plotpars = list(log = "xy")) # we can also plot on log-log scale

## In fact, with the 'plotpars' argument, we can change a lot of things
plot(qq1, plotpars = list(col = rep("black", 4), lty = 4:6, pch = "*",
                          plot_test = FALSE, main = "Same with smaller y limits",
                          sub = "MySub", xlab = "MyXlab", ylim = c(0, 1.5e3)))

## What about estimated parameters?
myfit <- fitStudent(x)
## We can conveniently pass 'myfit', rather than specifying 'x', 'loc', ...
set.seed(1)
qq2.1 <- qqplot_maha(fitnvmix_object = myfit, test = "AD", trafo_to_normal = TRUE)
set.seed(1)
qq2.2 <- qqplot_maha(x, qmix = "inverse.gamma", loc = myfit$loc,
                     scale = myfit$scale, df = myfit$df,
                     test = "AD", trafo_to_normal = TRUE)
stopifnot(all.equal(qq2.1$boot_CI, qq2.2$boot_CI)) # check
qq2.2 # it mentions here that the Maha distances were transformed to normal


## Another example where 'qmix' is a function, so quantiles are internally
## estimated via 'qgammamix()'
n <- 15 # small sample size to have examples run fast
## Define the quantile function of an IG(nu/2, nu/2) distribution
qmix <- function(u, df) 1 / qgamma(1 - u, shape = df/2, rate = df/2)
## Sample data
x <- rnvmix(n, qmix = qmix, df = df, loc = loc, scale = scale)
## QQ Plot of empirical quantiles vs true quantiles, all values estimated
## via RQMC:
set.seed(1)
qq3.1 <- qqplot_maha(x, qmix = qmix, loc = loc, scale = scale, df = df)
## Same could be obtained by specifying 'qmix' as string in which case
## qqplot_maha() calls qf()
set.seed(1)
qq3.2 <- qqplot_maha(x, qmix = "inverse.gamma", loc = loc, scale = scale, df = df)

[Package nvmix version 0.1-1 Index]