qmd {qmd}    R Documentation

Quantification of Multivariate Dependence

Description

Estimates the non-parametric, copula-based multivariate measure of dependence \zeta1. This measure quantifies the extent of dependence between a d-dimensional random vector X and a univariate random variable y (i.e., it measures the influence of d explanatory variables X1,...,Xd on a univariate variable y). Further details can be found in the Details section and the corresponding references.

Usage

qmd(
  X,
  y,
  ties.correction = FALSE,
  resolution = NULL,
  p.value = FALSE,
  R = 1000,
  print = TRUE,
  na.exclude = FALSE
)

Arguments

X

a numeric matrix or data.frame with d columns containing the explanatory variables

y

a numeric vector containing the univariate response variable

ties.correction

logical indicating whether the measure of dependence should be calculated with ties correction (experimental). Default = FALSE.

resolution

an integer indicating the resolution N of the checkerboard aggregation. We recommend using the default configuration (resolution = NULL), which sets the resolution to N(n) = floor(n^(1/(d+1))), where n denotes the sample size and d the number of explanatory variables.

p.value

logical indicating whether a p-value based on permutations of y is returned (default = FALSE)

R

integer indicating the number of repetitions for the calculation of the p-value (default = 1000)

print

logical indicating whether the results of the function are printed

na.exclude

logical indicating whether all rows containing NAs should be removed (default = FALSE)
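The default resolution rule and the row-wise removal of missing values described above can be sketched in base R. This is only an illustration: default_resolution() is a hypothetical helper that is not part of the qmd package, and the use of complete.cases() is an assumption about what removing "all rows containing NAs" amounts to, not a description of the package internals.

```r
# Hypothetical helper (not part of qmd): default checkerboard resolution
# N(n) = floor(n^(1/(d+1))) for n observations and d explanatory variables.
default_resolution <- function(n, d) {
  stopifnot(n >= 1, d >= 1)
  floor(n^(1 / (d + 1)))
}

# Resolutions for sample sizes and dimensions like those in the Examples.
N_300_3 <- default_resolution(n = 300, d = 3)
N_500_2 <- default_resolution(n = 500, d = 2)

# Toy data with one missing value; rows with any NA are dropped by hand,
# which is assumed to mimic na.exclude = TRUE.
X <- cbind(x1 = c(0.1, 0.5, NA, 0.9, 0.3),
           x2 = c(0.2, 0.4, 0.6, 0.8, 1.0))
y <- c(1, 2, 3, 4, 5)
keep <- complete.cases(X, y)
X_clean <- X[keep, , drop = FALSE]
y_clean <- y[keep]
```

Because the resolution grows only like the (d+1)-th root of n, adding explanatory variables at a fixed sample size quickly leads to a coarse checkerboard grid.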

Details

In the following we will simply write q for the dependence measure \zeta1. Furthermore, X denotes a random vector consisting of d random variables and y denotes a univariate random variable. Then the theoretical dependence measure q fulfills the following essential properties of a dependence measure:

Further properties of q and the exact mathematical definition can be found in Griessenberger et al. (2022). The function qmd() implements the empirical checkerboard estimator (ECB-estimator), which is strongly consistent and always takes values between 0 and 1. Note that low values have to be interpreted with care and always with the sample size in mind. For instance, values of 0.2 can point towards independence in small-sample settings. An additional p-value (testing for independence and based on permutations of y) helps to interpret the dependence values correctly. Since independence constitutes the null hypothesis, a p-value above the significance level (e.g., 0.05) means that independence between X and y cannot be rejected.
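The idea of a permutation-based p-value can be illustrated with a minimal base R sketch. It is not the implementation used by qmd(): perm_p_value() and stand_in_stat() are hypothetical names, the maximum absolute Spearman correlation merely stands in for the dependence estimate, and the add-one correction in the p-value is an assumption of this sketch.

```r
# Generic permutation p-value for independence between X and y.
# `stat` is any function(X, y) returning one number that is large under
# dependence. Permuting y breaks any link to X, so the permuted statistics
# approximate the distribution of `stat` under independence.
perm_p_value <- function(X, y, stat, R = 1000) {
  observed <- stat(X, y)
  permuted <- replicate(R, stat(X, sample(y)))
  # Add-one correction keeps the p-value strictly positive (assumption of
  # this sketch, not necessarily what qmd() does).
  (1 + sum(permuted >= observed)) / (R + 1)
}

# Stand-in statistic (NOT the qmd estimator): largest absolute Spearman
# correlation between y and any column of X.
stand_in_stat <- function(X, y) max(abs(cor(X, y, method = "spearman")))

set.seed(1)
n <- 200
X <- cbind(x1 = runif(n), x2 = runif(n))

# y depends on X: a small p-value is expected.
p_dep <- perm_p_value(X, X[, 1] + X[, 2], stand_in_stat, R = 200)

# y is independent of X: the p-value carries no evidence against independence.
p_ind <- perm_p_value(X, runif(n), stand_in_stat, R = 200)
```

With R permutations the smallest attainable p-value in this sketch is 1/(R + 1), so the number of repetitions limits how small a reported p-value can be.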

Value

qmd returns a list object containing the following components:

References

Griessenberger, F., Junker, R.R. and Trutschnig, W. (2022). On a multivariate copula-based dependence measure and its estimation, Electronic Journal of Statistics, 16, 2206-2251.

Examples

#(complete dependence for dimension 4)
n <- 300
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + rnorm(n)
y <- x1 + x2 + x3
qmd(X = cbind(x1,x2,x3), y = y, p.value = TRUE)

#(independence for dimension 4)
n <- 500
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + rnorm(n)
y <- runif(n)
qmd(X = cbind(x1,x2,x3), y = y, p.value = TRUE)

#(binary output (classification) for dimension 3)
n <- 500
x1 <- runif(n)
x2 <- runif(n)
y <- ifelse(x1 + x2 < 1, 0, 1)
qmd(X = cbind(x1,x2), y = y, p.value = TRUE)
#(independence for dimension 3)
y <- runif(n)
qmd(X = cbind(x1,x2), y = y, p.value = TRUE)

[Package qmd version 1.1.2 Index]